With the rise of Massive Open Online Courses (MOOCs), online peer marking has become an attractive tool for educational assessment. However, its widespread use faces serious challenges, most significantly in the perceived and actual reliability of the resulting grades, which depend on the ability of peers to mark accurately and are threatened by the potential for collusion and bias. A number of aggregation approaches exist for alleviating the impact of biased scores, usually involving either the down-weighting or the removal of outliers. Here we investigate the least trimmed squares (LTS) estimator and the Huber mean for the aggregation step, comparing their performance with that of weighting markers by their divergence from other peers' marks. We design an experimental setup to generate scores and test a number of conditions. Overall, we find that for a feasible number of peer markers, when the student pool contains a significant proportion of `biased' markers, outlier-removal techniques are likely to produce a small number of very unfair assessments, whereas more standard approaches leave more grades unfairly influenced, but each to a lesser extent.
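As a concrete illustration of the two robust aggregation steps compared here, the following Python sketch computes an LTS location estimate and a Huber mean for a small set of peer marks. This is a minimal sketch, not the paper's implementation: the coverage size h, the tuning constant k = 1.345, and the example marks are illustrative assumptions, not the experimental settings used in the study.

```python
import numpy as np

def lts_mean(scores, h=None):
    """Least trimmed squares location estimate: the mean of the subset
    of h scores whose sum of squared deviations from its own mean is
    smallest. For a location estimate the optimal subset is contiguous
    in the sorted data, so a sliding window over the sorted scores
    suffices. h defaults to roughly half the data (an illustrative
    choice, not the paper's setting)."""
    x = np.sort(np.asarray(scores, dtype=float))
    n = len(x)
    if h is None:
        h = n // 2 + 1
    best_mean, best_ss = x[:h].mean(), np.inf
    for i in range(n - h + 1):
        window = x[i:i + h]
        m = window.mean()
        ss = np.sum((window - m) ** 2)
        if ss < best_ss:
            best_ss, best_mean = ss, m
    return best_mean

def huber_mean(scores, k=1.345, tol=1e-6, max_iter=100):
    """Huber M-estimate of location via iteratively reweighted means.
    Marks within k robust standard deviations of the current estimate
    get full weight; more divergent marks are down-weighted rather
    than removed. Scale is fixed at the normal-consistent MAD."""
    x = np.asarray(scores, dtype=float)
    mu = np.median(x)
    s = 1.4826 * np.median(np.abs(x - mu))
    if s == 0:  # all marks (nearly) identical; median is the answer
        return mu
    for _ in range(max_iter):
        r = np.abs(x - mu) / s
        w = np.where(r <= k, 1.0, k / r)
        mu_new = np.sum(w * x) / np.sum(w)
        if abs(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu

# Hypothetical example: five peer marks, one biased marker giving 100.
marks = [62, 65, 60, 64, 100]
print(np.mean(marks), lts_mean(marks), huber_mean(marks))
```

On this toy input the plain mean (70.2) is pulled well above the consensus, the LTS estimate discards the outlying mark entirely, and the Huber mean retains it with reduced weight, mirroring the removal-versus-down-weighting contrast examined in the paper.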