The internet has become a central medium for members of the public to express their opinions and engage in debate. However, debate spaces on the web are often derailed by offensive comments and personal attacks. One approach to this challenge is to automate content moderation by training machine-learning classifiers on large corpora of texts manually annotated for offence.
This project explores practical and ethical aspects of such 'algorithmic moderation'. While these systems could help encourage more civil debate, they must navigate boundaries that are inherently normatively contested, and are therefore subject to the idiosyncratic norms of the human raters who provide the training data. An important objective for platforms implementing such systems might be to ensure that they are not unduly biased towards or against particular norms of offence.
Our work explores ways to measure the normative biases of algorithmic moderation systems, for example by comparing the outputs of classifiers trained on data sets annotated by people from different demographic groups. It also explores the ethical choices that consequently face those who implement these automated systems.
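As a minimal sketch of the kind of comparison described above, the toy example below trains two very simple word-score classifiers on the same comments labelled by two hypothetical rater groups with different norms of offence, then measures how often the resulting classifiers disagree on held-out comments. All data, labels, and the disagreement metric are invented for illustration; a real study would use substantial annotated corpora and proper classifiers.

```python
from collections import Counter

def train_word_scores(texts, labels):
    """Score each word by how often it appears in comments labelled
    offensive (label 1) minus how often in inoffensive ones (label 0)."""
    scores = Counter()
    for text, label in zip(texts, labels):
        for word in text.lower().split():
            scores[word] += 1 if label else -1
    return scores

def classify(scores, text):
    """Flag a comment as offensive if its summed word scores are positive."""
    return sum(scores[w] for w in text.lower().split()) > 0

# Hypothetical toy data: the same comments, annotated by two rater groups
# whose norms of offence differ (labels are illustrative, not real data).
comments = ["you are an idiot", "i disagree with you",
            "what a stupid take", "thanks for sharing"]
labels_group_a = [1, 0, 1, 0]   # group A flags insults only
labels_group_b = [1, 1, 1, 0]   # group B also flags blunt disagreement

clf_a = train_word_scores(comments, labels_group_a)
clf_b = train_word_scores(comments, labels_group_b)

# Disagreement rate on held-out comments: one rough proxy for how much
# the annotators' norms shape the resulting moderation decisions.
held_out = ["i disagree with that", "what an idiot", "thanks a lot"]
disagreements = sum(classify(clf_a, t) != classify(clf_b, t) for t in held_out)
print(f"disagreement rate: {disagreements}/{len(held_out)}")  # → 1/3
```

Here the two classifiers diverge only on the comment expressing blunt disagreement, reflecting the one labelling decision on which the two rater groups differed.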