Offensive language is commonly defined as hurtful, derogatory or obscene comments made by one person to another. Such language is increasingly found on the web. As a consequence, many operators of social media websites can no longer monitor user posts manually. Therefore, there is a pressing demand for methods that automatically identify suspicious posts.
This pilot shared task aims to initiate and foster research on the identification of offensive content in German-language microposts. The task is to detect offensive comments in a set of German tweets. We focus on Twitter since tweets can be regarded as a prototypical type of micropost.
The workshop discussing this year’s edition of this shared task will be held in conjunction with the Conference on Natural Language Processing (KONVENS).
This evaluation campaign focuses on the linguistic analysis of offensive content found on the web. We therefore provide only textual data and treat this task as a text classification problem. We are aware that such content can also be conveyed through other modes, such as images or videos. However, to keep the complexity of the task at an acceptable level, we did not include them.
The organizers’ interest in this subject matter is purely linguistic. In no way do they intend to promote a specific political or cultural view. Therefore, offensive language will be marked as such irrespective of its origin.
The examples listed on this website, as well as the actual data we provide in this shared task, contain very explicit language. This content does not reflect the views of the organizers. It is, however, necessary to include such data despite its offensive nature, as this is the only way to develop methods that can automatically handle this kind of content on the web.