Call for Participation


Dear All,

This is the call for participation in GermEval 2019 Task 2, the Shared Task on the Identification of Offensive Language. We invite everyone from academia and industry to participate in the identification of offensive language for German.

Introduction

Offensive language is commonly defined as hurtful, derogatory, or obscene comments made by one person to another. This type of language is increasingly found on the web. As a consequence, many operators of social media websites are no longer able to monitor user posts manually. Therefore, there is a pressing demand for methods to automatically identify suspicious posts.

The goal of this second shared task on the topic is to intensify research on the identification of offensive content in German-language microposts. Offensive comments are to be detected in a set of German tweets. We focus on Twitter, since tweets can be regarded as a prototypical type of micropost.

The workshop discussing this year’s edition of the shared task is planned to be held in conjunction with the Conference on Natural Language Processing (KONVENS) in Erlangen in October 2019.

Data

The training and test data from 2018 serve as example data for Subtask I and Subtask II and are available from the website (URL see above). An evaluation script can be downloaded there as well.

The training data, to be released in April, can be downloaded after registering with the organizing committee. The task evaluations will take place in July 2019.

Timeline

  • April 1, 2019: Registration opens
  • April 25, 2019: Release of training data
  • July 1, 2019: Registration deadline
  • July 15, 2019: Release of test data
  • August 4, 2019: Submission of system runs
  • August 18, 2019: Submission of system description paper and survey
  • September 1, 2019: Feedback on system description papers
  • September 8, 2019: Final submission of system description papers
  • October 8, 2019: Workshop co-located with KONVENS-2019

Tasks

We offer the two subtasks described below, as in 2018. Additionally, we will have a third subtask on explicit and implicit offensive language in 2019 that is also described below.

Participants in this year’s shared task can choose to participate in one, two or all of the subtasks.

Subtask I — Binary classification

The task is to decide whether a tweet includes some form of offensive language or not.

Subtask II — Fine-grained classification

In addition to detecting offensive language tweets, we distinguish between three subcategories:

PROFANITY: use of profane words; however, the tweet clearly is not intended to insult anyone.

INSULT: unlike PROFANITY, the tweet clearly is intended to offend someone.

ABUSE: unlike INSULT, the tweet does not just insult a person but represents the stronger form of abusive language.

Subtask III – Classification of explicit and implicit offensive language

In addition to detecting offensive language tweets, we distinguish between two subcategories:

EXPLICIT: an offensive tweet which directly expresses hate, condemnation, or superiority towards an explicitly or implicitly given target.

IMPLICIT: an offensive tweet where the expression of hate, condemnation, superiority, etc., directed towards an explicitly or implicitly given target, has to be inferred from the ascription of (hypothesized) target properties that are insulting, degrading, offending, humiliating, etc.

Subtask III is cast as a two-way classification task where a tweet is either explicitly offensive (EXPLICIT) or implicitly offensive (IMPLICIT).

Data and Tasks

Data

You can find all manually labeled data from the 2018 shared task, i.e. both training and test files, in the following Git repository:
https://github.com/uds-lsv/GermEval-2018-Data
That repository also includes the annotation guidelines of the shared task.
Data from 2019 is available at:

https://fz.h-da.de/iggsa/data

Tasks

We offer the three subtasks described below. Participants in this year’s shared task can choose to participate in one, two, or all of the subtasks.

Subtask I — Binary classification

The task is to decide whether a tweet includes some form of offensive language or not.

  1. OFFENSE:
    • Juhu, das morgige Wetter passt zum Tag SCHEIßWETTER
    • @KarlLagerfeld ist in meinen Augen strunzdumm wie ein Knäckebrot.
    • Für mich ist die #katholische #Kirche nur ein Sammelbecken von #Verbrechern #Kinderschändern und #Kriminellen #Lokalzeit
  2. OTHER:
    • @Sakoelabo @Padit1337 @SawsanChebli Nicht alle Staatssekretäre kann man ernst nehmen.
    • Endlich hat Kurz einen Verbündeten aus Frankreich, der auch die |LBR| ungesetzliche Einwanderung von jungen Afrikanern unterbinden will.
    • Die Türkei führt einen Angriffskrieg und die @spdde inkl. @sigmargabriel rüstet noch ihre Panzer auf.

Subtask II — Fine-grained classification

In addition to detecting offensive language tweets, we distinguish between three subcategories:

  1. PROFANITY: use of profane words; however, the tweet clearly is not intended to insult anyone.
    • Juhu, das morgige Wetter passt zum Tag SCHEIßWETTER
    • @TiffanyAngelx zu anbeissen ,dein geiler Arsch
    • Als SPD wäre ich jetzt maximal angepisst.
  2. INSULT: unlike PROFANITY, the tweet clearly is intended to offend someone.
    • ein #Tatort mit der Presswurst #Saalfeld geht gar nicht #ARD
    • @KarlLagerfeld ist in meinen Augen strunzdumm wie ein Knäckebrot.
    • Wo ist #Kubicki heute? Ist er schon besoffen im Puff?
  3. ABUSE: unlike INSULT, the tweet does not just insult a person but represents the stronger form of abusive language. (For a concise definition, please consult the above annotation guidelines).
    • was mich stört ist wenn am frühen Morgen im #Morgenmagazin schon strunzdumme #Migranten moderieren
    • ich würde auch nicht mit einer schwarzen #Schwuchtel zusammenarbeiten #Tatort
    • Ich persönlich scheisse auf die grüne Kinderfickerpartei
  4. OTHER: same as in Subtask I

Subtask III — Classification of explicit and implicit offensive language

In addition to detecting offensive language tweets, we distinguish between two subcategories:

  1. EXPLICIT: an offensive tweet which directly expresses hate, condemnation, or superiority towards an explicitly or implicitly given target.
    • SPD verseuchtes Nazipack heisst jetzt ANTIFA
  2. IMPLICIT: an offensive tweet where the expression of hate, condemnation, superiority, etc., directed towards an explicitly or implicitly given target, has to be inferred from the ascription of (hypothesized) target properties that are insulting, degrading, offending, humiliating, etc.
    • Dem Kommentar entnehme ich das auch ihre Schaukel als Kind zu nahe an der Wand gestanden hat.
    • Flüchtlinge fliehen nach Deutschland parallel dazu lassen sie ihre Familien in der Heimat sterben sehr ehrenhaft ....

Subtask III is cast as a two-way classification task where a tweet is either explicitly offensive (EXPLICIT) or implicitly offensive (IMPLICIT).

Format

All labeled data files (i.e. trial data, training data) come as simple tab-separated files encoded in UTF-8.

Each line represents one tweet.

The tweets are in random order.

The format for Subtask I & II is:

<TWEET> tab <LABEL-TASK-I> tab <LABEL-TASK-II>
Example:

Juhu, das morgige Wetter passt zum Tag SCHEIßWETTER        OFFENSE        PROFANITY


The format for Subtask III is:

<TWEET> tab <LABEL-TASK-I> tab <LABEL-TASK-II> tab <LABEL-TASK-III>


Example:

Ich persönlich scheisse auf die grüne Kinderfickerpartei        OFFENSE        ABUSE        EXPLICIT
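As an illustration, the tab-separated files described above can be read with a few lines of Python. This is only a sketch; the function name and the assumption of UTF-8, one tweet per line, follow the format section, but the actual file names are announced on release.

```python
# Minimal sketch for reading a labeled GermEval file (tab-separated,
# UTF-8, one tweet per line). The function name and the number of label
# columns are illustrative assumptions, not prescribed by the task.

def read_labeled_data(path, n_labels=2):
    """Return a list of (tweet, labels) pairs from a labeled data file."""
    examples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            parts = line.split("\t")
            tweet, labels = parts[0], parts[1:1 + n_labels]
            examples.append((tweet, labels))
    return examples
```

For example, the Subtask I & II line shown above parses into the tweet text plus the label list ["OFFENSE", "PROFANITY"]; for Subtask III files, `n_labels=3` would be used.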


The test file that will be released for the shared task evaluation window will include just the tweet text in each line.
Note: The test data for Subtask III will be prefilled with the results of Subtasks I and II, to guarantee an independent rating and to accommodate those who only want to participate in Subtask III.

The file with the gold annotation will be released on this website shortly after the evaluation window. It will use the tab-separated format described above.


The submission file should have the same format as the labeled data files (i.e. trial data, training data). If participants wish to participate in only one task, the column for the task they do not wish to participate in has to be filled with an arbitrary dummy string.
This concerns especially Subtasks I and II, because the evaluation tool requires all three columns (<TWEET> tab <LABEL-TASK-I> tab <LABEL-TASK-II>) to be filled.
Participants in Subtask III, please note that the training and test data for Subtask III are not the same files as those for Subtasks I and II. Therefore, all four columns (<TWEET> tab <LABEL-TASK-I> tab <LABEL-TASK-II> tab <LABEL-TASK-III>) of the file have to be filled.
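For instance, a group participating only in Subtask I could generate its submission along these lines. The helper name and the dummy string are illustrative assumptions; any arbitrary string is accepted for the unused column.

```python
# Sketch: write a Subtask-I-only submission file. The unused Subtask II
# column is filled with an arbitrary dummy string, as required by the
# evaluation tool. Function name and dummy value are illustrative.

def write_submission(path, tweets, task1_labels, dummy="NONE"):
    with open(path, "w", encoding="utf-8") as f:
        for tweet, label in zip(tweets, task1_labels):
            f.write(f"{tweet}\t{label}\t{dummy}\n")
```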

Evaluation Tool

The following zip file contains this year’s evaluation tool.

evaluationScriptGermeval2019

The evaluation tool comes as a self-contained Perl script. It requires two files: a file with the system predictions and a gold standard file. Both files have to comply with the tab-separated format described in the Format section. For more information about its usage, simply type:


perl evaluationScriptGermeval2019.pl --help


For the task to be evaluated, the script computes precision, recall, and F1-score for each class. As summarizing scores, the tool computes accuracy as well as macro-averaged precision, recall, and F1-score.

Although the evaluation tool outputs several evaluation measures, the official ranking of the systems will be based on the macro-averaged F1-score only. Please keep this in mind when tuning your classifiers: a classifier that is optimized for accuracy may not produce optimal results in terms of macro-averaged F1-score!
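To make the difference concrete, here is a small Python sketch of the macro-averaged F1 computation. This is not the official Perl script, just the same idea: because every class counts equally in the macro average, a system that mostly predicts the majority class can reach high accuracy while its macro-averaged F1 stays low.

```python
# Illustrative macro-averaged F1 computation (the official measure is
# computed by the Perl evaluation script; this sketch mirrors the idea).

def macro_f1(gold, pred):
    classes = sorted(set(gold))
    f1_scores = []
    for c in classes:
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    # Macro average: each class counts equally, regardless of frequency.
    return sum(f1_scores) / len(f1_scores)
```

For example, with gold labels ["OFFENSE", "OTHER", "OTHER", "OTHER"], a system that always predicts OTHER reaches an accuracy of 0.75 but a macro-averaged F1 of only about 0.43.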

Since the tool does not hard-code the gold standard file but instead takes it as an input parameter, the script can also be used with your own gold standard file. The script additionally performs some coarse format checking of both the prediction file and the gold standard file, and it aborts immediately once a format violation is detected.

The same usage applies to the evaluation script of the 2018 shared task.
You will find that evaluation script in the data repository of the shared task:
https://github.com/uds-lsv/GermEval-2018-Data

Survey

One mandatory requirement for participating in the shared task is completing a survey.
The survey not only asks a few questions about your system design but also gives you the opportunity to provide feedback on the organization of this year’s shared task.

Each group has to complete this survey by the submission deadline of the system output.
The URL of the online survey will be provided via email.

Each group is to fill out exactly one survey. The reply to the survey can only be submitted once.

The results of this survey will be presented at the workshop of the shared task in Erlangen.

A summary of the results of the survey will also be made available.

The survey form can be found here:
https://www.surveymonkey.de/r/SFKGVLT

The results of this year’s survey can be found here:
Survey_Data_All_190827

GermEval

GermEval is a series of shared task evaluation campaigns that focus on natural language processing for the German language. So far, there have been four iterations of GermEval, each with a different type of task. GermEval shared tasks have been run informally by self-organized groups of interested researchers. However, the last shared task as well as the one for 2019 were endorsed by special interest groups within the German Society for Computational Linguistics (GSCL). All iterations of GermEval held their concluding workshop in conjunction with either the GSCL or the KONVENS biennial conferences, depending on which of them took place that year.

For the first time in 2019, there is more than one shared task in GermEval.

Mailing Group

Please join our discussion group at iggsa2019partners@googlegroups.com in order to receive announcements and participate in discussions.

Best regards,

The GermEval 2019 Task 2 Organizers: