All labeled data files (i.e., trial data and training data) come as simple tab-separated files encoded in UTF-8.
Each line represents one tweet.
The tweets are in random order.
The format for Subtask I & II is:
<TWEET> tab <LABEL-TASK-I> tab <LABEL-TASK-II>
Juhu, das morgige Wetter passt zum Tag SCHEIßWETTER OFFENSE PROFANITY
The format for Subtask III is:
<TWEET> tab <LABEL-TASK-I> tab <LABEL-TASK-II> tab <LABEL-TASK-III>
Ich persönlich scheisse auf die grüne Kinderfickerpartei OFFENSE ABUSE EXPLICIT
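The two layouts above can be read with a few lines of Python. This is only a minimal sketch, not an official loader: the function name read_labeled and its n_labels parameter are made up for illustration, and it assumes that tweets contain no tab characters themselves. A real file would be opened with encoding="utf-8"; an in-memory string stands in for it here.

```python
import io


def read_labeled(handle, n_labels):
    """Parse tab-separated lines into (tweet, label, ...) tuples.

    n_labels is 2 for Subtask I & II files and 3 for Subtask III files.
    (Hypothetical helper, not part of any official tooling.)
    """
    rows = []
    for line_no, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines, e.g. an empty last line
        fields = line.split("\t")
        if len(fields) != n_labels + 1:
            raise ValueError(
                f"line {line_no}: expected {n_labels + 1} columns, got {len(fields)}"
            )
        rows.append(tuple(fields))
    return rows


# In-memory stand-in for open(path, encoding="utf-8").
sample = io.StringIO(
    "Juhu, das morgige Wetter passt zum Tag SCHEIßWETTER\tOFFENSE\tPROFANITY\n"
)
rows = read_labeled(sample, n_labels=2)
print(rows[0][1:])  # the two label columns of the first tweet
```

Checking the column count on every line catches files in which tabs were converted to spaces during copying.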
The test file that will be released for the shared task evaluation window will contain only the tweet text on each line.
Note: The test data for Subtask III will be prefilled with the results of Subtasks I and II, both to guarantee an independent rating and to accommodate those who only want to participate in Subtask III.
The file with the gold annotations will be released on this website shortly after the evaluation window. It will use the tab-separated format described above.
The submission file should have the same format as the labeled data files (i.e., trial data and training data). If participants wish to participate in only one task, the column for the task they are not participating in has to be filled with an arbitrary dummy string.
This applies especially to Subtasks I and II, because the evaluation tool requires all three columns (<TWEET> tab <LABEL-TASK-I> tab <LABEL-TASK-II>) to be filled.
Participants in Subtask III, please note that the training data and test data for Subtask III are not the same files as those for Subtasks I and II. Therefore, all four columns (<TWEET> tab <LABEL-TASK-I> tab <LABEL-TASK-II> tab <LABEL-TASK-III>) of the file have to be filled.
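As an illustration of the submission rules above, the following sketch builds one submission line and fills any task that was not attempted with a placeholder. The names submission_line and DUMMY are hypothetical, and the placeholder value itself is arbitrary, as stated above; this is not the evaluation tool's own code.

```python
DUMMY = "DUMMY"  # arbitrary placeholder for a task that was not attempted


def submission_line(tweet, labels, n_labels):
    """Return one tab-separated submission line.

    labels holds one entry per task column; use None for a task that was
    not attempted. n_labels is 2 for Subtask I & II and 3 for Subtask III.
    (Hypothetical helper for illustration only.)
    """
    if len(labels) != n_labels:
        raise ValueError(f"expected {n_labels} labels, got {len(labels)}")
    if "\t" in tweet or "\n" in tweet:
        raise ValueError("tweet text must not contain tabs or newlines")
    filled = [DUMMY if label is None else label for label in labels]
    return "\t".join([tweet] + filled)


# Participating only in Subtask I: the Subtask II column gets the dummy string.
line_task1_only = submission_line("some tweet", ["OFFENSE", None], n_labels=2)

# Subtask III: all four columns are filled.
line_task3 = submission_line("some tweet", ["OFFENSE", "ABUSE", "EXPLICIT"], n_labels=3)
```

When writing the lines to disk, open the output file with encoding="utf-8" so that it matches the encoding of the labeled data files.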