The evaluation tool is a self-contained Perl script. It requires two files: a file with the system predictions and a gold standard file. Both files must comply with the tab-separated format described in the format section. For more information about its usage, run:
perl evaluationScriptGermeval2018.pl --help
For the task being evaluated, the script computes precision, recall and F1-score for each class. As summary scores, it also computes accuracy and macro-average precision, recall and F1-score.
Although the evaluation tool outputs several evaluation measures, the official ranking of the systems will be based on the macro-average F1-score only. Please remember this when tuning your classifiers. A classifier that is optimized for accuracy will not necessarily produce optimal results in terms of macro-average F1-score!
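To illustrate why accuracy and macro-average F1-score can diverge, here is a minimal Python sketch of the measures named above. It is not the official Perl script. The function names, the class labels "A" and "B", and the toy data are hypothetical. The sketch also assumes that the macro-average F1-score is the unweighted mean of the per-class F1-scores. The official script may compute it differently (for example, from the macro-averaged precision and recall), so always use the official script for reported numbers.

```python
from collections import Counter


def per_class_scores(gold, pred):
    """Return {label: (precision, recall, f1)} for every label seen in gold or pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[g] += 1  # gold label g was missed
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        n_pred = tp[label] + fp[label]
        n_gold = tp[label] + fn[label]
        precision = tp[label] / n_pred if n_pred else 0.0
        recall = tp[label] / n_gold if n_gold else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def accuracy(gold, pred):
    """Fraction of instances whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 (an assumption, see the note above)."""
    scores = per_class_scores(gold, pred)
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


# Hypothetical toy data: 8 instances of class "A", 2 of class "B".
gold = ["A"] * 8 + ["B"] * 2
majority = ["A"] * 10              # always predicts the majority class
balanced = ["A"] * 6 + ["B"] * 4   # finds both "B" instances, mislabels two "A"

for name, pred in [("majority", majority), ("balanced", balanced)]:
    print(f"{name}: accuracy={accuracy(gold, pred):.3f} macro_f1={macro_f1(gold, pred):.3f}")
```

On this toy data both predictors reach the same accuracy (0.8), but the majority-class predictor scores a macro-average F1 of about 0.44, while the predictor that also recovers the minority class scores about 0.76.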
Since the tool does not hard-code the gold standard file but instead takes it as an input parameter, the script can also be used with your own gold standard file. The script also performs some coarse format checking of both the prediction file and the gold standard file. It aborts immediately once a format violation is detected.
You can find the evaluation script in the data repository of the shared task: