Resources, Tools and Literature


Dataset for abusive language — English, focusing on racism and sexism (from University of Copenhagen)

Dataset for abusive language in German — from University of Duisburg-Essen

German slur-dictionary. Also this.

German Sentiment Lexicon — from University of Zurich

SentiWS — a Publicly Available German-language Resource for Sentiment Analysis (from University of Leipzig)

GermanPolarityClues — A Lexical Resource for German Sentiment Analysis (from University of Bielefeld)

GermaNet — a highly sophisticated semantic ontology of German (from University of Tübingen)

Link to API — for GermaNet

JWKTL — a Java-based Wiktionary library (from Technische Universität Darmstadt)

Word Embeddings trained on German tweets — from SpinningBytes

Word Embeddings trained on German Wikipedia

German corpora generated from the Web — deWaC and sdewac
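The sentiment lexicons listed above are typically distributed as plain-text word lists with polarity weights. As a rough illustration only, the following sketch loads lines in a SentiWS-style layout (assumed here to be `lemma|POS`, a tab, a weight, a tab, and comma-separated inflected forms) and sums the weights of the tokens in a sentence. The sample entries, their weights, and the function names `load_lexicon` and `polarity` are hypothetical and not taken from any of the resources; check the documentation of the lexicon you download for its actual format.

```python
from typing import Dict, Iterable, List

# Hypothetical sample lines in the assumed SentiWS-style layout:
# lemma|POS <TAB> weight <TAB> comma-separated inflected forms
# (the weights below are made up for illustration)
SAMPLE_LINES = [
    "gut|ADJX\t0.37\tgute,guter,gutes,guten,gutem",
    "schlecht|ADJX\t-0.77\tschlechte,schlechter,schlechtes",
    "Hass|NN\t-0.5\tHasses",
]


def load_lexicon(lines: Iterable[str]) -> Dict[str, float]:
    """Map every lowercased lemma and inflected form to its polarity weight."""
    lexicon: Dict[str, float] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip malformed or empty lines
        lemma = fields[0].split("|")[0]
        weight = float(fields[1])
        forms = [lemma]
        if len(fields) > 2 and fields[2]:
            forms.extend(fields[2].split(","))
        for form in forms:
            lexicon[form.strip().lower()] = weight
    return lexicon


def polarity(tokens: List[str], lexicon: Dict[str, float]) -> float:
    """Sum the weights of all tokens found in the lexicon; unknown tokens count as 0."""
    return sum(lexicon.get(tok.lower(), 0.0) for tok in tokens)


lexicon = load_lexicon(SAMPLE_LINES)
score = polarity("Das ist ein schlechter Witz".split(), lexicon)
print(score)
```

A plain sum like this ignores negation and intensifiers, so it is best read as a simple baseline feature rather than a full sentiment classifier.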



Textblob-de, the German-language extension for TextBlob, a Python (2 and 3) library for processing textual data

spaCy, a set of Python modules for processing English and German text

TreeTagger — a part-of-speech tagger for German (including lemmatization), from LMU

MarMoT — a fast and accurate morphological tagger for German (from LMU)

Mate tools — lemmatizer, POS-tagger, morphology, dependency parser for German

For running German tools, you need anna-3.61.jar and ger-tagger+lemmatizer+morphology+graph-based-3.6+.tgz

Description of processing pipeline and corresponding formats

Morphisto — a tool for morphological analysis for German (from Institute for German Language, Mannheim)

Keras — a high-level API for neural networks in Python

SVMlight — an implementation of Support Vector Machines

FastText — a library for fast text representation and text classification (from Facebook research)

Word2vec — a tool for inducing word embeddings

vecMap — a tool for inducing cross-lingual word embeddings

Brown clustering tool — from Stanford University

Website — listing useful tools for processing Twitter



Cynthia Van Hee, Els Lefever, Ben Verhoeven, Julie Mennes, Bart Desmet, Guy De Pauw, Walter Daelemans, and Veronique Hoste: “Detection and fine-grained classification of cyberbullying events”, In Proceedings of Recent Advances in Natural Language Processing (RANLP), 2015.



Chikashi Nobata, Joel Tetreault, Achint Thomas, Yashar Mehdad, and Yi Chang: “Abusive language detection in online user content”, In Proceedings of the International Conference on World Wide Web, 2016.



Amir H. Razavi, Diana Inkpen, Sasha Uritsky, and Stan Matwin: “Offensive language detection using multi-level classification”, In Proceedings of the Canadian Conference on Advances in Artificial Intelligence, 2010.



Björn Ross, Michael Rist, Guillermo Carbonell, Ben Cabrera, Nils Kurowsky, Michael Wojatzki: “Measuring the Reliability of Hate Speech Annotations: The Case of the European Refugee Crisis”, In Proceedings of the KONVENS-Workshop on Natural Language Processing for Computer-Mediated Communication (KONVENS-NLP4CMC), 2016.



Anna Schmidt, Michael Wiegand: “A Survey on Hate Speech Detection using Natural Language Processing”, In Proceedings of the EACL-Workshop on Natural Language Processing for Social Media (EACL-SocialNLP), 2017.



Ellen Spertus: “Smokey: Automatic recognition of hostile messages”, In Proceedings of the National Conference on Artificial Intelligence and the Conference on Innovative Applications of Artificial Intelligence (AAAI/IAAI), 1997.



William Warner and Julia Hirschberg: “Detecting hate speech on the world wide web”, In Proceedings of the NAACL-Workshop on Language in Social Media (NAACL-LSM), 2012.



Zeerak Waseem and Dirk Hovy: “Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter”, In Proceedings of the NAACL Student Research Workshop, 2016.



A large bibliography related to the detection of abusive language, maintained by Zeerak Waseem (University of Sheffield).

