Resources, Tools and Literature

Resources

Dataset for abusive language — English, focusing on racism and sexism (from University of Copenhagen)

Dataset for abusive language in German — from University of Duisburg-Essen

German slur dictionary. A second German slur dictionary is also available.

German Sentiment Lexicon — from University of Zurich

SentiWS — a Publicly Available German-language Resource for Sentiment Analysis (from University of Leipzig)

GermanPolarityClues — A Lexical Resource for German Sentiment Analysis (from University of Bielefeld)

GermaNet — a highly sophisticated semantic ontology of German (from University of Tübingen)

Link to API — for GermaNet

JWKTL — a Java-based Wiktionary library (from Technische Universität Darmstadt)

Word Embeddings trained on German tweets — from SpinningBytes

Word Embeddings trained on German Wikipedia

German corpora generated from the Web — deWaC and SdeWaC

 

Tools

Textblob-de — the German-language extension for TextBlob, a Python (2 and 3) library for processing textual data

spaCy — a Python library for processing English and German text

TreeTagger — a part-of-speech tagger for German (including lemmatization), from LMU

MarMoT — a fast and accurate morphological tagger for German (from LMU)

Mate tools — lemmatizer, POS-tagger, morphology, dependency parser for German

For running German tools, you need anna-3.61.jar and ger-tagger+lemmatizer+morphology+graph-based-3.6+.tgz

Description of processing pipeline and corresponding formats

Morphisto — a tool for morphological analysis for German (from Institute for German Language, Mannheim)

Keras — a high-level API for neural networks in Python

SVMlight — an implementation of Support Vector Machines

FastText — a library for fast text representation and text classification (from Facebook research)

Word2vec — a tool for inducing word embeddings

vecMap — a tool for inducing cross-lingual word embeddings

Brown clustering tool — from Stanford University

Website — listing useful tools for processing Twitter

 

Literature

Cynthia Van Hee, Els Lefever, Ben Verhoeven, Julie Mennes, Bart Desmet, Guy De Pauw, Walter Daelemans, and Veronique Hoste: “Detection and fine-grained classification of cyberbullying events”, In Proceedings of Recent Advances in Natural Language Processing (RANLP), 2015.

URL: http://www.aclweb.org/anthology/R15-1086

 

Chikashi Nobata, Joel Tetreault, Achint Thomas, Yashar Mehdad, and Yi Chang: “Abusive language detection in online user content”, In Proceedings of the International Conference on World Wide Web, 2016.

URL: http://www.yichang-cs.com/yahoo/WWW16_Abusivedetection.pdf

 

Amir H. Razavi, Diana Inkpen, Sasha Uritsky, and Stan Matwin: “Offensive language detection using multi-level classification”, In Proceedings of the Canadian Conference on Advances in Artificial Intelligence, 2010.

URL: https://link.springer.com/content/pdf/10.1007%2F978-3-642-13059-5_5.pdf

 

Björn Ross, Michael Rist, Guillermo Carbonell, Ben Cabrera, Nils Kurowsky, Michael Wojatzki: “Measuring the Reliability of Hate Speech Annotations: The Case of the European Refugee Crisis”, In Proceedings of the KONVENS-Workshop on Natural Language Processing for Computer-Mediated Communication (KONVENS-NLP4CMC), 2016.

URL: https://www.linguistics.rub.de/bla/nlp4cmc2016/ross.pdf

 

Anna Schmidt, Michael Wiegand: “A Survey on Hate Speech Detection using Natural Language Processing”, In Proceedings of the EACL-Workshop on Natural Language Processing for Social Media (EACL-SocialNLP), 2017.

URL: https://aclanthology.info/papers/W17-1101/w17-1101

 

Ellen Spertus: “Smokey: Automatic recognition of hostile messages”, In Proceedings of the National Conference on Artificial Intelligence and the Conference on Innovative Applications of Artificial Intelligence (AAAI/IAAI), 1997.

URL: https://www.aaai.org/Papers/IAAI/1997/IAAI97-209.pdf

 

William Warner and Julia Hirschberg: “Detecting hate speech on the world wide web”, In Proceedings of the NAACL-Workshop on Language in Social Media (NAACL-LSM), 2012.

URL: http://www.aclweb.org/anthology/W12-2103

 

Zeerak Waseem and Dirk Hovy: “Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter”, In Proceedings of the NAACL Student Research Workshop, 2016.

URL: http://www.aclweb.org/anthology/N16-2013

 

A large bibliography on the detection of abusive language, maintained by Zeerak Waseem (University of Sheffield).

URL: https://drive.google.com/file/d/0B4xDAGbwZJjQRS1Pa2VYOHdnRjA/view?usp=sharing