Dataset for abusive language in English — focusing on racism and sexism (from University of Copenhagen)
Dataset for abusive language in German — from University of Duisburg-Essen
German Sentiment Lexicon — from University of Zurich
SentiWS — a Publicly Available German-language Resource for Sentiment Analysis (from University of Leipzig)
GermanPolarityClues — A Lexical Resource for German Sentiment Analysis (from University of Bielefeld)
GermaNet — a highly sophisticated semantic ontology of German (from University of Tübingen)
Link to API — for GermaNet
JWKTL — a Java-based Wiktionary library (from Technische Universität Darmstadt)
Word Embeddings trained on German tweets — from SpinningBytes
German corpora generated from the Web — deWaC and SdeWaC
Textblob-de — The German language extension for TextBlob, a Python (2 and 3) library for processing textual data
spaCy — Python modules for processing English and German language
TreeTagger — a part-of-speech tagger for German (including lemmatization) from LMU
MarMoT — a fast and accurate morphological tagger for German (from LMU)
Mate tools — lemmatizer, POS-tagger, morphology, dependency parser for German
For running German tools, you need anna-3.61.jar and ger-tagger+lemmatizer+morphology+graph-based-3.6+.tgz
Description of processing pipeline and corresponding formats
Morphisto — a tool for morphological analysis for German (from Institute for German Language, Mannheim)
Keras — a high-level API for neural networks in Python
SVMlight — an implementation of Support Vector Machines
FastText — a library for fast text representation and text classification (from Facebook research)
Word2vec — a tool for inducing word embeddings
VecMap — a tool for inducing cross-lingual word embeddings
Brown clustering tool — from Stanford University
Website — listing useful tools for processing Twitter
Cynthia Van Hee, Els Lefever, Ben Verhoeven, Julie Mennes, Bart Desmet, Guy De Pauw, Walter Daelemans, and Veronique Hoste: “Detection and fine-grained classification of cyberbullying events”, In Proceedings of Recent Advances in Natural Language Processing (RANLP), 2015.
Chikashi Nobata, Joel Tetreault, Achint Thomas, Yashar Mehdad, and Yi Chang: “Abusive language detection in online user content”, In Proceedings of the International Conference on World Wide Web, 2016.
Amir H. Razavi, Diana Inkpen, Sasha Uritsky, and Stan Matwin: “Offensive language detection using multi-level classification”, In Proceedings of the Canadian Conference on Advances in Artificial Intelligence, 2010.
Björn Ross, Michael Rist, Guillermo Carbonell, Ben Cabrera, Nils Kurowsky, Michael Wojatzki: “Measuring the Reliability of Hate Speech Annotations: The Case of the European Refugee Crisis”, In Proceedings of the KONVENS-Workshop on Natural Language Processing for Computer-Mediated Communication (KONVENS-NLP4CMC), 2016.
Anna Schmidt, Michael Wiegand: “A Survey on Hate Speech Detection using Natural Language Processing”, In Proceedings of EACL-Workshop on Natural Language Processing for Social Media (EACL-SocialNLP), 2017.
Ellen Spertus: “Smokey: Automatic recognition of hostile messages”, In Proceedings of the National Conference on Artificial Intelligence and the Conference on Innovative Applications of Artificial Intelligence (AAAI/IAAI), 1997.
William Warner and Julia Hirschberg: “Detecting hate speech on the world wide web”, In Proceedings of the NAACL-Workshop on Language in Social Media (NAACL-LSM), 2012.
Zeerak Waseem and Dirk Hovy: “Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter”, In Proceedings of the NAACL Student Research Workshop, 2016.
A large bibliography related to the detection of abusive language, maintained by Zeerak Waseem (University of Sheffield).