Cyberbullying Classifier

Description

Identify Racism, Sexism or Neutral tweets.

Predicted Entities

neutral, racism, sexism.

Live Demo Open in Colab Download Copy S3 URI

How to use

documentAssembler = DocumentAssembler()\
.setInputCol("text")\
.setOutputCol("document")
use = UniversalSentenceEncoder.pretrained(lang="en") \
.setInputCols(["document"])\
.setOutputCol("sentence_embeddings")
document_classifier = ClassifierDLModel.pretrained('classifierdl_use_cyberbullying', 'en') \
.setInputCols(["document", "sentence_embeddings"]) \
.setOutputCol("class")

nlpPipeline = Pipeline(stages=[documentAssembler, use, document_classifier])
light_pipeline = LightPipeline(nlp_pipeline.fit(spark.createDataFrame([['']]).toDF("text")))

annotations = light_pipeline.fullAnnotate('@geeky_zekey Thanks for showing again that blacks are the biggest racists. Blocked')

val documentAssembler = DocumentAssembler()
.setInputCol("text")
.setOutputCol("document")
val use = UniversalSentenceEncoder.pretrained(lang="en")
.setInputCols(Array("document"))
.setOutputCol("sentence_embeddings")
val document_classifier = ClassifierDLModel.pretrained("classifierdl_use_cyberbullying", "en")
.setInputCols(Array("document", "sentence_embeddings"))
.setOutputCol("class")
val pipeline = new Pipeline().setStages(Array(documentAssembler, use, document_classifier))

val data = Seq("@geeky_zekey Thanks for showing again that blacks are the biggest racists. Blocked").toDF("text")
val result = pipeline.fit(data).transform(data)

import nlu

text = ["""@geeky_zekey Thanks for showing again that blacks are the biggest racists. Blocked"""]
cyberbull_df = nlu.load('classify.cyberbullying.use').predict(text, output_level='document')
cyberbull_df[["document", "cyberbullying"]]

Results

+--------------------------------------------------------------------------------------------------------+------------+
|document                                                                                                |class       |
+--------------------------------------------------------------------------------------------------------+------------+
|@geeky_zekey Thanks for showing again that blacks are the biggest racists. Blocked.                     | racism     |
+--------------------------------------------------------------------------------------------------------+------------+

Model Information

Model Name	classifierdl_use_cyberbullying
Model Class	ClassifierDLModel
Spark Compatibility	2.5.3
Spark NLP Compatibility	2.4
License	open source
Edition	public
Input Labels	[document, sentence_embeddings]
Output Labels	[class]
Language	en
Upstream Dependencies	tfhub_use

Data Source

This model is trained on cyberbullying detection dataset. https://raw.githubusercontent.com/dhavalpotdar/cyberbullying-detection/master/data/data/data.csv

Benchmarking

precision    recall  f1-score   support

none       0.69      1.00      0.81      3245
racism       0.00      0.00      0.00       568
sexism       0.00      0.00      0.00       922

accuracy                           0.69      4735
macro avg       0.23      0.33      0.27      4735
weighted avg       0.47      0.69      0.56      4735

PREVIOUSDetect Persons, Locations, Organizations and Misc Entities in Portuguese (WikiNER 840B 300)

NEXTEmotion Detection Classifier