Cyberbullying Classifier

Description

Classifies tweets as racist, sexist, or neutral.

Classified Labels

neutral, racism, sexism.


How to use

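import sparknlp
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler, LightPipeline
from sparknlp.annotator import UniversalSentenceEncoder, ClassifierDLModel

# Start (or reuse) a Spark session with Spark NLP on the classpath
spark = sparknlp.start()

# Wrap the raw "text" column into document annotations
documentAssembler = DocumentAssembler()\
  .setInputCol("text")\
  .setOutputCol("document")
# Encode each document with the Universal Sentence Encoder
use = UniversalSentenceEncoder.pretrained(lang="en") \
  .setInputCols(["document"])\
  .setOutputCol("sentence_embeddings")
# Classify the sentence embeddings as neutral, racism or sexism
document_classifier = ClassifierDLModel.pretrained('classifierdl_use_cyberbullying', 'en') \
  .setInputCols(["document", "sentence_embeddings"]) \
  .setOutputCol("class")

nlp_pipeline = Pipeline(stages=[documentAssembler, use, document_classifier])
# All stages are pretrained, so fitting on an empty DataFrame only assembles the pipeline
light_pipeline = LightPipeline(nlp_pipeline.fit(spark.createDataFrame([['']]).toDF("text")))

annotations = light_pipeline.fullAnnotate('@geeky_zekey Thanks for showing again that blacks are the biggest racists. Blocked')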
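
The list returned by fullAnnotate holds one dictionary per input text, keyed by output column; each value is a list of annotation objects whose result field carries the predicted label. A minimal sketch of pulling the labels out is given below. The helper name extract_labels and the SimpleNamespace stand-in (used only so the snippet runs without a Spark session) are illustrative assumptions, not part of Spark NLP.

```python
from types import SimpleNamespace


def extract_labels(annotations, output_col="class"):
    """Return, for each annotated text, the list of predicted label strings."""
    return [[ann.result for ann in row[output_col]] for row in annotations]


# Hypothetical stand-in for the output of light_pipeline.fullAnnotate(...):
# one dict per input text, mapping the output column to annotation objects.
fake_annotations = [{"class": [SimpleNamespace(result="racism", metadata={})]}]

labels = extract_labels(fake_annotations)
print(labels)
```

With a real pipeline, the same helper would be called as extract_labels(annotations) on the value returned above.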

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
// Assumes an active SparkSession named `spark`; needed for .toDS
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")
// "tfhub_use" is the upstream dependency listed under Model Information
val use = UniversalSentenceEncoder.pretrained("tfhub_use", "en")
  .setInputCols(Array("document"))
  .setOutputCol("sentence_embeddings")
val document_classifier = ClassifierDLModel.pretrained("classifierdl_use_cyberbullying", "en")
  .setInputCols(Array("document", "sentence_embeddings"))
  .setOutputCol("class")
val pipeline = new Pipeline().setStages(Array(documentAssembler, use, document_classifier))

val data = Seq("@geeky_zekey Thanks for showing again that blacks are the biggest racists. Blocked").toDS.toDF("text")
val result = pipeline.fit(data).transform(data)

# NLU (Python): the same prediction through the one-line nlu API
import nlu

text = ["""@geeky_zekey Thanks for showing again that blacks are the biggest racists. Blocked"""]
cyberbull_df = nlu.load('classify.cyberbullying.use').predict(text, output_level='document')
cyberbull_df[["document", "cyberbullying"]]

Results

+--------------------------------------------------------------------------------------------------------+------------+
|document                                                                                                |class       |
+--------------------------------------------------------------------------------------------------------+------------+
|@geeky_zekey Thanks for showing again that blacks are the biggest racists. Blocked.                     | racism     |
+--------------------------------------------------------------------------------------------------------+------------+

Model Information

Model Name classifierdl_use_cyberbullying
Model Class ClassifierDLModel
Spark NLP Compatibility 2.5.3
Spark Compatibility 2.4
License open source
Edition public
Input Labels [document, sentence_embeddings]
Output Labels [class]
Language en
Upstream Dependencies tfhub_use

Data Source

This model is trained on a cyberbullying detection dataset: https://raw.githubusercontent.com/dhavalpotdar/cyberbullying-detection/master/data/data/data.csv

Benchmarking

              precision    recall  f1-score   support

        none       0.69      1.00      0.81      3245
      racism       0.00      0.00      0.00       568
      sexism       0.00      0.00      0.00       922

    accuracy                           0.69      4735
   macro avg       0.23      0.33      0.27      4735
weighted avg       0.47      0.69      0.56      4735
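
The summary rows of the report can be re-derived from the per-class rows: the macro average is the plain mean over the three classes, and the weighted average weights each class by its support. The sketch below recomputes both from the rounded values in the table; the variable and function names are illustrative, and because the inputs are already rounded to two decimals the arithmetic is only approximate.

```python
# Per-class rows copied from the report: (precision, recall, f1, support)
rows = {
    "none":   (0.69, 1.00, 0.81, 3245),
    "racism": (0.00, 0.00, 0.00, 568),
    "sexism": (0.00, 0.00, 0.00, 922),
}

# Total number of evaluated examples
total = sum(r[3] for r in rows.values())


def macro_avg(i):
    """Unweighted mean of metric i across classes."""
    return sum(r[i] for r in rows.values()) / len(rows)


def weighted_avg(i):
    """Mean of metric i across classes, weighted by class support."""
    return sum(r[i] * r[3] for r in rows.values()) / total


macro = [round(macro_avg(i), 2) for i in range(3)]
weighted = [round(weighted_avg(i), 2) for i in range(3)]
print(total, macro, weighted)
```

For single-label classification the support-weighted recall equals accuracy, which is why both appear as 0.69 in the report.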