Cyberbullying Classifier - Spark NLP 2.7.1+

Description

Identify whether a tweet contains racism or sexism, or is neutral.

Predicted Entities

neutral, racism, sexism

How to use

from sparknlp.base import DocumentAssembler, LightPipeline
from sparknlp.annotator import UniversalSentenceEncoder, ClassifierDLModel
from pyspark.ml import Pipeline

document_assembler = DocumentAssembler()\
  .setInputCol("text")\
  .setOutputCol("document")

use = UniversalSentenceEncoder.pretrained('tfhub_use', lang="en") \
  .setInputCols(["document"])\
  .setOutputCol("sentence_embeddings")

document_classifier = ClassifierDLModel.pretrained('classifierdl_use_cyberbullying', 'en') \
  .setInputCols(["document", "sentence_embeddings"]) \
  .setOutputCol("class")

nlp_pipeline = Pipeline(stages=[document_assembler, use, document_classifier])
light_pipeline = LightPipeline(nlp_pipeline.fit(spark.createDataFrame([['']]).toDF("text")))

annotations = light_pipeline.fullAnnotate('@geeky_zekey Thanks for showing again that blacks are the biggest racists. Blocked')
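fullAnnotate returns one dictionary per input text, keyed by output column name. A minimal sketch, assuming the light_pipeline built above, of reading the predicted label out of that result:

result = annotations[0]

# Each entry in result['class'] is a Spark NLP Annotation; .result holds the
# predicted label and .metadata (when populated) holds per-class confidences.
predicted_label = result['class'][0].result
print(predicted_label)                 # e.g. 'racism'
print(result['class'][0].metadata)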

Results

+----------------------------------------------------------------------------------+------+
|document                                                                          |class |
+----------------------------------------------------------------------------------+------+
|@geeky_zekey Thanks for showing again that blacks are the biggest racists. Blocked|racism|
+----------------------------------------------------------------------------------+------+
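For batch scoring a DataFrame instead of single strings, the same pipeline can be used with transform. A minimal sketch, assuming the nlp_pipeline defined above and an illustrative single-row DataFrame, of producing output in the shape of the table above:

from pyspark.sql import functions as F

sample_df = spark.createDataFrame(
    [['@geeky_zekey Thanks for showing again that blacks are the biggest racists. Blocked']]
).toDF('text')

result_df = nlp_pipeline.fit(sample_df).transform(sample_df)

# 'class' is an array of annotations; class.result is the array of predicted
# labels, and its first element is the label for the whole document.
result_df.select(
    F.col('text').alias('document'),
    F.col('class.result').getItem(0).alias('class')
).show(truncate=False)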

Model Information

Model Name: classifierdl_use_cyberbullying
Compatibility: Spark NLP 2.7.1+
License: Open Source
Edition: Official
Input Labels: [sentence_embeddings]
Output Labels: [class]
Language: en
Dependencies: tfhub_use

Data Source

This model is trained on a cyberbullying detection dataset: https://raw.githubusercontent.com/dhavalpotdar/cyberbullying-detection/master/data/data/data.csv
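The column layout of the CSV is not documented here; a minimal sketch of downloading the file with pandas just to inspect its shape and columns (pandas is an assumption, not a Spark NLP requirement):

import pandas as pd

url = ("https://raw.githubusercontent.com/dhavalpotdar/"
       "cyberbullying-detection/master/data/data/data.csv")

# Download the raw CSV and report its dimensions and column names.
raw = pd.read_csv(url)
print(raw.shape)
print(list(raw.columns))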

Benchmarking

              precision    recall  f1-score   support

     neutral       0.72      0.76      0.74       700
      racism       0.89      0.94      0.92       773
      sexism       0.82      0.71      0.76       622

    accuracy                           0.81      2095
   macro avg       0.81      0.80      0.80      2095
weighted avg       0.81      0.81      0.81      2095
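The table above follows the layout of scikit-learn's classification_report. A minimal sketch of how such a report is generated, with hypothetical y_true and y_pred lists standing in for gold and predicted labels on a held-out split:

from sklearn.metrics import classification_report

# Hypothetical labels; in practice these come from scoring a test split with
# the pipeline shown in "How to use".
y_true = ['neutral', 'racism', 'sexism', 'racism']
y_pred = ['neutral', 'racism', 'neutral', 'racism']

print(classification_report(y_true, y_pred, digits=2))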