Description
Classify tweets as racist, sexist, or neutral.
Predicted Entities
neutral, racism, sexism
How to use
from sparknlp.base import DocumentAssembler, LightPipeline
from sparknlp.annotator import UniversalSentenceEncoder, ClassifierDLModel
from pyspark.ml import Pipeline

document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

use = UniversalSentenceEncoder.pretrained('tfhub_use', lang="en") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence_embeddings")

document_classifier = ClassifierDLModel.pretrained('classifierdl_use_cyberbullying', 'en') \
    .setInputCols(["document", "sentence_embeddings"]) \
    .setOutputCol("class")

nlp_pipeline = Pipeline(stages=[document_assembler, use, document_classifier])

light_pipeline = LightPipeline(nlp_pipeline.fit(spark.createDataFrame([['']]).toDF("text")))

annotations = light_pipeline.fullAnnotate('@geeky_zekey Thanks for showing again that blacks are the biggest racists. Blocked')
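fullAnnotate returns one dict per input text, keyed by output column; the predicted label sits in the `result` field of the annotations under the "class" key. A minimal sketch of extracting it, using a stand-in object in place of a live Spark NLP Annotation (no Spark session is assumed here):

```python
from types import SimpleNamespace

# Stand-in for the structure fullAnnotate returns: a list with one dict
# per input text, mapping each output column to its annotations.
annotations = [{"class": [SimpleNamespace(result="racism")]}]

# The predicted label is the `result` of the first "class" annotation.
predicted = annotations[0]["class"][0].result
print(predicted)  # racism
```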
Results
+--------------------------------------------------------------------------------------------------------+------------+
|document |class |
+--------------------------------------------------------------------------------------------------------+------------+
|@geeky_zekey Thanks for showing again that blacks are the biggest racists. Blocked. | racism |
+--------------------------------------------------------------------------------------------------------+------------+
Model Information
Model Name: classifierdl_use_cyberbullying
Compatibility: Spark NLP 2.7.1+
License: Open Source
Edition: Official
Input Labels: [sentence_embeddings]
Output Labels: [class]
Language: en
Dependencies: tfhub_use
Data Source
This model was trained on a cyberbullying detection dataset: https://raw.githubusercontent.com/dhavalpotdar/cyberbullying-detection/master/data/data/data.csv
Benchmarking
precision recall f1-score support
neutral 0.72 0.76 0.74 700
racism 0.89 0.94 0.92 773
sexism 0.82 0.71 0.76 622
accuracy 0.81 2095
macro avg 0.81 0.80 0.80 2095
weighted avg 0.81 0.81 0.81 2095
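The macro and weighted averages in the report follow directly from the per-class scores: macro is the unweighted mean over classes, weighted is the support-weighted mean. A quick sketch recomputing the F1 averages from the rounded table values (so they agree with the report only to within rounding):

```python
# Per-class F1 scores and supports, taken from the benchmarking table above
f1 = {"neutral": 0.74, "racism": 0.92, "sexism": 0.76}
support = {"neutral": 700, "racism": 773, "sexism": 622}

# Macro average: unweighted mean over the three classes
macro_f1 = sum(f1.values()) / len(f1)

# Weighted average: mean weighted by each class's support
total = sum(support.values())
weighted_f1 = sum(f1[c] * support[c] for c in f1) / total

print(round(macro_f1, 2), round(weighted_f1, 2))
```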