Description
Detects adverse drug reactions in texts exchanged over Twitter. The model was trained with the BertForTokenClassification
class from the transformers library and imported into Spark NLP.
Predicted Entities
ADE
How to use
# Python
# Assumes `spark` is an active Spark session with Spark NLP and Healthcare NLP loaded.
from pyspark.ml import Pipeline
from pyspark.sql.types import StringType
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, NerConverter
from sparknlp_jsl.annotator import MedicalBertForTokenClassifier

documentAssembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentenceDetector = SentenceDetectorDLModel.pretrained()\
    .setInputCols("document")\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols("sentence")\
    .setOutputCol("token")

tokenClassifier = MedicalBertForTokenClassifier.pretrained("bert_token_classifier_ner_ade_binary", "en", "clinical/models")\
    .setInputCols("token", "sentence")\
    .setOutputCol("label")\
    .setCaseSensitive(True)

# Merge token-level labels into entity chunks
ner_converter = NerConverter()\
    .setInputCols(["sentence","token","label"])\
    .setOutputCol("ner_chunk")

pipeline = Pipeline(stages=[
    documentAssembler,
    sentenceDetector,
    tokenizer,
    tokenClassifier,
    ner_converter])

# All stages are pretrained, so fitting on an empty DataFrame only assembles the pipeline
model = pipeline.fit(spark.createDataFrame([[""]]).toDF("text"))

data = spark.createDataFrame(["I used to be on paxil but that made me more depressed and prozac made me angry",
    "Maybe cos of the insulin blocking effect of seroquel but i do feel sugar crashes when eat fast carbs."], StringType()).toDF("text")

result = model.transform(data)
// Scala
// Assumes `spark` is an active session and the Spark NLP / Healthcare NLP annotators are already imported.
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDetector = SentenceDetectorDLModel.pretrained()
  .setInputCols("document")
  .setOutputCol("sentence")

val tokenizer = new Tokenizer()
  .setInputCols("sentence")
  .setOutputCol("token")

val tokenClassifier = MedicalBertForTokenClassifier.pretrained("bert_token_classifier_ner_ade_binary", "en", "clinical/models")
  .setInputCols(Array("token", "sentence"))
  .setOutputCol("label")
  .setCaseSensitive(true)

val ner_converter = new NerConverter()
  .setInputCols(Array("sentence", "token", "label"))
  .setOutputCol("ner_chunk")

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  sentenceDetector,
  tokenizer,
  tokenClassifier,
  ner_converter))

// One row per input text
val data = Seq(
  "I used to be on paxil but that made me more depressed and prozac made me angry",
  "Maybe cos of the insulin blocking effect of seroquel but i do feel sugar crashes when eat fast carbs.").toDS.toDF("text")

val result = pipeline.fit(data).transform(data)
import nlu
nlu.load("en.classify.bert_token.ner_ade_bert").predict("""Maybe cos of the insulin blocking effect of seroquel but i do feel sugar crashes when eat fast carbs.""")
Results
+-------------+---------+
|chunk        |ner_label|
+-------------+---------+
|depressed    |ADE      |
|angry        |ADE      |
|sugar crashes|ADE      |
+-------------+---------+
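The chunks above come from merging token-level tags into spans. As a rough illustration of that step, the sketch below groups IOB tags (B-ADE, I-ADE, O, the labels listed under Benchmarking) into chunks in plain Python. The function name `merge_iob` and the hand-written tags are assumptions made for this example; it is not NerConverter's actual implementation, and the tags are not real model output.

```python
from typing import List, Tuple


def merge_iob(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group IOB-tagged tokens into (chunk_text, entity_label) pairs."""
    chunks = []
    current_tokens = []
    current_label = None

    def flush():
        # Emit the open chunk, if any, and reset the state.
        nonlocal current_tokens, current_label
        if current_tokens:
            chunks.append((" ".join(current_tokens), current_label))
        current_tokens, current_label = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new chunk.
            flush()
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # An I- tag extends the open chunk of the same entity type.
            current_tokens.append(token)
        else:
            # "O", or an I- tag with no matching open chunk: close the chunk.
            flush()
    flush()
    return chunks


# Hand-labelled tags for illustration only (hypothetical, not model output).
tokens = ["i", "do", "feel", "sugar", "crashes", "when", "eat", "fast", "carbs"]
tags = ["O", "O", "O", "B-ADE", "I-ADE", "O", "O", "O", "O"]
chunks = merge_iob(tokens, tags)
print(chunks)
```

In this simplified version a stray I- tag that does not continue an open chunk is dropped rather than promoted to a new chunk; a production converter may handle that case differently.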
Model Information
| Model Name:          | bert_token_classifier_ner_ade_binary |
| Compatibility:       | Healthcare NLP 4.0.0+                |
| License:             | Licensed                             |
| Edition:             | Official                             |
| Input Labels:        | [sentence, token]                    |
| Output Labels:       | [ner]                                |
| Language:            | en                                   |
| Size:                | 404.2 MB                             |
| Case sensitive:      | true                                 |
| Max sentence length: | 128                                  |
Benchmarking
label          precision    recall  f1-score   support
B-ADE               0.89      0.88      0.88      3720
I-ADE               0.85      0.84      0.84      3145
O                   0.98      0.98      0.98     26963
accuracy               -         -      0.95     33828
macro-avg           0.90      0.90      0.90     33828
weighted-avg        0.95      0.95      0.95     33828
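As a quick sanity check on the table above, the sketch below recomputes each label's f1-score as the harmonic mean of its precision and recall and sums the per-label support. It uses only the rounded figures printed in the table, so it is an approximation; the variable names are illustrative, and aggregate rows recomputed from two-decimal inputs can differ from the reported ones in the last digit.

```python
# Per-label (precision, recall, support), copied from the table above.
report = {
    "B-ADE": (0.89, 0.88, 3720),
    "I-ADE": (0.85, 0.84, 3145),
    "O": (0.98, 0.98, 26963),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Recompute the per-label f1-score, rounded to two decimals like the table.
f1_scores = {label: round(f1(p, r), 2) for label, (p, r, _) in report.items()}

# The per-label supports should add up to the support shown on the aggregate rows.
total_support = sum(support for _, _, support in report.values())

print(f1_scores)
print(total_support)
```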