Description
This model is trained with the Logistic Regression algorithm and classifies a text or sentence into one of two categories:
True: The sentence mentions a possible ADE.
False: The sentence contains no information about an ADE.
The model was trained on the ADE-Corpus-V2 dataset (Adverse Drug Reaction Data), in which each sentence is labeled as ADE-related (True) or not (False).
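To make the decision rule concrete, here is a minimal sketch of how a logistic regression classifier can turn tokens into a True/False label. The weights, bias, 0.5 threshold, and the `predict_ade` helper are all hypothetical and invented for illustration; the actual features and parameters of classifier_logreg_ade are not given in this card.

```python
import math

# Hypothetical token weights and bias, invented for illustration only.
# They are NOT the parameters of classifier_logreg_ade.
WEIGHTS = {"induced": 1.4, "asthma": 0.9, "great": -1.1}
BIAS = -0.5


def predict_ade(tokens):
    """Return (label, probability) from a bag-of-words logistic score."""
    # Linear score: bias plus the weight of every known token.
    z = BIAS + sum(WEIGHTS.get(tok.lower(), 0.0) for tok in tokens)
    # The sigmoid maps the score to a probability in (0, 1).
    p = 1.0 / (1.0 + math.exp(-z))
    # Assumed decision threshold of 0.5.
    return ("True" if p >= 0.5 else "False"), p


label_pos, p_pos = predict_ade(["aspirin", "induced", "asthma"])
label_neg, p_neg = predict_ade(["I", "feel", "great"])
print(label_pos, label_neg)
```

With these made-up weights, tokens associated with a drug reaction push the score up and neutral or positive tokens push it down, which is the general mechanism a logistic regression text classifier relies on.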
Predicted Entities
True, False
How to use
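# Python
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer
from sparknlp_jsl.annotator import DocumentLogRegClassifierModel

# Assumes a licensed Healthcare NLP session is already running as `spark`.
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

tokenizer = Tokenizer()\
    .setInputCols("document")\
    .setOutputCol("token")

# Pretrained ADE classifier: reads tokens, writes one label per document.
logreg = DocumentLogRegClassifierModel.pretrained("classifier_logreg_ade", "en", "clinical/models")\
    .setInputCols("token")\
    .setOutputCol("prediction")

clf_Pipeline = Pipeline(stages=[
    document_assembler,
    tokenizer,
    logreg])

# One sentence per row.
data = spark.createDataFrame([["I feel great after taking tylenol"], ["Detection of activated eosinophils in nasal polyps of an aspirin-induced asthma patient."]]).toDF("text")
result = clf_Pipeline.fit(data).transform(data)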
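// Scala (imports assume the standard Spark NLP / Healthcare NLP package layout)
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.annotators.classification.DocumentLogRegClassifierModel
import org.apache.spark.ml.Pipeline
import spark.implicits._ // needed for toDS() on a local Seq

val document_assembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")
val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")
// pretrained() is called on the companion object, so no `new` here.
val logreg = DocumentLogRegClassifierModel.pretrained("classifier_logreg_ade", "en", "clinical/models")
  .setInputCols("token")
  .setOutputCol("prediction")
val clf_Pipeline = new Pipeline().setStages(Array(document_assembler, tokenizer, logreg))
// One sentence per row (a Seq of strings, not a Seq holding a single Array).
val data = Seq("I feel great after taking tylenol", "Detection of activated eosinophils in nasal polyps of an aspirin-induced asthma patient.").toDS().toDF("text")
val result = clf_Pipeline.fit(data).transform(data)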
Results
+----------------------------------------------------------------------------------------+-------+
|text                                                                                    |result |
+----------------------------------------------------------------------------------------+-------+
|I feel great after taking tylenol                                                       |[False]|
|Detection of activated eosinophils in nasal polyps of an aspirin-induced asthma patient.|[True] |
+----------------------------------------------------------------------------------------+-------+
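As a rough illustration of working with this output downstream, the sketch below filters for sentences flagged as ADE-related. It assumes the labels arrive as the strings "True" and "False" inside a list, as the table suggests, and it uses hard-coded rows so it runs without Spark. The helper name `flag_ade_sentences` is hypothetical and not part of any library.

```python
def flag_ade_sentences(rows):
    """Return the texts whose predicted label list contains "True"."""
    return [text for text, labels in rows if "True" in labels]


# Rows mirroring the results table: (text, predicted labels).
# On a real DataFrame, similar rows could presumably be collected with
# result.select("text", "prediction.result").collect()
rows = [
    ("I feel great after taking tylenol", ["False"]),
    ("Detection of activated eosinophils in nasal polyps of an "
     "aspirin-induced asthma patient.", ["True"]),
]

flagged = flag_ade_sentences(rows)
print(flagged)
```

Keeping the filter as a plain function over (text, labels) pairs makes it easy to test separately from the Spark pipeline.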
Model Information
Model Name: classifier_logreg_ade
Compatibility: Healthcare NLP 4.4.1+
License: Licensed
Edition: Official
Input Labels: [token]
Output Labels: [prediction]
Language: en
Size: 596.4 KB
References
Reference: Gurulingappa et al., Benchmark Corpus to Support Information Extraction for Adverse Drug Effects, JBI, 2012. https://www.sciencedirect.com/science/article/pii/S1532046412000615
Benchmarking
label         precision  recall  f1-score  support
False              0.91    0.92      0.92     3362
True               0.79    0.79      0.79     1361
accuracy              -       -      0.88     4723
macro_avg          0.85    0.85      0.85     4723
weighted_avg       0.88    0.88      0.88     4723
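The summary rows can be cross-checked from the per-class rows. The sketch below recomputes the macro and support-weighted averages from the figures in the table, assuming the usual definitions: macro is the unweighted class mean, weighted is the support-weighted mean. The names `report`, `macro_avg`, and `weighted_avg` are illustrative, not taken from any library.

```python
# Per-class figures copied from the benchmarking table.
report = {
    "False": {"precision": 0.91, "recall": 0.92, "f1": 0.92, "support": 3362},
    "True": {"precision": 0.79, "recall": 0.79, "f1": 0.79, "support": 1361},
}

total = sum(row["support"] for row in report.values())


def macro_avg(metric):
    """Unweighted mean of a metric over the classes."""
    return sum(row[metric] for row in report.values()) / len(report)


def weighted_avg(metric):
    """Mean of a metric over the classes, weighted by class support."""
    return sum(row[metric] * row["support"] for row in report.values()) / total


# Accuracy equals support-weighted recall: recall * support is the number
# of correctly classified sentences in each class.
accuracy = weighted_avg("recall")

print(f"total support: {total}")
print(f"weighted precision: {weighted_avg('precision'):.2f}")
print(f"accuracy (weighted recall): {accuracy:.2f}")
```

Because the per-class values in the table are already rounded to two decimals, a recomputed average can differ from the reported one in the last digit.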