Description
This model was trained with the Generic Classifier annotator using the Support Vector Machine (SVM) algorithm. It classifies a text or sentence into one of two categories:
True: The sentence describes a possible adverse drug event (ADE).
False: The sentence contains no information about an ADE.
The model was trained on the ADE-Corpus-V2 Dataset: Adverse Drug Reaction Data, which labels each sentence as ADE-related (True) or not (False).
Predicted Entities
True, False
How to use
# Assumes an active Spark session `spark` started with Healthcare NLP.
from pyspark.ml import Pipeline
from sparknlp.base import *
from sparknlp.annotator import *
from sparknlp_jsl.base import *
from sparknlp_jsl.annotator import *

# Turn raw text into document annotations, then split into tokens.
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")
tokenizer = Tokenizer()\
    .setInputCols(["document"])\
    .setOutputCol("token")
# Clinical word embeddings, averaged into one vector per document.
word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["document", "token"])\
    .setOutputCol("word_embeddings")
sentence_embeddings = SentenceEmbeddings()\
    .setInputCols(["document", "word_embeddings"])\
    .setOutputCol("sentence_embeddings")\
    .setPoolingStrategy("AVERAGE")
# Convert the sentence embeddings into the feature vector the classifier expects.
features_asm = FeaturesAssembler()\
    .setInputCols(["sentence_embeddings"])\
    .setOutputCol("features")
generic_classifier = GenericClassifierModel.pretrained("generic_svm_classifier_ade", "en", "clinical/models")\
    .setInputCols(["features"])\
    .setOutputCol("class")
clf_Pipeline = Pipeline(stages=[
    document_assembler,
    tokenizer,
    word_embeddings,
    sentence_embeddings,
    features_asm,
    generic_classifier])
data = spark.createDataFrame([
    ["None of the patients required treatment for the overdose."],
    ["I feel a bit drowsy & have a little blurred vision after taking an insulin"]]).toDF("text")
result = clf_Pipeline.fit(data).transform(data)
# Show each input text next to its predicted label.
result.select("text", "class.result").show(truncate=False)
// Assumes an active SparkSession `spark` started with Healthcare NLP.
// FeaturesAssembler and GenericClassifierModel come from the licensed library and must be imported as well.
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val document_assembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")
val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")
// pretrained() is called on the companion object, not on a new instance.
val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
  .setInputCols(Array("document", "token"))
  .setOutputCol("word_embeddings")
val sentence_embeddings = new SentenceEmbeddings()
  .setInputCols(Array("document", "word_embeddings"))
  .setOutputCol("sentence_embeddings")
  .setPoolingStrategy("AVERAGE")
val features_asm = new FeaturesAssembler()
  .setInputCols(Array("sentence_embeddings"))
  .setOutputCol("features")
val generic_classifier = GenericClassifierModel.pretrained("generic_svm_classifier_ade", "en", "clinical/models")
  .setInputCols(Array("features"))
  .setOutputCol("class")
val clf_Pipeline = new Pipeline().setStages(Array(document_assembler, tokenizer, word_embeddings, sentence_embeddings, features_asm, generic_classifier))
// One row per sentence, in a single "text" column.
val data = Seq(
  "None of the patients required treatment for the overdose.",
  "I feel a bit drowsy & have a little blurred vision after taking an insulin").toDF("text")
val result = clf_Pipeline.fit(data).transform(data)
// Show each input text next to its predicted label.
result.select("text", "class.result").show(false)
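The SentenceEmbeddings stage in the pipelines above is configured with the "AVERAGE" pooling strategy, which collapses the per-token word embeddings of a document into a single fixed-length vector for the classifier. The following is a minimal plain-Python sketch of that idea only. It is not Spark NLP's implementation, and the function name `average_pool` and the toy 3-dimensional vectors are assumptions made for illustration; real embeddings have many more dimensions.

```python
from typing import List


def average_pool(token_vectors: List[List[float]]) -> List[float]:
    """Average token vectors element-wise into one sentence vector."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    if any(len(v) != dim for v in token_vectors):
        raise ValueError("all token vectors must have the same dimension")
    n = len(token_vectors)
    return [sum(v[i] for v in token_vectors) / n for i in range(dim)]


# Toy embeddings for a three-token sentence (made-up values, hypothetical).
tokens = [[1.0, 0.0, 2.0], [3.0, 2.0, 0.0], [2.0, 4.0, 1.0]]
sentence_vector = average_pool(tokens)
print(sentence_vector)  # [2.0, 2.0, 1.0]
```

Because the output length depends only on the embedding dimension, sentences of any token count map to a feature vector of the same size, which is what a fixed-input classifier such as an SVM needs.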
Results
+--------------------------------------------------------------------------+-------+
|text                                                                      |result |
+--------------------------------------------------------------------------+-------+
|None of the patients required treatment for the overdose.                 |[False]|
|I feel a bit drowsy & have a little blurred vision after taking an insulin|[True] |
+--------------------------------------------------------------------------+-------+
Model Information
Model Name:     generic_svm_classifier_ade
Compatibility:  Healthcare NLP 4.4.1+
License:        Licensed
Edition:        Official
Input Labels:   [feature_vector]
Output Labels:  [prediction]
Language:       en
Size:           16.4 KB
References
Reference: Gurulingappa et al., Benchmark Corpus to Support Information Extraction for Adverse Drug Effects, JBI, 2012. https://www.sciencedirect.com/science/article/pii/S1532046412000615
Benchmarking
label         precision  recall  f1-score  support
False              0.84    0.92      0.88     3362
True               0.74    0.58      0.65     1361
accuracy              -       -      0.82     4723
macro avg          0.79    0.75      0.76     4723
weighted avg       0.81    0.82      0.81     4723
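The f1-score column and the average rows of the benchmark table follow from the per-class precision, recall, and support. As a sanity check, the sketch below recomputes them using the standard definitions: F1 as the harmonic mean of precision and recall, the macro average as the unweighted mean over classes, and the weighted average as the support-weighted mean. It starts from the rounded figures in the table, so agreement is only expected to two decimals. The helper names are illustrative.

```python
# Per-class metrics copied from the benchmark table: (precision, recall, support).
metrics = {
    "False": (0.84, 0.92, 3362),
    "True": (0.74, 0.58, 1361),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def macro_avg(values):
    """Unweighted mean over classes."""
    return sum(values) / len(values)


def weighted_avg(values, weights):
    """Mean over classes, weighted by class support."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


f1_scores = {label: f1(p, r) for label, (p, r, _) in metrics.items()}
supports = [s for (_, _, s) in metrics.values()]

macro_f1 = macro_avg(list(f1_scores.values()))
weighted_f1 = weighted_avg(list(f1_scores.values()), supports)
# Support-weighted recall is the fraction of all sentences classified correctly.
accuracy = weighted_avg([r for (_, r, _) in metrics.values()], supports)

for label, score in f1_scores.items():
    print(f"F1 {label}: {score:.2f}")
print(f"macro F1: {macro_f1:.2f}, weighted F1: {weighted_f1:.2f}, accuracy: {accuracy:.2f}")
```

The gap between the two classes is driven mainly by recall: the True class recovers 58% of ADE sentences, compared with 92% for the False class.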