Description
This model extracts the benefits of using drugs for certain conditions, tagging conditions and benefits as separate entity types.
Predicted Entities
CONDITION, BENEFIT
How to use
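# Python
# Assumes an active Spark session named `spark`, started with Spark NLP for Healthcare.
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, WordEmbeddingsModel, NerConverter
from sparknlp_jsl.annotator import MedicalNerModel

documentAssembler = DocumentAssembler()\
.setInputCol("text")\
.setOutputCol("document")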
sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")\
.setInputCols(["document"])\
.setOutputCol("sentence")
tokenizer = Tokenizer()\
.setInputCols(["sentence"])\
.setOutputCol("token")
embeddings = WordEmbeddingsModel.pretrained('embeddings_clinical', 'en', 'clinical/models') \
.setInputCols(['sentence', 'token']) \
.setOutputCol('embeddings')
ner = MedicalNerModel.pretrained('ner_supplement_clinical', 'en', 'clinical/models') \
.setInputCols(["sentence", "token", "embeddings"]) \
.setOutputCol("ner_tags")
ner_converter = NerConverter() \
.setInputCols(["sentence", "token", "ner_tags"]) \
.setOutputCol("ner_chunk")
ner_pipeline = Pipeline(
stages = [
documentAssembler,
sentenceDetector,
tokenizer,
embeddings,
ner,
ner_converter
])
sample_df = spark.createDataFrame([["Excellent!. The state of health improves, nervousness disappears, and night sleep improves. It also promotes hair and nail growth. I recommend :)"]]).toDF("text")
result = ner_pipeline.fit(sample_df).transform(sample_df)
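// Scala
// MedicalNerModel comes from the licensed Spark NLP for Healthcare library.
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._ // required by .toDS below

val documentAssembler = new DocumentAssembler()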
.setInputCol("text")
.setOutputCol("document")
val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")
.setInputCols(Array("document"))
.setOutputCol("sentence")
val tokenizer = new Tokenizer()
.setInputCols(Array("sentence"))
.setOutputCol("token")
val embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
.setInputCols(Array("sentence", "token"))
.setOutputCol("embeddings")
val ner = MedicalNerModel.pretrained("ner_supplement_clinical", "en", "clinical/models")
.setInputCols(Array("sentence", "token", "embeddings"))
.setOutputCol("ner_tags")
val ner_converter = new NerConverter()
.setInputCols(Array("sentence", "token", "ner_tags"))
.setOutputCol("ner_chunk")
val ner_pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, embeddings, ner, ner_converter))
val sample_df = Seq("""Excellent!. The state of health improves, nervousness disappears, and night sleep improves. It also promotes hair and nail growth. I recommend :)""").toDS.toDF("text")
val result = ner_pipeline.fit(sample_df).transform(sample_df)
# NLU
import nlu
nlu.load("en.med_ner.supplement_clinical").predict("""Excellent!. The state of health improves, nervousness disappears, and night sleep improves. It also promotes hair and nail growth. I recommend :)""")
Results
+------------------------+---------------+
| chunk                  | ner_label     |
+------------------------+---------------+
| nervousness            | CONDITION     |
| night sleep improves   | BENEFIT       |
| hair                   | BENEFIT       |
| nail                   | BENEFIT       |
+------------------------+---------------+
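The benchmark for this model is reported on token-level `B-`/`I-` tags, while the table above lists merged chunks. As a rough illustration of how IOB tags map to chunks, here is a minimal pure-Python sketch. It is not the `NerConverter` implementation, the helper name `iob_to_chunks` is hypothetical, and the token/tag sequence is an assumed tagging chosen to be consistent with the results shown, not actual model output.

```python
def iob_to_chunks(tokens, tags):
    """Merge token-level IOB tags into (chunk_text, label) pairs."""
    chunks = []
    current_tokens, current_label = [], None

    def flush():
        # Emit the chunk collected so far, if any.
        if current_tokens:
            chunks.append((" ".join(current_tokens), current_label))

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_label):
            # A B- tag (or an I- tag with a different label) starts a new chunk.
            flush()
            current_tokens, current_label = [token], label
        elif prefix == "I":
            # Continuation of the current chunk.
            current_tokens.append(token)
        else:
            # "O" closes any open chunk.
            flush()
            current_tokens, current_label = [], None
    flush()
    return chunks


# Hypothetical tagging of part of the sample sentence (assumed, for illustration only).
tokens = ["nervousness", "disappears", ",", "and", "night", "sleep", "improves",
          ".", "It", "also", "promotes", "hair", "and", "nail", "growth"]
tags = ["B-CONDITION", "O", "O", "O", "B-BENEFIT", "I-BENEFIT", "I-BENEFIT",
        "O", "O", "O", "O", "B-BENEFIT", "O", "B-BENEFIT", "O"]

chunks = iob_to_chunks(tokens, tags)
for text, label in chunks:
    print(f"{text:<22} {label}")
```

In the actual pipeline this merging is done by the `NerConverter` stage, which reads `ner_tags` and writes `ner_chunk`.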
Model Information
| Model Name:    | ner_supplement_clinical       |
| Compatibility: | Healthcare NLP 3.3.4+         |
| License:       | Licensed                      |
| Edition:       | Official                      |
| Input Labels:  | [sentence, token, embeddings] |
| Output Labels: | [ner]                         |
| Language:      | en                            |
| Size:          | 14.5 MB                       |
References
Trained on the Healthsea dataset: https://github.com/explosion/healthsea/tree/main/project/assets/ner
Benchmarking
label          tp   fp   fn   prec        rec         f1
B-BENEFIT      268  39   42   0.87296414  0.86451614  0.86871964
I-CONDITION    178  29   72   0.8599034   0.712       0.7789934
I-BENEFIT      52   14   32   0.7878788   0.61904764  0.6933334
B-CONDITION    365  78   61   0.82392776  0.85680753  0.840046
Macro-average  863  160  207  0.8361685   0.7630928   0.7979612
Micro-average  863  160  207  0.84359723  0.80654204  0.8246535
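For readers who want to check the figures, the sketch below recomputes precision, recall, and F1 from the tp/fp/fn counts in the table. The variable and function names are illustrative. Note that the reported macro-average F1 matches the harmonic mean of the macro-averaged precision and recall, not the mean of the per-label F1 scores; this is inferred from the arithmetic, not stated by the source.

```python
# tp, fp, fn counts copied from the benchmarking table
counts = {
    "B-BENEFIT": (268, 39, 42),
    "I-CONDITION": (178, 29, 72),
    "I-BENEFIT": (52, 14, 32),
    "B-CONDITION": (365, 78, 61),
}


def prf(tp, fp, fn):
    """Return (precision, recall, f1) for one set of counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


per_label = {label: prf(*c) for label, c in counts.items()}

# Macro: average precision and recall over labels, then take their harmonic mean.
macro_p = sum(p for p, _, _ in per_label.values()) / len(per_label)
macro_r = sum(r for _, r, _ in per_label.values()) / len(per_label)
macro_f1 = 2 * macro_p * macro_r / (macro_p + macro_r)

# Micro: pool the counts over all labels first, then compute the metrics once.
total_tp, total_fp, total_fn = (sum(col) for col in zip(*counts.values()))
micro_p, micro_r, micro_f1 = prf(total_tp, total_fp, total_fn)

for label, (p, r, f1) in per_label.items():
    print(f"{label:<14} {p:.4f} {r:.4f} {f1:.4f}")
print(f"{'Macro-average':<14} {macro_p:.4f} {macro_r:.4f} {macro_f1:.4f}")
print(f"{'Micro-average':<14} {micro_p:.4f} {micro_r:.4f} {micro_f1:.4f}")
```

The micro-average weights every token equally, so it is pulled toward the frequent `B-` labels, while the macro-average gives the weaker `I-BENEFIT` label the same weight as the others.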