Description
This model extracts social environment terminology related to Social Determinants of Health (SDOH) from various kinds of documents.
Predicted Entities
Social_Support, Chidhood_Event, Social_Exclusion, Violence_Abuse_Legal
How to use
# Python
from pyspark.ml import Pipeline
from pyspark.sql.types import StringType
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, WordEmbeddingsModel
from sparknlp_jsl.annotator import MedicalNerModel, NerConverterInternal

# Assumes `spark` is an active SparkSession started with the licensed Healthcare NLP library.
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "en")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

clinical_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

ner_model = MedicalNerModel.pretrained("ner_sdoh_social_environment_wip", "en", "clinical/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

# Groups token-level NER tags into entity chunks.
ner_converter = NerConverterInternal()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

pipeline = Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    clinical_embeddings,
    ner_model,
    ner_converter
])

sample_texts = ["He is the primary caregiver.",
                "There is some evidence of abuse.",
                "She stated that she was in a safe environment in prison, but that her siblings lived in an unsafe neighborhood, she was very afraid for them and witnessed their ostracism by other people.",
                "Medical history: Jane was born in a low - income household and experienced significant trauma during her childhood, including physical and emotional abuse."]

data = spark.createDataFrame(sample_texts, StringType()).toDF("text")

result = pipeline.fit(data).transform(data)
// Scala
import org.apache.spark.ml.Pipeline
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
// MedicalNerModel and NerConverterInternal come from the licensed Healthcare NLP library
// and must also be imported. Assumes `spark` is an active SparkSession.
import spark.implicits._ // required for .toDS below

val document_assembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "en")
  .setInputCols("document")
  .setOutputCol("sentence")

val tokenizer = new Tokenizer()
  .setInputCols("sentence")
  .setOutputCol("token")

val clinical_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
  .setInputCols(Array("sentence", "token"))
  .setOutputCol("embeddings")

val ner_model = MedicalNerModel.pretrained("ner_sdoh_social_environment_wip", "en", "clinical/models")
  .setInputCols(Array("sentence", "token", "embeddings"))
  .setOutputCol("ner")

val ner_converter = new NerConverterInternal()
  .setInputCols(Array("sentence", "token", "ner"))
  .setOutputCol("ner_chunk")

val pipeline = new Pipeline().setStages(Array(
  document_assembler,
  sentence_detector,
  tokenizer,
  clinical_embeddings,
  ner_model,
  ner_converter
))

val data = Seq("Medical history: Jane was born in a low - income household and experienced significant trauma during her childhood, including physical and emotional abuse.").toDS.toDF("text")

val result = pipeline.fit(data).transform(data)
Results
+--------------------+-----+---+---------------------------+
|ner_label           |begin|end|chunk                      |
+--------------------+-----+---+---------------------------+
|Social_Support      |10   |26 |primary caregiver          |
|Violence_Abuse_Legal|26   |30 |abuse                      |
|Violence_Abuse_Legal|49   |54 |prison                     |
|Social_Exclusion    |161  |169|ostracism                  |
|Chidhood_Event      |87   |113|trauma during her childhood|
|Violence_Abuse_Legal|139  |153|emotional abuse            |
+--------------------+-----+---+---------------------------+
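The `begin` and `end` values above read as zero-based, inclusive character offsets into each input text. The sketch below checks that reading against the Python sample texts in plain Python, without Spark. The helper name `chunk_at` and the text-index column in `rows` are hypothetical, added only for illustration.

```python
# Sample texts copied from the Python usage example.
sample_texts = [
    "He is the primary caregiver.",
    "There is some evidence of abuse.",
    "She stated that she was in a safe environment in prison, but that her siblings lived in an unsafe neighborhood, she was very afraid for them and witnessed their ostracism by other people.",
    "Medical history: Jane was born in a low - income household and experienced significant trauma during her childhood, including physical and emotional abuse.",
]

# (text index, label, begin, end, chunk) rows copied from the results table.
# The text index is an assumption about which input each row came from.
rows = [
    (0, "Social_Support", 10, 26, "primary caregiver"),
    (1, "Violence_Abuse_Legal", 26, 30, "abuse"),
    (2, "Violence_Abuse_Legal", 49, 54, "prison"),
    (2, "Social_Exclusion", 161, 169, "ostracism"),
    (3, "Chidhood_Event", 87, 113, "trauma during her childhood"),
    (3, "Violence_Abuse_Legal", 139, 153, "emotional abuse"),
]


def chunk_at(text: str, begin: int, end: int) -> str:
    """Return the substring covered by an inclusive [begin, end] character span."""
    return text[begin:end + 1]


for idx, label, begin, end, chunk in rows:
    # Each offset pair should recover exactly the reported chunk text.
    assert chunk_at(sample_texts[idx], begin, end) == chunk, (label, chunk)
```

Because `end` is inclusive, Python slicing needs `end + 1`; using `text[begin:end]` would drop the last character of every chunk.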
Model Information
Model Name:     ner_sdoh_social_environment_wip
Compatibility:  Healthcare NLP 4.2.8+
License:        Licensed
Edition:        Official
Input Labels:   [sentence, token, embeddings]
Output Labels:  [ner]
Language:       en
Size:           858.7 KB
Benchmarking
label                 tp      fp    fn     total   precision  recall    f1
Chidhood_Event        34.0    6.0   5.0    39.0    0.850000   0.871795  0.860759
Social_Exclusion      45.0    6.0   12.0   57.0    0.882353   0.789474  0.833333
Social_Support        1139.0  57.0  103.0  1242.0  0.952341   0.917069  0.934372
Violence_Abuse_Legal  235.0   38.0  44.0   279.0   0.860806   0.842294  0.851449
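
The benchmark columns follow the usual definitions: `total` equals `tp + fn` (the number of gold entities), precision is `tp / (tp + fp)`, recall is `tp / (tp + fn)`, and F1 is their harmonic mean. As a minimal sketch, the per-label figures can be recomputed from the counts in the table; the function name `prf` is hypothetical.

```python
# (tp, fp, fn) counts copied from the benchmarking table.
counts = {
    "Chidhood_Event": (34, 6, 5),
    "Social_Exclusion": (45, 6, 12),
    "Social_Support": (1139, 57, 103),
    "Violence_Abuse_Legal": (235, 38, 44),
}


def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


for label, (tp, fp, fn) in counts.items():
    precision, recall, f1 = prf(tp, fp, fn)
    total = tp + fn  # gold entity count, matching the "total" column
    print(f"{label:<22}{total:>6}{precision:>10.6f}{recall:>10.6f}{f1:>10.6f}")
```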