Detect Clinical Conditions (LangTest - ner_eu_clinical_condition)

Description

Pretrained named entity recognition (NER) deep learning model for clinical conditions. The SparkNLP deep learning model (MedicalNerModel) is inspired by a former state-of-the-art model for NER: Chiu & Nichols, Named Entity Recognition with Bidirectional LSTM-CNN. The model is the version of ner_eu_clinical_condition model augmented with langtest library.

test_type before fail_count after fail_count before pass_count after pass_count minimum pass_rate before pass_rate after pass_rate
add_abbreviation 25 31 348 486 80% 93% 94%
add_ocr_typo 61 66 360 501 80% 86% 88%
add_typo 41 41 383 528 80% 90% 93%
lowercase 6 5 435 583 80% 99% 99%
number_to_word 4 7 131 161 80% 97% 96%
strip_all_punctuation 22 23 421 565 80% 95% 96%
strip_punctuation 6 5 437 582 80% 99% 99%
swap_entities 60 43 138 225 80% 70% 84%
titlecase 106 93 337 493 80% 76% 84%
uppercase 193 104 250 484 80% 56% 82%
weighted average 524 418 3240 4608 80% 86.08% 91.68%

Predicted Entities

clinical_condition

Copy S3 URI

How to use

document_assembler = DocumentAssembler()\
	.setInputCol("text")\
	.setOutputCol("document")
 
sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
	.setInputCols(["sentence"])\
	.setOutputCol("token")

word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
	.setInputCols(["sentence", "token"])\
	.setOutputCol("embeddings")

ner = MedicalNerModel.pretrained('ner_eu_clinical_condition_langtest', "en", "clinical/models") \
	.setInputCols(["sentence", "token", "embeddings"]) \
	.setOutputCol("ner")
 
ner_converter = NerConverterInternal()\
	.setInputCols(["sentence", "token", "ner"])\
	.setOutputCol("ner_chunk")

pipeline = Pipeline(stages=[
	document_assembler,
	sentence_detector,
	tokenizer,
	word_embeddings,
	ner,
	ner_converter])

data = spark.createDataFrame([["""Hyperparathyroidism was considered upon the fourth occasion. The history of weakness and generalized joint pains were present. He also had history of epigastric pain diagnosed informally as gastritis. He had previously had open reduction and internal fixation for the initial two fractures under general anesthesia. He sustained mandibular fracture."""]]).toDF("text")

result = pipeline.fit(data).transform(data)
val documenter = new DocumentAssembler() 
    .setInputCol("text") 
    .setOutputCol("document")

val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models")
    .setInputCols("document")
    .setOutputCol("sentence")

val tokenizer = new Tokenizer()
  .setInputCols("sentence")
  .setOutputCol("token")

val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
	.setInputCols(Array("sentence", "token"))
	.setOutputCol("embeddings")

val ner_model = MedicalNerModel.pretrained("ner_eu_clinical_condition_langtest", "en", "clinical/models")
    .setInputCols(Array("sentence", "token", "embeddings"))
    .setOutputCol("ner")

val ner_converter = new NerConverterInternal()
    .setInputCols(Array("sentence", "token", "ner"))
    .setOutputCol("ner_chunk")

val pipeline = new Pipeline().setStages(Array(documenter, sentence_detector, tokenizer, word_embeddings, ner_model, ner_converter))

val data = Seq(Array("""Hyperparathyroidism was considered upon the fourth occasion. The history of weakness and generalized joint pains were present. He also had history of epigastric pain diagnosed informally as gastritis. He had previously had open reduction and internal fixation for the initial two fractures under general anesthesia. He sustained mandibular fracture.""")).toDS().toDF("text")

val result = pipeline.fit(data).transform(data)

Results

+-------------------+------------------+
|chunk              |ner_label         |
+-------------------+------------------+
|Hyperparathyroidism|clinical_condition|
|weakness           |clinical_condition|
|joint pains        |clinical_condition|
|epigastric pain    |clinical_condition|
|gastritis          |clinical_condition|
|fractures          |clinical_condition|
|anesthesia         |clinical_condition|
|mandibular fracture|clinical_condition|
+-------------------+------------------+

Model Information

Model Name: ner_eu_clinical_condition_langtest
Compatibility: Healthcare NLP 5.1.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
Size: 14.6 MB

References

The corpus used for model training is provided by European Clinical Case Corpus (E3C), a project aimed at offering a freely available multilingual corpus of semantically annotated clinical narratives.

Benchmarking

label               precision  recall  f1-score  support 
clinical_condition  0.95       0.95    0.95      432     
micro-avg           0.95       0.95    0.95      432     
macro-avg           0.95       0.95    0.95      432     
weighted-avg        0.95       0.95    0.95      432