Detect Diseases

Description

Pretrained deep learning model for named entity recognition (NER) of diseases. The Spark NLP model (MedicalNerModel) is inspired by a former state-of-the-art NER model: Chiu & Nichols, Named Entity Recognition with Bidirectional LSTM-CNN.

Predicted Entities

Disease.


How to use

...
embeddings_clinical = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
  .setInputCols(["sentence", "token"])\
  .setOutputCol("embeddings")
clinical_ner = MedicalNerModel.pretrained("ner_diseases", "en", "clinical/models") \
  .setInputCols(["sentence", "token", "embeddings"]) \
  .setOutputCol("ner")
...
nlpPipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, embeddings_clinical, clinical_ner, ner_converter])
model = nlpPipeline.fit(spark.createDataFrame([[""]]).toDF("text"))

# Build the input DataFrame directly with Spark, so pandas is not required
text = """Detection of various other intracellular signaling proteins is also described. Genetic characterization of transactivation of the human T-cell leukemia virus type 1 promoter: Binding of Tax to Tax-responsive element 1 is mediated by the cyclic AMP-responsive members of the CREB/ATF family of transcription factors. To achieve a better understanding of the mechanism of transactivation by Tax of human T-cell leukemia virus type 1 Tax-responsive element 1 (TRE-1), we developed a genetic approach with Saccharomyces cerevisiae. We constructed a yeast reporter strain containing the lacZ gene under the control of the CYC1 promoter associated with three copies of TRE-1. Expression of either the cyclic AMP response element-binding protein (CREB) or CREB fused to the GAL4 activation domain (GAD) in this strain did not modify the expression of the reporter gene. Tax alone was also inactive. """
data = spark.createDataFrame([[text]]).toDF("text")

results = model.transform(data)
...
val embeddings_clinical = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
  .setInputCols(Array("sentence", "token"))
  .setOutputCol("embeddings")
val ner = MedicalNerModel.pretrained("ner_diseases", "en", "clinical/models")
  .setInputCols(Array("sentence", "token", "embeddings"))
  .setOutputCol("ner")
...
val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, embeddings_clinical, ner, ner_converter))

// toDS requires `import spark.implicits._` to be in scope
val data = Seq("Detection of various other intracellular signaling proteins is also described. Genetic characterization of transactivation of the human T-cell leukemia virus type 1 promoter: Binding of Tax to Tax-responsive element 1 is mediated by the cyclic AMP-responsive members of the CREB/ATF family of transcription factors. To achieve a better understanding of the mechanism of transactivation by Tax of human T-cell leukemia virus type 1 Tax-responsive element 1 (TRE-1), we developed a genetic approach with Saccharomyces cerevisiae. We constructed a yeast reporter strain containing the lacZ gene under the control of the CYC1 promoter associated with three copies of TRE-1. Expression of either the cyclic AMP response element-binding protein (CREB) or CREB fused to the GAL4 activation domain (GAD) in this strain did not modify the expression of the reporter gene. Tax alone was also inactive. ").toDS.toDF("text")

val result = pipeline.fit(data).transform(data)

Results

+------------------------------+---------+
|chunk                         |ner      |
+------------------------------+---------+
|the cyst                      |Disease  |
|a large Prolene suture        |Disease  |
|a very small incisional hernia|Disease  |
|the hernia cavity             |Disease  |
|omentum                       |Disease  |
|the hernia                    |Disease  |
|the wound lesion              |Disease  |
|The lesion                    |Disease  |
|the existing scar             |Disease  |
|the cyst                      |Disease  |
|the wound                     |Disease  |
|this cyst down to its base    |Disease  |
|a small incisional hernia     |Disease  |
|The cyst                      |Disease  |
|The wound                     |Disease  |
+------------------------------+---------+
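
The chunk column above holds multi-token spans, while the model itself tags individual tokens (the benchmark below reports B-Disease and I-Disease separately). The following is a minimal sketch of how token-level BIO tags can be merged into labeled chunks. It is not the library's ner_converter implementation; the function name `merge_bio_tags` and the hand-written tokens and tags are assumptions made for illustration only.

```python
from typing import List, Optional, Tuple


def merge_bio_tags(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge token-level B-/I-/O tags into (chunk_text, label) pairs.

    Hypothetical helper for illustration; not part of Spark NLP.
    """
    chunks: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    def flush() -> None:
        # Emit the chunk collected so far, if any.
        if current_tokens and current_label is not None:
            chunks.append((" ".join(current_tokens), current_label))

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_label):
            # A B- tag (or an I- tag with no matching open chunk) starts a new chunk.
            flush()
            current_tokens, current_label = [token], label
        elif prefix == "I":
            # Continuation of the currently open chunk.
            current_tokens.append(token)
        else:
            # "O" closes any open chunk.
            flush()
            current_tokens, current_label = [], None

    flush()
    return chunks


# Hand-written example input (not model output).
tokens = ["The", "patient", "has", "a", "small", "incisional", "hernia", "."]
tags = ["O", "O", "O", "B-Disease", "I-Disease", "I-Disease", "I-Disease", "O"]
print(merge_bio_tags(tokens, tags))
```

Treating a stray I- tag as a chunk start is one of several possible conventions; a stricter variant could discard such tokens instead.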

Model Information

Model Name: ner_diseases
Compatibility: Spark NLP for Healthcare 3.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en

Data Source

Trained on i2b2 data with embeddings_clinical word embeddings. Data portal: https://portal.dbmi.hms.harvard.edu/projects/n2c2-nlp/

Benchmarking

|    | label         |    tp |   fp |   fn |     prec |      rec |       f1 |
|---:|:--------------|------:|-----:|-----:|---------:|---------:|---------:|
|  0 | I-Disease     |  5014 |  222 |  171 | 0.957601 | 0.96702  | 0.962288 |
|  1 | B-Disease     |  6004 |  213 |  159 | 0.965739 | 0.974201 | 0.969952 |
|  2 | Macro-average | 11018 |  435 |  330 | 0.96167  | 0.970611 | 0.96612  |
|  3 | Micro-average | 11018 |  435 |  330 | 0.962019 | 0.97092  | 0.966449 |
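
The precision, recall, and F1 columns can be reproduced from the tp, fp, and fn counts in the table. The sketch below assumes the usual definitions: the micro-average pools the counts across labels, and the macro-average is the unweighted mean of the per-label scores. The source does not state how the macro F1 was computed, so that part is an assumption, and the helper name `precision_recall_f1` is illustrative.

```python
from typing import Dict, Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for one set of counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# (tp, fp, fn) per label, copied from the benchmark table.
counts: Dict[str, Tuple[int, int, int]] = {
    "I-Disease": (5014, 222, 171),
    "B-Disease": (6004, 213, 159),
}

per_label = {label: precision_recall_f1(*c) for label, c in counts.items()}

# Micro-average: sum the counts over labels, then compute the scores once.
micro = precision_recall_f1(*(sum(col) for col in zip(*counts.values())))

# Macro-average (assumed): unweighted mean of each per-label score.
macro = tuple(sum(vals) / len(per_label) for vals in zip(*per_label.values()))

for label, scores in per_label.items():
    print(label, [round(s, 6) for s in scores])
print("Micro-average", [round(s, 6) for s in micro])
print("Macro-average", [round(s, 6) for s in macro])
```

Under these assumptions the computed values agree with the table to the precision shown.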