Extract Clinical Entities from Voice of the Patient Documents (embeddings_clinical)

Description

This model extracts healthcare-related terms from the documents transferred from the patient’s own sentences.

Predicted Entities

Gender, Employment, Age, BodyPart, Substance, Form, PsychologicalCondition, Vaccine, Drug, DateTime, ClinicalDept, Laterality, Test, AdmissionDischarge, Disease, VitalTest, Dosage, Duration, RelationshipStatus, Route, Allergen, Frequency, Symptom, Procedure, HealthStatus, InjuryOrPoisoning, Modifier, Treatment, SubstanceQuantity, MedicalDevice, TestResult

Live Demo Open in Colab Copy S3 URI

How to use

document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer() \
    .setInputCols(["sentence"]) \
    .setOutputCol("token")

word_embeddings = WordEmbeddingsModel().pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("embeddings")                

ner = MedicalNerModel.pretrained("ner_vop", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner")

ner_converter = NerConverterInternal() \
    .setInputCols(["sentence", "token", "ner"]) \
    .setOutputCol("ner_chunk")

pipeline = Pipeline(stages=[document_assembler,
                            sentence_detector,
                            tokenizer,
                            word_embeddings,
                            ner,
                            ner_converter])

data = spark.createDataFrame([["Hello,I'm 20 year old girl. I'm diagnosed with hyperthyroid 1 month ago. I was feeling weak, light headed,poor digestion, panic attacks, depression, left chest pain, increased heart rate, rapidly weight loss,  from 4 months. Because of this, I stayed in the hospital and just discharged from hospital. I had many other blood tests, brain mri, ultrasound scan, endoscopy because of some dumb doctors bcs they were not able to diagnose actual problem. Finally I got an appointment with a homeopathy doctor finally he find that i was suffering from hyperthyroid and my TSH was 0.15 T3 and T4 is normal . Also i have b12 deficiency and vitamin D deficiency so I'm taking weekly supplement of vitamin D and 1000 mcg b12 daily. I'm taking homeopathy medicine for 40 days and took 2nd test after 30 days. My TSH is 0.5 now. I feel a little bit relief from weakness and depression but I'm facing with 2 new problem from last week that is breathtaking problem and very rapid heartrate. I just want to know if i should start allopathy medicine or homeopathy is okay? Bcs i heard that thyroid take time to start recover. So please let me know if both of medicines take same time. Because some of my friends advising me to start allopathy and never take a chance as i can develop some serious problems.Sorry for my poor english😐Thank you."]]).toDF("text")

result = pipeline.fit(data).transform(data)
val document_assembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")
    
val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")
    .setInputCols("document")
    .setOutputCol("sentence")
    
val tokenizer = new Tokenizer()
    .setInputCols("sentence")
    .setOutputCol("token")
    
val word_embeddings = WordEmbeddingsModel().pretrained("embeddings_clinical", "en", "clinical/models")
    .setInputCols(Array("sentence", "token"))
    .setOutputCol("embeddings")                
    
val ner = MedicalNerModel.pretrained("ner_vop", "en", "clinical/models")
    .setInputCols(Array("sentence", "token", "embeddings"))
    .setOutputCol("ner")
    
val ner_converter = new NerConverterInternal()
    .setInputCols(Array("sentence", "token", "ner"))
    .setOutputCol("ner_chunk")

        
val pipeline = new Pipeline().setStages(Array(document_assembler,
                            sentence_detector,
                            tokenizer,
                            word_embeddings,
                            ner,
                            ner_converter))    

val data = Seq("Hello,I'm 20 year old girl. I'm diagnosed with hyperthyroid 1 month ago. I was feeling weak, light headed,poor digestion, panic attacks, depression, left chest pain, increased heart rate, rapidly weight loss,  from 4 months. Because of this, I stayed in the hospital and just discharged from hospital. I had many other blood tests, brain mri, ultrasound scan, endoscopy because of some dumb doctors bcs they were not able to diagnose actual problem. Finally I got an appointment with a homeopathy doctor finally he find that i was suffering from hyperthyroid and my TSH was 0.15 T3 and T4 is normal . Also i have b12 deficiency and vitamin D deficiency so I'm taking weekly supplement of vitamin D and 1000 mcg b12 daily. I'm taking homeopathy medicine for 40 days and took 2nd test after 30 days. My TSH is 0.5 now. I feel a little bit relief from weakness and depression but I'm facing with 2 new problem from last week that is breathtaking problem and very rapid heartrate. I just want to know if i should start allopathy medicine or homeopathy is okay? Bcs i heard that thyroid take time to start recover. So please let me know if both of medicines take same time. Because some of my friends advising me to start allopathy and never take a chance as i can develop some serious problems.Sorry for my poor english😐Thank you.").toDS.toDF("text")

val result = pipeline.fit(data).transform(data)

Results

| chunk                | ner_label              |
|:---------------------|:-----------------------|
| 20 year old          | Age                    |
| girl                 | Gender                 |
| hyperthyroid         | Disease                |
| 1 month ago          | DateTime               |
| weak                 | Symptom                |
| light                | Symptom                |
| panic attacks        | PsychologicalCondition |
| depression           | PsychologicalCondition |
| left                 | Laterality             |
| chest                | BodyPart               |
| pain                 | Symptom                |
| increased            | TestResult             |
| heart rate           | VitalTest              |
| rapidly              | Modifier               |
| weight loss          | Symptom                |
| 4 months             | Duration               |
| hospital             | ClinicalDept           |
| discharged           | AdmissionDischarge     |
| hospital             | ClinicalDept           |
| blood tests          | Test                   |
| brain                | BodyPart               |
| mri                  | Test                   |
| ultrasound scan      | Test                   |
| endoscopy            | Procedure              |
| doctors              | Employment             |
| homeopathy doctor    | Employment             |
| he                   | Gender                 |
| hyperthyroid         | Disease                |
| TSH                  | Test                   |
| 0.15                 | TestResult             |
| T3                   | Test                   |
| T4                   | Test                   |
| normal               | TestResult             |
| b12 deficiency       | Disease                |
| vitamin D deficiency | Disease                |
| weekly               | Frequency              |
| supplement           | Drug                   |
| vitamin D            | Drug                   |
| 1000 mcg             | Dosage                 |
| b12                  | Drug                   |
| daily                | Frequency              |
| homeopathy medicine  | Drug                   |
| 40 days              | Duration               |
| 2nd test             | Test                   |
| after 30 days        | DateTime               |
| TSH                  | Test                   |
| 0.5                  | TestResult             |
| now                  | DateTime               |
| weakness             | Symptom                |
| depression           | PsychologicalCondition |
| last week            | DateTime               |
| rapid heartrate      | Symptom                |
| allopathy medicine   | Treatment              |
| homeopathy           | Treatment              |
| thyroid              | BodyPart               |
| allopathy            | Treatment              |

Model Information

Model Name: ner_vop
Compatibility: Healthcare NLP 4.4.3+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
Size: 3.9 MB
Dependencies: embeddings_clinical

References

In-house annotated health-related text in colloquial language.

Sample text from the training dataset

Hello,I’m 20 year old girl. I’m diagnosed with hyperthyroid 1 month ago. I was feeling weak, light headed,poor digestion, panic attacks, depression, left chest pain, increased heart rate, rapidly weight loss, from 4 months. Because of this, I stayed in the hospital and just discharged from hospital. I had many other blood tests, brain mri, ultrasound scan, endoscopy because of some dumb doctors bcs they were not able to diagnose actual problem. Finally I got an appointment with a homeopathy doctor finally he find that i was suffering from hyperthyroid and my TSH was 0.15 T3 and T4 is normal . Also i have b12 deficiency and vitamin D deficiency so I’m taking weekly supplement of vitamin D and 1000 mcg b12 daily. I’m taking homeopathy medicine for 40 days and took 2nd test after 30 days. My TSH is 0.5 now. I feel a little bit relief from weakness and depression but I’m facing with 2 new problem from last week that is breathtaking problem and very rapid heartrate. I just want to know if i should start allopathy medicine or homeopathy is okay? Bcs i heard that thyroid take time to start recover. So please let me know if both of medicines take same time. Because some of my friends advising me to start allopathy and never take a chance as i can develop some serious problems.Sorry for my poor english😐Thank you.

Benchmarking

                 label    tp   fp   fn  total  precision  recall   f1
                Gender  1291   19   26   1317       0.99    0.98 0.98
            Employment  1182   56   61   1243       0.95    0.95 0.95
                   Age   544   50   38    582       0.92    0.93 0.93
              BodyPart  2711  219  189   2900       0.93    0.93 0.93
             Substance   390   45   31    421       0.90    0.93 0.91
                  Form   246   30   20    266       0.89    0.92 0.91
PsychologicalCondition   403   34   41    444       0.92    0.91 0.91
               Vaccine    37    4    5     42       0.90    0.88 0.89
                  Drug  1330  208  110   1440       0.86    0.92 0.89
              DateTime  4045  690  357   4402       0.85    0.92 0.89
          ClinicalDept   277   24   49    326       0.92    0.85 0.88
            Laterality   550   66   78    628       0.89    0.88 0.88
                  Test  1063  158  145   1208       0.87    0.88 0.88
    AdmissionDischarge    28    2    6     34       0.93    0.82 0.88
               Disease  1706  247  309   2015       0.87    0.85 0.86
             VitalTest   143   19   29    172       0.88    0.83 0.86
                Dosage   333   38   79    412       0.90    0.81 0.85
              Duration  1897  320  413   2310       0.86    0.82 0.84
    RelationshipStatus    19    2    5     24       0.90    0.79 0.84
                 Route    39    7    9     48       0.85    0.81 0.83
              Allergen    33    1   13     46       0.97    0.72 0.83
             Frequency   905  224  174   1079       0.80    0.84 0.82
               Symptom  3813  973  762   4575       0.80    0.83 0.81
             Procedure   556  111  149    705       0.83    0.79 0.81
          HealthStatus    77   15   30    107       0.84    0.72 0.77
     InjuryOrPoisoning   131   42   45    176       0.76    0.74 0.75
              Modifier   837  254  302   1139       0.77    0.73 0.75
             Treatment   164   50   64    228       0.77    0.72 0.74
     SubstanceQuantity    58   18   27     85       0.76    0.68 0.72
         MedicalDevice   232   86  100    332       0.73    0.70 0.71
            TestResult   321   85  203    524       0.79    0.61 0.69
             macro_avg 25361 4097 3869  29230       0.86    0.83 0.84
             micro_avg 25361 4097 3869  29230       0.86    0.87 0.86