Bert for Sequence Classification (Clinical Documents Sections, Headless)

Description

This BERT-based model classifies sections of clinical documents. It was trained on section text that does not include the section header, which is the output you get, for example, when splitting a document with the ChunkSentenceSplitter annotator with the parameter setInsertChunk=False.
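To make "headless" concrete, here is a minimal plain-Python sketch that splits a note at known section headers and keeps only the body of each section, dropping the header text. It only illustrates the input shape the model expects and is not how ChunkSentenceSplitter is implemented; the header list, the sample note, and the `split_headless` helper are all hypothetical.

```python
import re

# Hypothetical section headers, chosen only for this illustration.
HEADERS = ["History of Present Illness", "Discharge Instructions"]


def split_headless(note, headers=HEADERS):
    """Split `note` at each header and return the section bodies without headers."""
    # Match any header, optionally followed by whitespace and a colon.
    pattern = "|".join(re.escape(h) + r"\s*:?" for h in headers)
    # re.split without capture groups discards the matched header text.
    parts = re.split(pattern, note)
    # Drop empty fragments, such as the empty string before a leading header.
    return [p.strip() for p in parts if p.strip()]


note = (
    "History of Present Illness: Patient presented with stomach pain. "
    "Discharge Instructions: Take Lasix daily and eat a low salt diet."
)
sections = split_headless(note)
for section in sections:
    print(section)
```

Each string in `sections` is the kind of header-free text that would be placed in the "text" column before classification.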

Predicted Entities

Consultation and Referral, Habits, Complications and Risk Factors, Diagnostic and Laboratory Data, Discharge Information, History, Impression, Patient Information, Procedures, Other


How to use

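from johnsnowlabs import nlp, medical

# Assumes an active, licensed Spark session is available as `spark`,
# e.g. spark = nlp.start()

# Wrap the raw text column in a document annotation
document_assembler = nlp.DocumentAssembler() \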
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = nlp.Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = medical.BertForSequenceClassification.pretrained("bert_sequence_classifier_clinical_sections_headless", "en", "clinical/models")\
    .setInputCols(["document", "token"])\
    .setOutputCol("prediction")\
    .setCaseSensitive(False)

pipeline = nlp.Pipeline(stages=[
    document_assembler, 
    tokenizer,
    sequenceClassifier  
])

example_df = spark.createDataFrame(
        [["""It was a pleasure taking care of you! You came to us with 
stomach pain and worsening distension. While you were here we 
did a paracentesis to remove 1.5L of fluid from your belly. We 
also placed you on you 40 mg of Lasix and 50 mg of Aldactone to 
help you urinate the excess fluid still in your belly. As we 
discussed, everyone has a different dose of lasix required to 
make them urinate and it's likely that you weren't taking a high 
enough dose. Please take these medications daily to keep excess 
fluid off and eat a low salt diet. You will follow up with Dr. 
___ in liver clinic and from there have your colonoscopy 
and EGD scheduled. """]]).toDF("text")


result = pipeline.fit(example_df).transform(example_df)
result.select("prediction.result").show(truncate=False)

val documentAssembler = new DocumentAssembler()
     .setInputCol("text")
     .setOutputCol("document")

val sentenceDetector = SentenceDetectorDLModel.pretrained()
    .setInputCols("document")
    .setOutputCol("sentence")

val tokenizer = new Tokenizer()
   .setInputCols("sentence")
   .setOutputCol("token")

val seq = BertForSequenceClassification.pretrained("bert_sequence_classifier_clinical_sections_headless", "en", "clinical/models")
   .setInputCols(Array("token", "sentence"))
   .setOutputCol("label")
   .setCaseSensitive(false)

val pipeline = new Pipeline().setStages(Array(
       documentAssembler,
       sentenceDetector,
       tokenizer,
       seq))

val test_sentences = """It was a pleasure taking care of you! You came to us with 
stomach pain and worsening distension. While you were here we 
did a paracentesis to remove 1.5L of fluid from your belly. We 
also placed you on you 40 mg of Lasix and 50 mg of Aldactone to 
help you urinate the excess fluid still in your belly. As we 
discussed, everyone has a different dose of lasix required to 
make them urinate and it's likely that you weren't taking a high 
enough dose. Please take these medications daily to keep excess 
fluid off and eat a low salt diet. You will follow up with Dr. 
___ in liver clinic and from there have your colonoscopy 
and EGD scheduled. """

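// toDF on a local Seq needs the session's implicits in scope
import spark.implicits._

val example = Seq(test_sentences).toDF("text")
val result = pipeline.fit(example).transform(example)

// Show the predicted section label
result.select("label.result").show(false)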

import nlu

nlu.load("en.classify.bert_sequence.clinical_sections_headless").predict("""It was a pleasure taking care of you! You came to us with 
stomach pain and worsening distension. While you were here we 
did a paracentesis to remove 1.5L of fluid from your belly. We 
also placed you on you 40 mg of Lasix and 50 mg of Aldactone to 
help you urinate the excess fluid still in your belly. As we 
discussed, everyone has a different dose of lasix required to 
make them urinate and it's likely that you weren't taking a high 
enough dose. Please take these medications daily to keep excess 
fluid off and eat a low salt diet. You will follow up with Dr. 
___ in liver clinic and from there have your colonoscopy 
and EGD scheduled. """)

Results

+-----------------------+
|result                 |
+-----------------------+
|[Discharge Information]|
+-----------------------+

Model Information

Model Name:	bert_sequence_classifier_clinical_sections_headless
Compatibility:	Healthcare NLP 5.1.4+
License:	Licensed
Edition:	Official
Input Labels:	[document, token]
Output Labels:	[class]
Language:	en
Size:	406.6 MB
Case sensitive:	false
Max sentence length:	512

References

In-house annotation of clinical documents.

Sample text from the training dataset

It was a pleasure taking care of you! You came to us with stomach pain and worsening distension. While you were here we did a paracentesis to remove 1.5L of fluid from your belly. We also placed you on you 40 mg of Lasix and 50 mg of Aldactone to help you urinate the excess fluid still in your belly. As we discussed, everyone has a different dose of lasix required to make them urinate and it’s likely that you weren’t taking a high enough dose. Please take these medications daily to keep excess fluid off and eat a low salt diet. You will follow up with Dr. ___ in liver clinic and from there have your colonoscopy and EGD scheduled.

Benchmarking

                         label  precision    recall  f1-score   support
     Consultation_and_Referral   0.655949  0.890830  0.755556       229
                         Other   0.954545  0.933333  0.943820        45
                        Habits   0.872727  0.800000  0.834783        60
Complications_and_Risk_Factors   0.997468  0.989950  0.993695       398
Diagnostic_and_Laboratory_Data   0.887417  0.676768  0.767908       396
         Discharge_Information   0.792000  0.763496  0.777487       389
                       History   0.873810  0.910670  0.891859       403
                    Impression   0.843537  0.909535  0.875294       409
           Patient_Information   0.804569  0.786600  0.795483       403
                    Procedures   0.875912  0.865385  0.870617       416
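The table reports per-label scores only. As a rough sketch, the rows can be aggregated into macro and support-weighted F1 averages, and each reported F1 can be checked against 2PR/(P+R). The rows below are copied from the table; the aggregate values are derived here and are not figures reported by the source.

```python
# (label, precision, recall, f1, support) rows copied from the benchmark table.
ROWS = [
    ("Consultation_and_Referral", 0.655949, 0.890830, 0.755556, 229),
    ("Other", 0.954545, 0.933333, 0.943820, 45),
    ("Habits", 0.872727, 0.800000, 0.834783, 60),
    ("Complications_and_Risk_Factors", 0.997468, 0.989950, 0.993695, 398),
    ("Diagnostic_and_Laboratory_Data", 0.887417, 0.676768, 0.767908, 396),
    ("Discharge_Information", 0.792000, 0.763496, 0.777487, 389),
    ("History", 0.873810, 0.910670, 0.891859, 403),
    ("Impression", 0.843537, 0.909535, 0.875294, 409),
    ("Patient_Information", 0.804569, 0.786600, 0.795483, 403),
    ("Procedures", 0.875912, 0.865385, 0.870617, 416),
]


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


total_support = sum(row[4] for row in ROWS)
# Macro average: every label counts equally, regardless of size.
macro_f1 = sum(row[3] for row in ROWS) / len(ROWS)
# Weighted average: each label counts in proportion to its support.
weighted_f1 = sum(row[3] * row[4] for row in ROWS) / total_support
# Largest gap between a reported F1 and the value recomputed from P and R.
max_f1_gap = max(abs(f1_from(p, r) - f1) for _, p, r, f1, _ in ROWS)

print(f"total support: {total_support}")
print(f"macro F1: {macro_f1:.4f}")
print(f"weighted F1: {weighted_f1:.4f}")
print(f"largest F1 discrepancy: {max_f1_gap:.2e}")
```

Macro averaging gives the small Other and Habits classes the same weight as the larger ones, while support weighting lets the larger classes dominate.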
