Bert for Sequence Classification (Clinical Documents Sections)

Description

This is a BERT-based model for classification of clinical documents sections. This model performs better when the section header is present in the text, e.g., when splitting the document with ChunkSentenceSplitter annotator with parameter setInsertChunk=True.

Predicted Entities

Complications and Risk Factors, Consultation and Referral, Diagnostic and Laboratory Data, Discharge Information, Habits, History, Patient Information, Procedures, Impression, Other

Copy S3 URI

How to use

document_assembler = nlp.DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = nlp.Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = medical.BertForSequenceClassification.pretrained("bert_sequence_classifier_clinical_sections", "en", "clinical/models")\
    .setInputCols(["document", "token"])\
    .setOutputCol("prediction")\
    .setCaseSensitive(False)

pipeline = nlp.Pipeline(stages=[
    document_assembler, 
    tokenizer,
    sequenceClassifier  
])

example_df = spark.createDataFrame(
        [["""Discharge Instructions:
It was a pleasure taking care of you! You came to us with 
stomach pain and worsening distension. While you were here we 
did a paracentesis to remove 1.5L of fluid from your belly. We 
also placed you on you 40 mg of Lasix and 50 mg of Aldactone to 
help you urinate the excess fluid still in your belly. As we 
discussed, everyone has a different dose of lasix required to 
make them urinate and it's likely that you weren't taking a high 
enough dose. Please take these medications daily to keep excess 
fluid off and eat a low salt diet. You will follow up with Dr. 
___ in liver clinic and from there have your colonoscopy 
and EGD scheduled. """]]).toDF("text")


result = spark_model.transform(example_df)
result.select("prediction.result").show(truncate=False)
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentenceDetector = SentenceDetectorDLModel.pretrained()
    .setInputCols("document")
    .setOutputCol("sentence")

val tokenizer = new Tokenizer()
    .setInputCols("sentence")
    .setOutputCol("token")

val seq = BertForSequenceClassification.pretrained("bert_sequence_classifier_clinical_sections", "en", "clinical/models")
    .setInputCols(Array("token", "sentence"))
    .setOutputCol("label")
    .setCaseSensitive(false)

val pipeline = new Pipeline().setStages(Array(
documentAssembler,
sentenceDetector,
tokenizer,
seq))

val test_sentences = """Discharge Instructions:
It was a pleasure taking care of you! You came to us with 
stomach pain and worsening distension. While you were here we 
did a paracentesis to remove 1.5L of fluid from your belly. We 
also placed you on you 40 mg of Lasix and 50 mg of Aldactone to 
help you urinate the excess fluid still in your belly. As we 
discussed, everyone has a different dose of lasix required to 
make them urinate and it's likely that you weren't taking a high 
enough dose. Please take these medications daily to keep excess 
fluid off and eat a low salt diet. You will follow up with Dr. 
___ in liver clinic and from there have your colonoscopy 
and EGD scheduled. """"

val example = Seq(test_sentences).toDF("text")
val result = pipeline.fit(example).transform(example)
import nlu

nlu.load("en.classify.bert_sequence.clinical_sections").predict("""Discharge Instructions:
It was a pleasure taking care of you! You came to us with 
stomach pain and worsening distension. While you were here we 
did a paracentesis to remove 1.5L of fluid from your belly. We 
also placed you on you 40 mg of Lasix and 50 mg of Aldactone to 
help you urinate the excess fluid still in your belly. As we 
discussed, everyone has a different dose of lasix required to 
make them urinate and it's likely that you weren't taking a high 
enough dose. Please take these medications daily to keep excess 
fluid off and eat a low salt diet. You will follow up with Dr. 
___ in liver clinic and from there have your colonoscopy 
and EGD scheduled. """)

Results

+-----------------------+
|result                 |
+-----------------------+
|[Discharge Information]|
+-----------------------+

Model Information

Model Name: bert_sequence_classifier_clinical_sections
Compatibility: Healthcare NLP 5.1.4+
License: Licensed
Edition: Official
Input Labels: [document, token]
Output Labels: [class]
Language: en
Size: 406.6 MB
Case sensitive: false
Max sentence length: 512

References

In-house annotation of clinical documents.

Sample text from the training dataset

Discharge Instructions: It was a pleasure taking care of you! You came to us with stomach pain and worsening distension. While you were here we did a paracentesis to remove 1.5L of fluid from your belly. We also placed you on you 40 mg of Lasix and 50 mg of Aldactone to help you urinate the excess fluid still in your belly. As we discussed, everyone has a different dose of lasix required to make them urinate and it’s likely that you weren’t taking a high enough dose. Please take these medications daily to keep excess fluid off and eat a low salt diet. You will follow up with Dr. ___ in liver clinic and from there have your colonoscopy and EGD scheduled.

Benchmarking

                         label  precision    recall  f1-score   support
     Consultation_and_Referral   0.981203  0.996183  0.988636       262
                         Other   1.000000  1.000000  1.000000        29
                        Habits   0.983051  1.000000  0.991453        58
Complications_and_Risk_Factors   1.000000  1.000000  1.000000       385
Diagnostic_and_Laboratory_Data   0.987835  0.983051  0.985437       413
         Discharge_Information   0.992386  0.982412  0.987374       398
                       History   1.000000  0.990099  0.995025       404
                    Impression   0.997706  0.997706  0.997706       436
           Patient_Information   0.994764  0.994764  0.994764       382
                    Procedures   0.984456  0.997375  0.990874       381
                      accuracy          -         -  0.992694      3148
                     macro-avg   0.992140  0.994159  0.993127      3148
                  weighted-avg   0.992730  0.992694  0.992694      3148