Sentence Entity Resolver for CPT codes (Augmented)

Description

This model maps extracted medical entities to CPT codes using sbiobert_base_cased_mli Sentence Bert Embeddings. This model is enriched with augmented data for better performance.

Predicted Entities

CPT codes and their descriptions.

Live Demo Open in Colab

How to use

document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")
         
sentence_detector = SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

clinical_ner = MedicalNerModel.pretrained("ner_clinical_large", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner")

ner_converter = NerConverter()\
 	  .setInputCols(["sentence", "token", "ner"])\
 	  .setOutputCol("ner_chunk")

chunk2doc = Chunk2Doc()\
    .setInputCols("ner_chunk")\
    .setOutputCol("ner_chunk_doc")

sbert_embedder = BertSentenceEmbeddings.pretrained("sbiobert_base_cased_mli","en","clinical/models")\
    .setInputCols(["ner_chunk_doc"])\
    .setOutputCol("sbert_embeddings")

resolver = SentenceEntityResolverModel.load("sbiobertresolve_cpt_augmented") \
    .setInputCols(["sbert_embeddings"]) \
    .setOutputCol("resolution")\
    .setDistanceFunction("EUCLIDEAN")


nlpPipeline = Pipeline(stages=[
    document_assembler, 
    sentence_detector, 
    tokenizer, 
    word_embeddings, 
    clinical_ner, 
    ner_converter, 
    chunk2doc, 
    sbert_embedder, 
    resolver])

data = spark.createDataFrame([["This is an 82 - year-old male with a history of prior tobacco use , hypertension , chronic renal insufficiency , COPD , gastritis , and TIA who initially presented to Braintree with a non-ST elevation MI and Guaiac positive stools , transferred to St . Margaret\'s Center for Women & Infants for cardiac catheterization with PTCA to mid LAD lesion complicated by hypotension and bradycardia requiring Atropine , IV fluids and transient dopamine possibly secondary to vagal reaction , subsequently transferred to CCU for close monitoring , hemodynamically stable at the time of admission to the CCU ."]]).toDF("text")

results = nlpPipeline.fit(data).transform(data)
val document_assembler = new DocumentAssembler()
	.setInputCol("text")
	.setOutputCol("document")

val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models")
	.setInputCols("document")
	.setOutputCol("sentence")

val tokenizer = new Tokenizer()
	.setInputCols("sentence")
	.setOutputCol("token")

val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
    .setInputCols(Array("sentence", "token"))
    .setOutputCol("embeddings")

val clinical_ner = MedicalNerModel.pretrained("ner_clinical_large", "en", "clinical/models")
    .setInputCols(Array("sentence", "token", "embeddings"))
    .setOutputCol("ner")

val ner_converter = new NerConverter()
    .setInputCols(Array("sentence", "token", "ner"))
    .setOutputCol("ner_chunk")

val chunk2doc = new Chunk2Doc()
    .setInputCols("ner_chunk")
    .setOutputCol("ner_chunk_doc")

val sbert_embedder = BertSentenceEmbeddings
.pretrained("sbiobert_base_cased_mli","en","clinical/models")
    .setInputCols("ner_chunk_doc")
    .setOutputCol("sbert_embeddings")

val resolver = SentenceEntityResolverModel.load("sbiobertresolve_cpt_augmented")
    .setInputCols(Array("sbert_embeddings"))
    .setOutputCol("resolution")
    .setDistanceFunction("EUCLIDEAN")

val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, word_embeddings, clinical_ner, ner_converter, chunk2doc, sbert_embedder, icdo_resolver))

val data = Seq("""This is an 82 - year-old male with a history of prior tobacco use , hypertension , chronic renal insufficiency , COPD , gastritis , and TIA who initially presented to Braintree with a non-ST elevation MI and Guaiac positive stools , transferred to St . Margaret\'s Center for Women & Infants for cardiac catheterization with PTCA to mid LAD lesion complicated by hypotension and bradycardia requiring Atropine , IV fluids and transient dopamine possibly secondary to vagal reaction , subsequently transferred to CCU for close monitoring , hemodynamically stable at the time of admission to the CCU .""").toDS.toDF("text")

val result = pipeline.fit(data).transform(data)
import nlu
nlu.load("en.resolve.cpt.augmented").predict("""This is an 82 - year-old male with a history of prior tobacco use , hypertension , chronic renal insufficiency , COPD , gastritis , and TIA who initially presented to Braintree with a non-ST elevation MI and Guaiac positive stools , transferred to St . Margaret\'s Center for Women & Infants for cardiac catheterization with PTCA to mid LAD lesion complicated by hypotension and bradycardia requiring Atropine , IV fluids and transient dopamine possibly secondary to vagal reaction , subsequently transferred to CCU for close monitoring , hemodynamically stable at the time of admission to the CCU .""")

Results

+--------------------+-----+---+-------+-----+----------+--------------------+--------------------+
|               chunk|begin|end| entity| code|confidence|   all_k_resolutions|         all_k_codes|
+--------------------+-----+---+-------+-----+----------+--------------------+--------------------+
|        hypertension|   68| 79|PROBLEM|36440|    0.3349|Hypertransfusion:...|36440:::24935:::0...|
|chronic renal ins...|   83|109|PROBLEM|50395|    0.0821|Nephrostomy:::Ren...|50395:::50328:::5...|
|                COPD|  113|116|PROBLEM|32960|    0.1575|Lung collapse pro...|32960:::32215:::1...|
|           gastritis|  120|128|PROBLEM|43501|    0.1772|Gastric ulcer sut...|43501:::43631:::4...|
|                 TIA|  136|138|PROBLEM|61460|    0.1432|Intracranial tran...|61460:::64742:::2...|
|a non-ST elevatio...|  182|202|PROBLEM|61624|    0.1151|Percutaneous non-...|61624:::61626:::3...|
|Guaiac positive s...|  208|229|PROBLEM|44005|    0.1115|Enterolysis:::Abd...|44005:::49080:::4...|
|      mid LAD lesion|  332|345|PROBLEM|0281T|    0.2407|Plication of left...|0281T:::93462:::9...|
|         hypotension|  362|372|PROBLEM|99135|    0.9935|Induced hypotensi...|99135:::99185:::9...|
|         bradycardia|  378|388|PROBLEM|99135|    0.3884|Induced hypotensi...|99135:::33305:::3...|
|      vagal reaction|  466|479|PROBLEM|55450|    0.1427|Vasoligation:::Va...|55450:::64408:::7...|
+--------------------+-----+---+-------+-----+----------+--------------------+--------------------+

Model Information

Model Name: sbiobertresolve_cpt_augmented
Compatibility: Healthcare NLP 3.0.4+
License: Licensed
Edition: Official
Input Labels: [sentence_embeddings]
Output Labels: [cpt_code_aug]
Language: en
Case sensitive: false

References

CPT resolver models are removed from the Models Hub due to license restrictions and can only be shared with the users who already have a valid CPT license. If you possess one and wish to use this model, kindly contact us at support@johnsnowlabs.com.