Description
This model maps extracted medical entities to ICD-O codes using chunk embeddings.
Predicted Entities
ICD-O Codes and their normalized definition with sbiobert_base_cased_mli
embeddings.
How to use
...
chunk2doc = Chunk2Doc().setInputCols("ner_chunk").setOutputCol("ner_chunk_doc")
sbert_embedder = BertSentenceEmbeddings\
.pretrained("sbiobert_base_cased_mli",'en','clinical/models')\
.setInputCols(["ner_chunk_doc"])\
.setOutputCol("sbert_embeddings")
icdo_resolver = SentenceEntityResolverModel.pretrained("sbiobertresolve_icdo","en", "clinical/models") \
.setInputCols(["ner_chunk", "sbert_embeddings"]) \
.setOutputCol("resolution")\
.setDistanceFunction("EUCLIDEAN")
nlpPipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, word_embeddings, clinical_ner, ner_converter, chunk2doc, sbert_embedder, icdo_resolver])
model = nlpPipeline.fit(spark.createDataFrame([["This is an 82 - year-old male with a history of prior tobacco use , hypertension , chronic renal insufficiency , COPD , gastritis , and TIA who initially presented to Braintree with a non-ST elevation MI and Guaiac positive stools , transferred to St . Margaret\'s Center for Women & Infants for cardiac catheterization with PTCA to mid LAD lesion complicated by hypotension and bradycardia requiring Atropine , IV fluids and transient dopamine possibly secondary to vagal reaction , subsequently transferred to CCU for close monitoring , hemodynamically stable at the time of admission to the CCU ."]]).toDF("text"))
results = model.transform(data)
...
chunk2doc = Chunk2Doc().setInputCols("ner_chunk").setOutputCol("ner_chunk_doc")
val sbert_embedder = BertSentenceEmbeddings
.pretrained("sbiobert_base_cased_mli",'en','clinical/models')
.setInputCols(Array("ner_chunk_doc"))
.setOutputCol("sbert_embeddings")
val icdo_resolver = SentenceEntityResolverModel.pretrained("sbiobertresolve_icdo","en", "clinical/models")
.setInputCols(Array("ner_chunk", "sbert_embeddings"))
.setOutputCol("resolution")
.setDistanceFunction("EUCLIDEAN")
val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, word_embeddings, clinical_ner, ner_converter, chunk2doc, sbert_embedder, icdo_resolver))
val result = pipeline.fit(Seq.empty["This is an 82 - year-old male with a history of prior tobacco use , hypertension , chronic renal insufficiency , COPD , gastritis , and TIA who initially presented to Braintree with a non-ST elevation MI and Guaiac positive stools , transferred to St . Margaret\'s Center for Women & Infants for cardiac catheterization with PTCA to mid LAD lesion complicated by hypotension and bradycardia requiring Atropine , IV fluids and transient dopamine possibly secondary to vagal reaction , subsequently transferred to CCU for close monitoring , hemodynamically stable at the time of admission to the CCU ."].toDS.toDF("text")).transform(data)
Results
+--------------------+-----+---+---------+------+----------+--------------------+--------------------+
| chunk|begin|end| entity| code|confidence| resolutions| codes|
+--------------------+-----+---+---------+------+----------+--------------------+--------------------+
| hypertension| 68| 79| PROBLEM|8312/3| 0.3558|Renal cell carcin...|8312/3:::9964/3::...|
|chronic renal ins...| 83|109| PROBLEM|9980/3| 0.5290|Refractory anemia...|9980/3:::8312/3::...|
| COPD| 113|116| PROBLEM|9950/3| 0.2092|Polycythemia vera...|9950/3:::8141/3::...|
| gastritis| 120|128| PROBLEM|8150/3| 0.1795|Islet cell carcin...|8150/3:::8153/3::...|
| TIA| 136|138| PROBLEM|9570/0| 0.4843|Neuroma, NOS:::Ca...|9570/0:::8692/3::...|
|a non-ST elevatio...| 182|202| PROBLEM|8343/2| 0.1914|Non-invasive EFVP...|8343/2:::9150/0::...|
|Guaiac positive s...| 208|229| PROBLEM|8155/3| 0.1069|Vipoma:::Myeloid ...|8155/3:::9930/3::...|
|cardiac catheteri...| 295|317| TEST|8045/3| 0.1144|Combined small ce...|8045/3:::9705/3::...|
| PTCA| 324|327|TREATMENT|9735/3| 0.0924|Plasmablastic lym...|9735/3:::9365/3::...|
| mid LAD lesion| 332|345| PROBLEM|9383/1| 0.0845|Subependymoma:::D...|9383/1:::8806/3::...|
+--------------------+-----+---+---------+------+----------+--------------------+--------------------+
Model Information
Name: | sbiobertresolve_icdo |
Type: | SentenceEntityResolverModel |
Compatibility: | Spark NLP 2.6.4 + |
License: | Licensed |
Edition: | Official |
Input labels: | [ner_chunk, chunk_embeddings] |
Output labels: | [resolution] |
Language: | en |
Dependencies: | sbiobert_base_cased_mli |
Data Source
Trained on ICD-O Histology Behaviour dataset with sbiobert_base_cased_mli
sentence embeddings.
https://apps.who.int/iris/bitstream/handle/10665/96612/9789241548496_eng.pdf