Description
This model maps clinical entities and concepts to ICD-10-CM codes using sbiobert_base_cased_mli Sentence BERT embeddings. Synonyms with low cosine similarity to the unnormalized terms are dropped, which makes the model slim. It also returns the official resolution text in brackets inside the metadata.
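To make the pruning idea concrete, here is a minimal sketch of filtering synonyms by cosine similarity. It is not the procedure used to build this model: the `prune_synonyms` helper, the 0.8 threshold, and the toy three-dimensional vectors are all assumptions chosen for illustration, and real sentence embeddings have many more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def prune_synonyms(term_vec, synonym_vecs, threshold=0.8):
    """Keep only synonyms whose embedding is similar enough to the term's.

    `threshold` is a hypothetical cut-off, not the value used for the model.
    """
    return {
        name: vec
        for name, vec in synonym_vecs.items()
        if cosine_similarity(term_vec, vec) >= threshold
    }


# Toy vectors standing in for real sentence embeddings (made up).
term = [1.0, 0.0, 0.0]
synonyms = {
    "close synonym": [0.9, 0.1, 0.0],
    "distant synonym": [0.1, 0.9, 0.3],
}
kept = prune_synonyms(term, synonyms)
```

With these toy values only "close synonym" survives, because the other vector points in a very different direction from the term.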
Predicted Entities
ICD-10-CM codes.
How to use
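# Python
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import BertSentenceEmbeddings
from sparknlp_jsl.annotator import SentenceEntityResolverModel

# Assumes `spark` is an active Spark NLP for Healthcare session.
document_assembler = DocumentAssembler().setInputCol("text").setOutputCol("document")
# Embed each document with the sentence embeddings this resolver is based on.
sbert_embedder = BertSentenceEmbeddings.pretrained("sbiobert_base_cased_mli","en","clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("sbert_embeddings")
# Resolve each embedded document to ICD-10-CM codes.
icd10_resolver = SentenceEntityResolverModel\
    .pretrained("sbiobertresolve_icd10cm_slim_normalized","en", "clinical/models")\
    .setInputCols(["document", "sbert_embeddings"])\
    .setOutputCol("icd10cm_code")\
    .setDistanceFunction("EUCLIDEAN")\
    .setReturnCosineDistances(True)
bert_pipeline_icd = Pipeline(stages = [document_assembler, sbert_embedder, icd10_resolver])
data = spark.createDataFrame([["metastatic lung cancer"]]).toDF("text")
results = bert_pipeline_icd.fit(data).transform(data)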
// Scala
// Assumes the Spark NLP for Healthcare classes are imported and
// `spark.implicits._` is in scope for `toDF`.
val document_assembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")
// Embed each document with the sentence embeddings this resolver is based on.
val sbert_embedder = BertSentenceEmbeddings
  .pretrained("sbiobert_base_cased_mli","en","clinical/models")
  .setInputCols(Array("document"))
  .setOutputCol("sbert_embeddings")
// Resolve each embedded document to ICD-10-CM codes.
val icd10_resolver = SentenceEntityResolverModel
  .pretrained("sbiobertresolve_icd10cm_slim_normalized","en", "clinical/models")
  .setInputCols(Array("document", "sbert_embeddings"))
  .setOutputCol("icd10cm_code")
  .setDistanceFunction("EUCLIDEAN")
  .setReturnCosineDistances(true)
val bert_pipeline_icd = new Pipeline().setStages(Array(document_assembler, sbert_embedder, icd10_resolver))
val data = Seq("metastatic lung cancer").toDF("text")
val result = bert_pipeline_icd.fit(data).transform(data)
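The resolver settings above name two distance measures, Euclidean and cosine. How the library uses them internally is not described in this card, so the following is only a conceptual sketch in plain Python: it ranks hypothetical candidates by Euclidean distance and reports the cosine distance alongside. The `rank_candidates` helper, the candidate labels, and the toy vectors are all assumptions.

```python
import math


def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u, v):
    """One minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms


def rank_candidates(query_vec, candidates, k=2):
    """Return the k nearest (label, euclidean, cosine) triples, nearest first."""
    scored = [
        (label, euclidean_distance(query_vec, vec), cosine_distance(query_vec, vec))
        for label, vec in candidates.items()
    ]
    return sorted(scored, key=lambda row: row[1])[:k]


# Hypothetical candidate labels with made-up two-dimensional vectors.
query = [1.0, 0.0]
candidates = {
    "CODE_A": [0.9, 0.1],
    "CODE_B": [0.6, 0.4],
    "CODE_C": [0.0, 1.0],
}
ranked = rank_candidates(query, candidates)
```

In this toy case both measures agree on the ordering, with `CODE_A` nearest and `CODE_B` second.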
Results
| | chunks | code | resolutions | all_codes | billable_hcc_status_score | all_distances |
|---:|:---|:---|:---|:---|:---|:---|
| 0 | metastatic lung cancer | C7800 | ['cancer metastatic to lung', 'metastasis from malignant tumor of lung', 'cancer metastatic to left lung', 'history of cancer metastatic to lung', 'metastatic cancer', 'history of cancer metastatic to lung (situation)', 'metastatic adenocarcinoma to bilateral lungs', 'cancer metastatic to chest wall', 'metastatic malignant neoplasm to left lower lobe of lung', 'metastatic carcinoid tumour', 'cancer metastatic to respiratory tract', 'metastatic carcinoid tumor'] | ['C7800', 'C349', 'C7801', 'Z858', 'C800', 'Z8511', 'C780', 'C798', 'C7802', 'C799', 'C7830', 'C7B00'] | ['1', '1', '8'] | ['0.0464', '0.0829', '0.0852', '0.0860', '0.0914', '0.0989', '0.1133', '0.1220', '0.1220', '0.1253', '0.1249', '0.1260'] |
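One way to turn resolver metadata into rows like the one above is to split the per-candidate fields and zip them together. The sketch below is plain Python and rests on assumptions not stated in this card: that the metadata exposes `all_k_results`, `all_k_resolutions`, and `all_k_distances` keys, that candidates are joined with `:::`, and that the official text follows the synonym as `synonym [official text]`. The example metadata is hand-written: it reuses two codes, synonyms, and distances from the table, and the bracketed text is the ICD-10-CM title of C78.00 added by hand.

```python
import re

DELIMITER = ":::"  # assumed separator between candidates in a metadata value


def split_official_text(resolution):
    """Split 'synonym [official text]' into (synonym, official text or None)."""
    match = re.fullmatch(r"(.*?)\s*\[(.*)\]\s*", resolution)
    if match is None:
        return resolution.strip(), None
    return match.group(1), match.group(2)


def parse_candidates(metadata):
    """Zip the assumed per-candidate metadata fields into a list of dicts."""
    codes = metadata["all_k_results"].split(DELIMITER)
    resolutions = metadata["all_k_resolutions"].split(DELIMITER)
    distances = metadata["all_k_distances"].split(DELIMITER)
    rows = []
    for code, resolution, distance in zip(codes, resolutions, distances):
        synonym, official = split_official_text(resolution)
        rows.append({
            "code": code,
            "synonym": synonym,
            "official": official,
            "distance": float(distance),
        })
    return rows


# Hand-written metadata for illustration only (not actual model output).
example_metadata = {
    "all_k_results": "C7800:::C349",
    "all_k_resolutions": (
        "cancer metastatic to lung [Secondary malignant neoplasm of unspecified lung]"
        ":::metastasis from malignant tumor of lung"
    ),
    "all_k_distances": "0.0464:::0.0829",
}
rows = parse_candidates(example_metadata)
```

Entries without a bracketed part keep `official` as `None`, so the helper also handles resolutions that carry only the synonym.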
Model Information
| Field | Value |
|:---|:---|
| Model Name | sbiobertresolve_icd10cm_slim_normalized |
| Compatibility | Spark NLP for Healthcare 3.0.4+ |
| License | Licensed |
| Edition | Official |
| Input Labels | [sentence_embeddings] |
| Output Labels | [icd10cm_code] |
| Language | en |
| Case sensitive | false |