Description
This pretrained model maps ICD-10-CM codes to their generalised three-character ICD-10-CM codes and the corresponding main concepts.
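To make the relationship between a full code and its generalised form concrete, the sketch below truncates an ICD-10-CM code to its three-character category. It is only an illustration: `generalise_icd10cm` is a hypothetical helper, not part of Healthcare NLP, and the pretrained mapper presumably relies on its own lookup data (which also supplies the main concept) rather than on string truncation.

```python
def generalise_icd10cm(code: str) -> str:
    """Return the three-character category of an ICD-10-CM code.

    Hypothetical helper for illustration; not part of any library.
    """
    # Normalise the input, drop the decimal point, keep the first three characters.
    category = code.strip().upper().replace(".", "")[:3]
    if len(category) < 3:
        raise ValueError(f"not a valid ICD-10-CM code: {code!r}")
    return category


print(generalise_icd10cm("O24.4"))  # O24
print(generalise_icd10cm("J44.9"))  # J44
```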
Predicted Entities
How to use
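# Python. Assumes an active Healthcare NLP Spark session named `spark`.
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import BertSentenceEmbeddings
from sparknlp_jsl.annotator import SentenceEntityResolverModel, Resolution2Chunk, ChunkMapperModel

# Treat each input text as a single chunk to be resolved.
documentAssembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("ner_chunk")

# Sentence embeddings consumed by the resolver.
sbert_embedder = BertSentenceEmbeddings.pretrained("sbiobert_base_cased_mli", "en", "clinical/models")\
    .setInputCols(["ner_chunk"])\
    .setOutputCol("sbert_embeddings")\
    .setCaseSensitive(False)

# Resolve each chunk to an ICD-10-CM code.
icd10_resolver = SentenceEntityResolverModel.pretrained("sbiobertresolve_icd10cm_augmented", "en", "clinical/models")\
    .setInputCols(["sbert_embeddings"])\
    .setOutputCol("icd10_code")\
    .setDistanceFunction("EUCLIDEAN")

# Convert the resolved codes back into chunks so the mapper can read them.
resolver2chunk = Resolution2Chunk()\
    .setInputCols(["icd10_code"])\
    .setOutputCol("icd102chunk")

# Map each ICD-10-CM code to its generalised code and main concept.
chunkMapper = ChunkMapperModel.pretrained("icd10cm_generalised_mapper", "en", "clinical/models")\
    .setInputCols(["icd102chunk"])\
    .setOutputCol("mappings")

pipeline = Pipeline(stages=[
    documentAssembler,
    sbert_embedder,
    icd10_resolver,
    resolver2chunk,
    chunkMapper])

data = spark.createDataFrame([["gestational diabetes mellitus"], ["Chronic obstructive pulmonary disease"]]).toDF("text")

mapper_model = pipeline.fit(data)
result = mapper_model.transform(data)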
// Scala. Assumes an active Spark session named `spark`; the Healthcare NLP
// annotators (SentenceEntityResolverModel, Resolution2Chunk, ChunkMapperModel)
// are assumed to be imported from the licensed library.
import org.apache.spark.ml.Pipeline
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.embeddings.BertSentenceEmbeddings
import spark.implicits._ // required for .toDF on a Seq

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("ner_chunk")

val sbert_embedder = BertSentenceEmbeddings.pretrained("sbiobert_base_cased_mli", "en", "clinical/models")
  .setInputCols(Array("ner_chunk"))
  .setOutputCol("sbert_embeddings")
  .setCaseSensitive(false)

val icd10_resolver = SentenceEntityResolverModel.pretrained("sbiobertresolve_icd10cm_augmented", "en", "clinical/models")
  .setInputCols(Array("sbert_embeddings"))
  .setOutputCol("icd10_code")
  .setDistanceFunction("EUCLIDEAN")

val resolver2chunk = new Resolution2Chunk()
  .setInputCols(Array("icd10_code"))
  .setOutputCol("icd102chunk")

val chunkMapper = ChunkMapperModel.pretrained("icd10cm_generalised_mapper", "en", "clinical/models")
  .setInputCols(Array("icd102chunk"))
  .setOutputCol("mappings")

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  sbert_embedder,
  icd10_resolver,
  resolver2chunk,
  chunkMapper))

val data = Seq("gestational diabetes mellitus", "Chronic obstructive pulmonary disease").toDF("text")

val mapper_model = pipeline.fit(data)
val result = mapper_model.transform(data)
Results
+-------------------------------------+------------+--------------------------------------------+
|chunk                                |icd10cm_code|generalised_code                            |
+-------------------------------------+------------+--------------------------------------------+
|gestational diabetes mellitus        |O24.4       |O24:Pregnancy, childbirth and the puerperium|
|Chronic obstructive pulmonary disease|J44.9       |J44:Diseases of the respiratory system      |
+-------------------------------------+------------+--------------------------------------------+
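Each `generalised_code` value in the table above joins the three-character code and its main concept with a colon. Assuming that format holds for every mapping (this is inferred from the two rows shown, not from a documented contract), a small hypothetical helper such as `split_generalised_code` could separate the two parts for downstream use:

```python
from typing import Tuple


def split_generalised_code(value: str) -> Tuple[str, str]:
    """Split a 'CODE:Concept' string into (code, concept).

    Hypothetical helper; assumes the first colon separates the
    generalised code from the main concept.
    """
    # partition splits on the first colon only, so colons in the concept survive.
    code, _, concept = value.partition(":")
    return code.strip(), concept.strip()


code, concept = split_generalised_code("O24:Pregnancy, childbirth and the puerperium")
print(code)     # O24
print(concept)  # Pregnancy, childbirth and the puerperium
```

If a value contains no colon, the helper returns the whole string as the code and an empty concept, which makes unexpected formats easy to spot.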
Model Information
| Model Name: | icd10cm_generalised_mapper |
| Compatibility: | Healthcare NLP 5.2.1+ |
| License: | Licensed |
| Edition: | Official |
| Input Labels: | [ner_chunk] |
| Output Labels: | [mappings] |
| Language: | en |
| Size: | 1.3 MB |