HPO Code To Gene, Gene To Disease Mapping

Description

This pretrained model maps HPO (Human Phenotype Ontology) codes to their associated genes, and then maps those genes to related diseases.

How to use


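# Python (Spark NLP for Healthcare); assumes a licensed session is already running as `spark`
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler, Doc2Chunk
from sparknlp_jsl.annotator import ChunkMapperModel

# Wrap each raw input string in a document annotation
document_assembler = DocumentAssembler()\
      .setInputCol("text")\
      .setOutputCol("document")

# Treat the whole document (one HPO code per row) as a single chunk
chunk_assembler = Doc2Chunk()\
      .setInputCols(["document"])\
      .setOutputCol("hpo_code")

# Look up each HPO code in the pretrained gene/disease mapping
mapperModel = ChunkMapperModel.pretrained("hpo_code_gene_disease_mapper", "en", "clinical/models")\
    .setInputCols(["hpo_code"])\
    .setOutputCol("mappings")\
    .setRels(["hpo_gene_disease"])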

mapper_pipeline = Pipeline(stages=[
    document_assembler,
    chunk_assembler,
    mapperModel
])

data = spark.createDataFrame([["HP:0000002"],["HP:6001080"],["HP:0009484"]]).toDF("text")

result = mapper_pipeline.fit(data).transform(data)


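# Python (johnsnowlabs library); assumes a licensed session is already running as `spark`
from johnsnowlabs import nlp, medical

document_assembler = nlp.DocumentAssembler()\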
      .setInputCol("text")\
      .setOutputCol("document")

chunk_assembler = nlp.Doc2Chunk()\
      .setInputCols(["document"])\
      .setOutputCol("hpo_code")

mapperModel = medical.ChunkMapperModel.pretrained("hpo_code_gene_disease_mapper", "en", "clinical/models")\
    .setInputCols(["hpo_code"])\
    .setOutputCol("mappings")\
    .setRels(["hpo_gene_disease"])

mapper_pipeline = nlp.Pipeline(stages=[
    document_assembler,
    chunk_assembler,
    mapperModel
])

data = spark.createDataFrame([["HP:0000002"],["HP:6001080"],["HP:0009484"]]).toDF("text")

result = mapper_pipeline.fit(data).transform(data)


// Scala
val document_assembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")

val chunk_assembler = new Doc2Chunk()
      .setInputCols("document")
      .setOutputCol("hpo_code")

val mapperModel = ChunkMapperModel.pretrained("hpo_code_gene_disease_mapper", "en", "clinical/models")
    .setInputCols("hpo_code")
    .setOutputCol("mappings")
    .setRels(Array("hpo_gene_disease"))

val mapper_pipeline = new Pipeline().setStages(Array(
    document_assembler,
    chunk_assembler,
    mapperModel
))


// toDF on a Seq needs the session's implicits in scope
import spark.implicits._

val data = Seq("HP:0000002", "HP:6001080", "HP:0009484").toDF("text")

val result = mapper_pipeline.fit(data).transform(data)

Results


+----------+------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------+
|  hpo_code|                                                                                                            gene_disease|                                                                                                       all_k_resolutions|
+----------+------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------+
|HP:0000002|{"DUSP6": ["eunuchoid habitus", "gait disturbance", "seizure", "hypotonia", "ataxia", "dysarthria", "decreased testic...|{"DUSP6": ["eunuchoid habitus", "gait disturbance", "seizure", "hypotonia", "ataxia", "dysarthria", "decreased testic...|
|HP:6001080|{"HSD11B1": ["autosomal dominant inheritance", "low tetrahydrocortisol (thf) plus 5-alpha-thf/tetrahydrocortisone (th...|{"HSD11B1": ["autosomal dominant inheritance", "low tetrahydrocortisol (thf) plus 5-alpha-thf/tetrahydrocortisone (th...|
|HP:0009484|{"SHH": ["abnormal thumb morphology", "hand polydactyly", "poor speech", "expressive language delay", "limb dystonia"...|{"SHH": ["abnormal thumb morphology", "hand polydactyly", "poor speech", "expressive language delay", "limb dystonia"...|
+----------+------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------+
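The gene_disease values in the table above look like JSON objects keyed by gene symbol, each holding a list of associated terms. The following is a minimal sketch of how one such cell might be parsed in plain Python, assuming the full (untruncated) value is valid JSON. The helper name parse_gene_disease is hypothetical, and the sample string is a shortened excerpt of the first row, not the complete model output.

```python
import json
from typing import Dict, List


def parse_gene_disease(mapping: str) -> Dict[str, List[str]]:
    """Parse one gene_disease cell into {gene symbol: [associated terms]}.

    Assumes the cell is a JSON object string, as the truncated output suggests.
    """
    parsed = json.loads(mapping)
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object keyed by gene symbol")
    return {gene: list(terms) for gene, terms in parsed.items()}


# Shortened, illustrative excerpt of the HP:0000002 row (not the full value)
sample = '{"DUSP6": ["eunuchoid habitus", "gait disturbance", "seizure"]}'

genes = parse_gene_disease(sample)
for gene, terms in genes.items():
    print(gene, len(terms), terms[:2])
```

In a real run, the string would come from the mappings column of the transformed DataFrame rather than from a literal.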

Model Information

Model Name: hpo_code_gene_disease_mapper
Compatibility: Healthcare NLP 6.0.4+
License: Licensed
Edition: Official
Input Labels: [ner_chunk]
Output Labels: [mappings]
Language: en
Size: 113.2 MB