Entity Resolver for Human Phenotype Ontology

Description

This model maps phenotypic abnormalities encountered in human diseases to Human Phenotype Ontology (HPO) codes using sbiobert_base_cased_mli Sentence Bert Embeddings, and has faster load time, with a speedup of about 6X when compared to previous versions. Also the load process now is more memory friendly meaning that the maximum memory required during load time is smaller, reducing the chances of OOM exceptions, and thus relaxing hardware requirements.

Predicted Entities

This model returns Human Phenotype Ontology (HPO) codes for phenotypic abnormalities encountered in human diseases. It also returns associated codes from the following vocabularies for each HPO code: - MeSH (Medical Subject Headings)- SNOMED- UMLS (Unified Medical Language System ) - ORPHA (international reference resource for information on rare diseases and orphan drugs) - OMIM (Online Mendelian Inheritance in Man)

Live Demo Open in Colab Download

How to use

...
chunk2doc = Chunk2Doc().setInputCols("ner_chunk").setOutputCol("ner_chunk_doc")

sbert_embedder = BertSentenceEmbeddings\
     .pretrained("sbiobert_base_cased_mli",'en','clinical/models')\
     .setInputCols(["ner_chunk_doc"])\
     .setOutputCol("sbert_embeddings")

resolver = SentenceEntityResolverModel\
     .pretrained("sbiobertresolve_HPO", "en", "clinical/models") \
     .setInputCols(["ner_chunk", "sbert_embeddings"]) \
     .setOutputCol("resolution")\
     .setDistanceFunction("EUCLIDEAN")

pipeline = Pipeline(stages = [document_assembler, sentence_detector, tokens, embeddings, ner, ner_converter, chunk2doc, sbert_embedder, resolver])

model = LightPipeline(pipeline.fit(spark.createDataFrame([[""]], ["text"])))

text="""These disorders include cancer, bipolar disorder, schizophrenia, autism, Cri-du-chat syndrome, myopia, cortical cataract-linked Alzheimer's disease, and infectious diseases"""

res = model.fullAnnotate(text)

Results

|    | chunk            | entity   | resolution   | aux_codes                                                                    |
|---:|:-----------------|:---------|:-------------|:-----------------------------------------------------------------------------|
|  0 | cancer           | HP       | HP:0002664   | MSH:D009369||SNOMED:108369006,363346000||UMLS:C0006826,C0027651||ORPHA:1775  |
|  1 | bipolar disorder | HP       | HP:0007302   | MSH:D001714||SNOMED:13746004||UMLS:C0005586||ORPHA:370079                    |
|  2 | schizophrenia    | HP       | HP:0100753   | MSH:D012559||SNOMED:191526005,58214004||UMLS:C0036341||ORPHA:231169          |
|  3 | autism           | HP       | HP:0000717   | MSH:D001321||SNOMED:408856003,408857007,43614003||UMLS:C0004352||ORPHA:79279 |
|  4 | myopia           | HP       | HP:0000545   | MSH:D009216||SNOMED:57190000||UMLS:C0027092||ORPHA:370022                    |

Model Information

Model Name: sbiobertresolve_HPO
Compatibility: Spark NLP for Healthcare 3.0.4+
License: Licensed
Edition: Official
Input Labels: [sentence_embeddings]
Output Labels: [hpo_code]
Language: en
Case sensitive: false