Detect genes and human phenotypes

Description

This model detects mentions of genes and human phenotypes (HP) in medical text.

Predicted Entities:

GENE, HP

How to use

Use this model as part of an NLP pipeline with the following stages: DocumentAssembler, SentenceDetector, Tokenizer, WordEmbeddingsModel, NerDLModel. Add NerConverter at the end of the pipeline to merge the token-level tags predicted by the model into full entity chunks.

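from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler, LightPipeline
from sparknlp.annotator import SentenceDetector, Tokenizer, WordEmbeddingsModel, NerDLModel, NerConverter

# Assumes `spark` is an active Spark NLP for Healthcare session and that "embeddings_clinical" is the matching embeddings model.
document_assembler = DocumentAssembler().setInputCol("text").setOutputCol("document")
sentence_detector = SentenceDetector().setInputCols(["document"]).setOutputCol("sentence")
tokenizer = Tokenizer().setInputCols(["sentence"]).setOutputCol("token")
word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models") \
  .setInputCols(["sentence", "token"]).setOutputCol("embeddings")
clinical_ner = NerDLModel.pretrained("ner_human_phenotype_gene_clinical", "en", "clinical/models") \
  .setInputCols(["sentence", "token", "embeddings"]).setOutputCol("ner")
ner_converter = NerConverter().setInputCols(["sentence", "token", "ner"]).setOutputCol("ner_chunk")  # merges tagged tokens into chunks

nlp_pipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, word_embeddings, clinical_ner, ner_converter])
light_pipeline = LightPipeline(nlp_pipeline.fit(spark.createDataFrame([[""]]).toDF("text")))  # an empty DataFrame is enough: no stage learns from it

annotations = light_pipeline.fullAnnotate("Here we presented a case (BS type) of a 17 years old female presented with polyhydramnios, polyuria, nephrocalcinosis and hypokalemia, which was alleviated after treatment with celecoxib and vitamin D(3).")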
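The NerConverter stage turns the model's per-token tags into whole chunks such as "BS type". The pure-Python sketch below illustrates that merging step under stated assumptions: tags follow the IOB scheme (B-GENE, I-GENE, B-HP, I-HP, O), each token carries zero-based inclusive (begin, end) offsets, and a stray I- tag simply opens a new chunk. The function name iob_to_chunks and the sample tags are hypothetical; this is an illustration, not NerConverter's actual implementation, and it ignores the metadata the real annotator attaches.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # (begin, end) character offsets, end inclusive


def iob_to_chunks(text: str, tokens: List[Span], tags: List[str]) -> List[Tuple[str, int, int, str]]:
    """Hypothetical helper: merge IOB-tagged tokens into (chunk, begin, end, entity) tuples."""
    chunks = []
    current: Optional[list] = None  # [begin, end, entity] of the chunk being built
    for (begin, end), tag in zip(tokens, tags):
        # An I- tag with the same entity as the open chunk extends it.
        if tag.startswith("I-") and current is not None and current[2] == tag[2:]:
            current[1] = end
            continue
        # Anything else closes the open chunk first.
        if current is not None:
            chunks.append(current)
            current = None
        # B-X (or a stray I-X) starts a new chunk; O starts nothing.
        if tag != "O":
            current = [begin, end, tag[2:]]
    if current is not None:
        chunks.append(current)
    # Recover each chunk's surface text from the original string.
    return [(text[b:e + 1], b, e, entity) for b, e, entity in chunks]


# Hypothetical tags for a two-token gene mention followed by a phenotype.
text = "BS type and hypokalemia"
tokens = [(0, 1), (3, 6), (8, 10), (12, 22)]
tags = ["B-GENE", "I-GENE", "O", "B-HP"]
print(iob_to_chunks(text, tokens, tags))
```

Because chunk text is sliced from the original string rather than joined from tokens, the original spacing inside multi-token chunks is preserved.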

Results

+----+------------------+---------+-------+----------+
|    | chunk            |   begin |   end | entity   |
+====+==================+=========+=======+==========+
|  0 | BS type          |      29 |    32 | GENE     |
+----+------------------+---------+-------+----------+
|  1 | polyhydramnios   |      75 |    88 | HP       |
+----+------------------+---------+-------+----------+
|  2 | polyuria         |      91 |    98 | HP       |
+----+------------------+---------+-------+----------+
|  3 | nephrocalcinosis |     101 |   116 | HP       |
+----+------------------+---------+-------+----------+
|  4 | hypokalemia      |     122 |   132 | HP       |
+----+------------------+---------+-------+----------+
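In the table above, begin and end are zero-based character offsets into the input string, and end is inclusive: "polyhydramnios" (14 characters) spans 75 to 88. The short pure-Python sketch below, assuming that convention, recomputes the offsets of the HP chunks directly from the sample sentence; the helper name chunk_offsets is hypothetical and only locates the first occurrence of each chunk.

```python
text = ("Here we presented a case (BS type) of a 17 years old female presented with "
        "polyhydramnios, polyuria, nephrocalcinosis and hypokalemia, which was alleviated "
        "after treatment with celecoxib and vitamin D(3).")


def chunk_offsets(text: str, chunk: str) -> tuple:
    """Hypothetical helper: return (begin, end) of the first occurrence, with an inclusive end."""
    begin = text.find(chunk)
    if begin == -1:
        raise ValueError(f"{chunk!r} not found in text")
    return begin, begin + len(chunk) - 1


for chunk in ["polyhydramnios", "polyuria", "nephrocalcinosis", "hypokalemia"]:
    begin, end = chunk_offsets(text, chunk)
    # Inclusive end, so the Python slice must stop at end + 1.
    assert text[begin:end + 1] == chunk
    print(f"{chunk:<17} {begin:>5} {end:>5}")
```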

Model Information

Model Name: ner_human_phenotype_gene_clinical
Type: ner
Compatibility: Spark NLP for Healthcare 2.6.0+
Edition: Official
License: Licensed
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: [en]
Case sensitive: false