Pipeline for HUGO Gene Nomenclature Committee (HGNC)

Description

This pipeline extracts GENE entities and maps them to their corresponding HUGO Gene Nomenclature Committee (HGNC) codes using sbiobert_base_cased_mli sentence embeddings.
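
Conceptually, the resolver embeds each extracted gene mention and returns the HGNC codes whose reference embeddings lie closest to it. The sketch below illustrates only that nearest-neighbour idea, using tiny hand-made vectors. Several things in it are assumptions for illustration and do not describe the model's actual embeddings or distance settings: the vectors, the use of cosine distance, and the helper name resolve. The HGNC codes are the ones this pipeline returns in the Results table.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def resolve(query, reference, k=3):
    """Rank reference codes by distance to the query embedding (hypothetical helper).

    reference maps a code to its (toy) embedding; the k nearest
    (code, distance) pairs are returned, closest first.
    """
    scored = [(code, cosine_distance(query, vec)) for code, vec in reference.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Toy 3-dimensional "embeddings"; real sentence embeddings are far larger.
reference = {
    "HGNC:50800": [0.9, 0.1, 0.0],  # stands in for DUX4
    "HGNC:7611": [0.1, 0.9, 0.1],   # stands in for MYOD1
    "HGNC:8621": [0.0, 0.2, 0.9],   # stands in for PAX7
}

query = [0.88, 0.12, 0.01]  # toy embedding of the mention "DUX4"
print(resolve(query, reference, k=2))
```

The real resolver also reports the runner-up candidates and their distances, which is where the all_codes and all_distances columns in the Results come from.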


How to use


# Python. Assumes a Spark session with the licensed Healthcare NLP library is already running.
from sparknlp.pretrained import PretrainedPipeline

# Download (or load from cache) the pretrained pipeline from the licensed clinical model repository
hgnc_pipeline = PretrainedPipeline("hgnc_resolver_pipeline", "en", "clinical/models")

text = """During today's consultation, we reviewed the results of the comprehensive genetic analysis performed on the patient. This analysis uncovered complex interactions between several genes: DUX4, DUX4L20, FBXO48, MYOD1, and PAX7. These findings are significant as they provide new understanding of the molecular pathways that are involved in muscle differentiation and may play a role in the development and progression of muscular dystrophies in this patient."""

# fullAnnotate returns a list with one dict per input text, keyed by output column name
result = hgnc_pipeline.fullAnnotate(text)


// Scala. Assumes a Spark session with the licensed Healthcare NLP library is already running.
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// Download (or load from cache) the pretrained pipeline from the licensed clinical model repository
val hgnc_pipeline = PretrainedPipeline("hgnc_resolver_pipeline", "en", "clinical/models")

val text = """During today's consultation, we reviewed the results of the comprehensive genetic analysis performed on the patient. This analysis uncovered complex interactions between several genes: DUX4, DUX4L20, FBXO48, MYOD1, and PAX7. These findings are significant as they provide new understanding of the molecular pathways that are involved in muscle differentiation and may play a role in the development and progression of muscular dystrophies in this patient."""

// Annotate the text; the result holds the output annotations of the pipeline stages
val result = hgnc_pipeline.fullAnnotate(text)
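
To turn the Python fullAnnotate output into rows like the Results table, you can read each resolver annotation's offsets, top code and metadata. The sketch below makes several assumptions, so check them against your own output:

- The resolver annotations expose begin, end, result and a metadata dict.
- The candidate lists are stored in metadata under all_k_results, all_k_resolutions and all_k_distances, joined with ":::".
- The Annotation dataclass is only a stand-in for Spark NLP's own class.
- to_rows and split_field are hypothetical helpers.

The offsets are treated as inclusive, which matches the table (DUX4 spans 185 to 188).

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Minimal stand-in for a Spark NLP annotation (only the fields used here)."""
    begin: int
    end: int
    result: str
    metadata: dict = field(default_factory=dict)


def split_field(value, sep=":::"):
    """Split a joined metadata string, returning [] when it is empty."""
    return value.split(sep) if value else []


def to_rows(text, annotations, sep=":::"):
    """Flatten resolver annotations into table-like dicts (hypothetical helper).

    Offsets are inclusive, so the chunk text is text[begin:end + 1].
    The metadata key names are assumptions about the resolver's output.
    """
    rows = []
    for ann in annotations:
        meta = ann.metadata
        rows.append({
            "chunks": text[ann.begin:ann.end + 1],
            "begin": ann.begin,
            "end": ann.end,
            "code": ann.result,
            "all_codes": split_field(meta.get("all_k_results", ""), sep),
            "resolutions": split_field(meta.get("all_k_resolutions", ""), sep),
            "all_distances": [float(d) for d in split_field(meta.get("all_k_distances", ""), sep)],
        })
    return rows


# First two sentences of the example text; offsets match the Results table.
text = (
    "During today's consultation, we reviewed the results of the comprehensive "
    "genetic analysis performed on the patient. This analysis uncovered complex "
    "interactions between several genes: DUX4, DUX4L20, FBXO48, MYOD1, and PAX7."
)

# Stub built from the visible top-2 candidates of the DUX4 row (resolutions omitted).
dux4 = Annotation(
    begin=185, end=188, result="HGNC:50800",
    metadata={"all_k_results": "HGNC:50800:::HGNC:3070",
              "all_k_distances": "0.0000:::0.0210"},
)
rows = to_rows(text, [dux4])
print(rows[0]["chunks"], rows[0]["code"], rows[0]["all_distances"])
```

With the real pipeline, pass result[0][column] instead of the stub list. The resolver's output column name is not listed on this page, so inspect result[0].keys() to find it.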

Results


+-------+-----+---+----------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+
| chunks|begin|end|      code|                                         all_codes|                                       resolutions|                                     all_distances|
+-------+-----+---+----------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+
|   DUX4|  185|188|HGNC:50800|[HGNC:50800, HGNC:3070, HGNC:32183, HGNC:38686,...|[DUX4 [double homeobox 4], DUSP4 [dual specific...|[0.0000, 0.0210, 0.0221, 0.0239, 0.0276, 0.0302...|
|DUX4L20|  191|197|HGNC:50801|[HGNC:50801, HGNC:39776, HGNC:31982, HGNC:26230...|[DUX4L20 [double homeobox 4 like 20 (pseudogene...|[0.0000, 0.0696, 0.0698, 0.0744, 0.0756, 0.0767...|
| FBXO48|  200|205|HGNC:33857|[HGNC:33857, HGNC:4930, HGNC:16653, HGNC:13114,...|[FBXO48 [F-box protein 48], ZBTB48 [zinc finger...|[0.0000, 0.0495, 0.0503, 0.0510, 0.0601, 0.0593...|
|  MYOD1|  208|212| HGNC:7611|[HGNC:7611, HGNC:13879, HGNC:7613, HGNC:7582, H...|[MYOD1 [myogenic differentiation 1], MYO1H [myo...|[0.0000, 0.0614, 0.0634, 0.0634, 0.0696, 0.0709...|
|   PAX7|  219|222| HGNC:8621|[HGNC:8621, HGNC:8748, HGNC:9351, HGNC:8792, HG...|[PAX7 [paired box 7], PCSK7 [proprotein convert...|[0.0000, 0.1042, 0.1036, 0.1046, 0.1056, 0.1053...|
+-------+-----+---+----------+--------------------------------------------------+--------------------------------------------------+--------------------------------------------------+
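
Every gene in this example gets a top distance of 0.0000, with the gene symbol itself as the top resolution. For less clear-cut mentions, one simple review heuristic is the gap between the best and second-best candidate distances: a small gap means the top code is only slightly preferred over an alternative. The sketch below computes that gap from the first three distances shown in each row above. The heuristic and the helper top_gap are illustrative assumptions, not part of the pipeline. The distances are sorted first because the listed order is not always strictly ascending (PAX7 shows 0.1042 before 0.1036).

```python
def top_gap(distances):
    """Return the gap between the best and second-best distance (hypothetical helper)."""
    if len(distances) < 2:
        raise ValueError("need at least two candidate distances")
    best, second = sorted(distances)[:2]  # sort: listed order may not be ascending
    return second - best


# First three distances per row, copied from the Results table.
table_distances = {
    "DUX4": [0.0000, 0.0210, 0.0221],
    "DUX4L20": [0.0000, 0.0696, 0.0698],
    "FBXO48": [0.0000, 0.0495, 0.0503],
    "MYOD1": [0.0000, 0.0614, 0.0634],
    "PAX7": [0.0000, 0.1042, 0.1036],
}

for gene, distances in table_distances.items():
    print(f"{gene}: gap = {top_gap(distances):.4f}")
```

Among these rows, DUX4 has the smallest gap (0.0210), because its runner-up candidate, DUSP4, is comparatively close.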

Model Information

Model Name: hgnc_resolver_pipeline
Type: pipeline
Compatibility: Healthcare NLP 5.2.1+
License: Licensed
Edition: Official
Language: en
Size: 2.4 GB
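
The Compatibility line means the pipeline expects Healthcare NLP 5.2.1 or later. A minimal sketch of checking that from Python follows. It assumes the licensed library is installed under the pip distribution name spark-nlp-jsl. The helpers parse_version and is_compatible are hypothetical and not part of the library.

```python
from importlib.metadata import PackageNotFoundError, version

MINIMUM = (5, 2, 1)  # from "Compatibility: Healthcare NLP 5.2.1+"


def parse_version(text):
    """Parse the leading numeric X.Y.Z part of a version string into a tuple."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as "rc1"
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)  # treat "6.0" as (6, 0, 0)
    return tuple(parts)


def is_compatible(installed, minimum=MINIMUM):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= minimum


try:
    # Assumption: the licensed library's distribution name is "spark-nlp-jsl".
    installed = version("spark-nlp-jsl")
    print(installed, "compatible" if is_compatible(installed) else "too old")
except PackageNotFoundError:
    print("spark-nlp-jsl is not installed in this environment")
```

This parser is deliberately naive. For full PEP 440 handling, packaging.version.Version is the more robust choice.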

Included Models

  • DocumentAssembler
  • SentenceDetectorDLModel
  • TokenizerModel
  • WordEmbeddingsModel
  • MedicalNerModel
  • NerConverterInternalModel
  • Chunk2Doc
  • BertSentenceEmbeddings
  • SentenceEntityResolverModel
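
These stages run in order, and each one consumes annotation types produced by earlier stages. The sketch below writes that chain down as plain data and checks that every stage's inputs are available when it runs. The input and output types are the usual Spark NLP annotator types for these stages. They are an assumption here, not values read from this pipeline's saved parameters. The chain also shows the role of Chunk2Doc: it turns each NER chunk into a document, so BertSentenceEmbeddings can embed every gene mention separately before the resolver looks it up.

```python
# (stage, input annotation types, output annotation type); types are assumed
STAGES = [
    ("DocumentAssembler", [], "document"),
    ("SentenceDetectorDLModel", ["document"], "document"),
    ("TokenizerModel", ["document"], "token"),
    ("WordEmbeddingsModel", ["document", "token"], "word_embeddings"),
    ("MedicalNerModel", ["document", "token", "word_embeddings"], "named_entity"),
    ("NerConverterInternalModel", ["document", "token", "named_entity"], "chunk"),
    ("Chunk2Doc", ["chunk"], "document"),
    ("BertSentenceEmbeddings", ["document"], "sentence_embeddings"),
    ("SentenceEntityResolverModel", ["sentence_embeddings"], "entity"),
]


def check_chain(stages):
    """Return the names of stages whose inputs no earlier stage produces."""
    available = set()
    missing = []
    for name, inputs, output in stages:
        if any(t not in available for t in inputs):
            missing.append(name)
        available.add(output)
    return missing


print(check_chain(STAGES))
```

This is a type-level check only. It cannot tell that BertSentenceEmbeddings must read the Chunk2Doc output column rather than the sentence documents, since both have the document type.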