Pipeline to Detect Genes/Proteins (BC2GM) in Medical Text

Description

This pretrained pipeline is built on top of the bert_token_classifier_ner_bc2gm_gene model.

Predicted Entities

GENE/PROTEIN

How to use

# Python. Assumes a Spark session with licensed Healthcare NLP is already running.
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bert_token_classifier_ner_bc2gm_gene_pipeline", "en", "clinical/models")

text = '''ROCK-I, Kinectin, and mDia2 can bind the wild type forms of both RhoA and Cdc42 in a GTP-dependent manner in vitro. These results support the hypothesis that in the presence of tryptophan the ribosome translating tnaC blocks Rho ' s access to the boxA and rut sites, thereby preventing transcription termination.'''

result = pipeline.fullAnnotate(text)

// Scala. Assumes a Spark session with licensed Healthcare NLP is already running.
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("bert_token_classifier_ner_bc2gm_gene_pipeline", "en", "clinical/models")

val text = "ROCK-I, Kinectin, and mDia2 can bind the wild type forms of both RhoA and Cdc42 in a GTP-dependent manner in vitro. These results support the hypothesis that in the presence of tryptophan the ribosome translating tnaC blocks Rho ' s access to the boxA and rut sites, thereby preventing transcription termination."

val result = pipeline.fullAnnotate(text)

Results

|    | ner_chunk   |   begin |   end | ner_label    |   confidence |
|---:|:------------|--------:|------:|:-------------|-------------:|
|  0 | ROCK-I      |       0 |     5 | GENE/PROTEIN |     0.999978 |
|  1 | Kinectin    |       8 |    15 | GENE/PROTEIN |     0.999973 |
|  2 | mDia2       |      22 |    26 | GENE/PROTEIN |     0.999974 |
|  3 | RhoA        |      65 |    68 | GENE/PROTEIN |     0.999976 |
|  4 | Cdc42       |      74 |    78 | GENE/PROTEIN |     0.999979 |
|  5 | tnaC        |     213 |   216 | GENE/PROTEIN |     0.999978 |
|  6 | Rho         |     225 |   227 | GENE/PROTEIN |     0.999976 |
|  7 | boxA        |     247 |   250 | GENE/PROTEIN |     0.999837 |
|  8 | rut sites   |     256 |   264 | GENE/PROTEIN |     0.99115  |
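The table above can be rebuilt from the annotation output with a few lines of Python. The following is a minimal sketch, not the official extraction code. It assumes that each element of `result` is a dict whose `"ner_chunk"` key holds annotation objects exposing `result`, `begin`, `end`, and a `metadata` dict with `"entity"` and `"confidence"` entries; those key names are assumptions inferred from the table columns. The helper names `chunks_to_rows` and `span_matches` are hypothetical. To keep the example runnable without a Spark session, a small stand-in for the pipeline output is built by hand from the first three table rows.

```python
from types import SimpleNamespace


def chunks_to_rows(result, key="ner_chunk"):
    """Flatten fullAnnotate-style output into (chunk, begin, end, label, confidence) tuples.

    `key` and the metadata field names are assumptions, not confirmed by the model card.
    """
    rows = []
    for doc in result:  # one dict per input text
        for ann in doc.get(key, []):
            rows.append((
                ann.result,
                ann.begin,
                ann.end,
                ann.metadata.get("entity"),
                float(ann.metadata.get("confidence", "nan")),
            ))
    return rows


def span_matches(text, row):
    """Check that begin/end are inclusive character offsets into the input text."""
    chunk, begin, end = row[0], row[1], row[2]
    return text[begin:end + 1] == chunk


def _stub(chunk, begin, end, confidence):
    # Hand-built stand-in for a real annotation object (assumed shape).
    return SimpleNamespace(
        result=chunk,
        begin=begin,
        end=end,
        metadata={"entity": "GENE/PROTEIN", "confidence": confidence},
    )


text = ("ROCK-I, Kinectin, and mDia2 can bind the wild type forms of both "
        "RhoA and Cdc42 in a GTP-dependent manner in vitro.")

# Values copied from the first three rows of the Results table.
result = [{"ner_chunk": [
    _stub("ROCK-I", 0, 5, "0.999978"),
    _stub("Kinectin", 8, 15, "0.999973"),
    _stub("mDia2", 22, 26, "0.999974"),
]}]

rows = chunks_to_rows(result)
for row in rows:
    print(row, "offsets ok" if span_matches(text, row) else "offset mismatch")
```

Note that `end` is inclusive: ROCK-I spans offsets 0 to 5, so the slice is `text[begin:end + 1]`. With a live pipeline, the hand-built `result` would be replaced by the value returned from `pipeline.fullAnnotate(text)`, provided the output really has the assumed shape.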

Model Information

Model Name:	bert_token_classifier_ner_bc2gm_gene_pipeline
Type:	pipeline
Compatibility:	Healthcare NLP 4.4.4+
License:	Licensed
Edition:	Official
Language:	en
Size:	404.8 MB

Included Models

DocumentAssembler
SentenceDetectorDLModel
TokenizerModel
MedicalBertForTokenClassifier
NerConverterInternalModel
