Pipeline to Detect Biological Concepts (BioBERT)

Description

This pretrained pipeline is built on top of the ner_bionlp_biobert model.

Predicted Entities

Gene_or_gene_product, Cancer, Cell, Cellular_component, Organism, Multi-tissue_structure, Developing_anatomical_structure, Amino_acid, Organ, Anatomical_system, Tissue, Organism_subdivision, Simple_chemical, Organism_substance, Immaterial_anatomical_entity, Pathological_formation

How to use

# Python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("ner_bionlp_biobert_pipeline", "en", "clinical/models")

text = '''Both the erbA IRES and the erbA/myb virus constructs transformed erythroid cells after infection of bone marrow or blastoderm cultures. The erbA/myb IRES virus exhibited a 5-10-fold higher transformed colony forming efficiency than the erbA IRES virus in the blastoderm assay'''

result = pipeline.fullAnnotate(text)

// Scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("ner_bionlp_biobert_pipeline", "en", "clinical/models")

val text = "Both the erbA IRES and the erbA/myb virus constructs transformed erythroid cells after infection of bone marrow or blastoderm cultures. The erbA/myb IRES virus exhibited a 5-10-fold higher transformed colony forming efficiency than the erbA IRES virus in the blastoderm assay"

val result = pipeline.fullAnnotate(text)

# NLU (Python)
import nlu

nlu.load("en.med_ner.bionlp_biobert.pipeline").predict("""Both the erbA IRES and the erbA/myb virus constructs transformed erythroid cells after infection of bone marrow or blastoderm cultures. The erbA/myb IRES virus exhibited a 5-10-fold higher transformed colony forming efficiency than the erbA IRES virus in the blastoderm assay""")
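
The object returned by fullAnnotate in Python is a list with one dictionary per input text, mapping output column names to lists of annotations. The following is a minimal sketch of flattening that output into plain rows. It assumes the chunk column is named ner_chunk and that each chunk's metadata carries "entity" and "confidence" keys; neither is stated on this page, so inspect the keys of your own result before relying on them. The helper name chunks_to_rows is hypothetical.

```python
def chunks_to_rows(annotated, chunk_key="ner_chunk"):
    """Flatten one fullAnnotate() result dict into a list of plain dicts.

    `annotated` is a single element of the list returned by fullAnnotate().
    `chunk_key` is an assumed output column name, not confirmed by this page.
    """
    rows = []
    for chunk in annotated.get(chunk_key, []):
        # Metadata keys "entity" and "confidence" are assumptions;
        # missing keys simply yield None.
        meta = chunk.metadata or {}
        rows.append({
            "ner_chunk": chunk.result,
            "begin": chunk.begin,
            "end": chunk.end,
            "ner_label": meta.get("entity"),
            "confidence": meta.get("confidence"),
        })
    return rows

# Usage with the Python pipeline above (needs a licensed Healthcare NLP session):
# rows = chunks_to_rows(pipeline.fullAnnotate(text)[0])
```

The function only reads the result, begin, end, and metadata attributes, so it can be exercised with stand-in objects before a licensed session is available.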

Results

|    | ner_chunk           |   begin |   end | ner_label              |   confidence |
|---:|:--------------------|--------:|------:|:-----------------------|-------------:|
|  0 | erbA                |       9 |    12 | Gene_or_gene_product   |      1       |
|  1 | IRES                |      14 |    17 | Organism               |      0.754   |
|  2 | virus               |      36 |    40 | Organism               |      0.9999  |
|  3 | erythroid cells     |      65 |    79 | Cell                   |      0.99855 |
|  4 | bone                |     100 |   103 | Multi-tissue_structure |      0.9794  |
|  5 | marrow              |     105 |   110 | Multi-tissue_structure |      0.9631  |
|  6 | blastoderm cultures |     115 |   133 | Cell                   |      0.9868  |
|  7 | IRES virus          |     149 |   158 | Organism               |      0.99985 |
|  8 | erbA                |     236 |   239 | Gene_or_gene_product   |      0.9977  |
|  9 | IRES virus          |     241 |   250 | Organism               |      0.9911  |
| 10 | blastoderm          |     259 |   268 | Cell                   |      0.9941  |
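
The begin and end columns are zero-based character offsets into the input text, and end is inclusive: "erbA" spans offsets 9 through 12. A Python slice therefore needs end + 1 as its upper bound. The short sketch below checks this against the first sentence of the sample text; chunk_text is a hypothetical helper name.

```python
def chunk_text(text, begin, end):
    """Return the substring for a chunk whose end offset is inclusive."""
    return text[begin:end + 1]

# First sentence of the sample text from "How to use".
sample = ("Both the erbA IRES and the erbA/myb virus constructs transformed "
          "erythroid cells after infection of bone marrow or blastoderm cultures.")

# (begin, end) pairs copied from rows 0, 3 and 6 of the Results table.
for begin, end in [(9, 12), (65, 79), (115, 133)]:
    print(begin, end, chunk_text(sample, begin, end))
```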

Model Information

Model Name: ner_bionlp_biobert_pipeline
Type: pipeline
Compatibility: Healthcare NLP 4.4.4+
License: Licensed
Edition: Official
Language: en
Size: 422.2 MB

Included Models

  • DocumentAssembler
  • SentenceDetectorDLModel
  • TokenizerModel
  • BertEmbeddings
  • MedicalNerModel
  • NerConverterInternalModel