Pipeline to Detect Chemicals in Medical Text

Description

This pretrained pipeline is built on top of the bert_token_classifier_ner_bc4chemd_chemicals model and labels chemical mentions in medical text as CHEM entities.

Predicted Entities

CHEM

How to use

# Python
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("bert_token_classifier_ner_bc4chemd_chemicals_pipeline", "en", "clinical/models")

text = '''The main isolated compounds were triterpenes (alpha - amyrin, beta - amyrin, lupeol, betulin, betulinic acid, uvaol, erythrodiol and oleanolic acid) and phenolic acid derivatives from 4 - hydroxybenzoic acid (gallic and protocatechuic acids and isocorilagin).'''

result = pipeline.fullAnnotate(text)

// Scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("bert_token_classifier_ner_bc4chemd_chemicals_pipeline", "en", "clinical/models")

val text = "The main isolated compounds were triterpenes (alpha - amyrin, beta - amyrin, lupeol, betulin, betulinic acid, uvaol, erythrodiol and oleanolic acid) and phenolic acid derivatives from 4 - hydroxybenzoic acid (gallic and protocatechuic acids and isocorilagin)."

val result = pipeline.fullAnnotate(text)
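
The list returned by fullAnnotate can be flattened into rows like those in the Results table. The following is a minimal sketch rather than the pipeline's documented output contract. It assumes the chunk annotations sit under an output column named ner_chunk and that each annotation's metadata carries entity and confidence keys. chunks_to_rows is a hypothetical helper, and the stand-in object at the bottom only mimics the attributes used here so the snippet runs without a Spark session.

```python
from types import SimpleNamespace


def chunks_to_rows(result, column="ner_chunk"):
    """Flatten fullAnnotate-style output into a list of plain dicts.

    `result` is expected to be a list with one dict per input text,
    mapping output column names to lists of annotation objects.
    The column name and metadata keys are assumptions; check your pipeline.
    """
    rows = []
    for document in result:
        for ann in document.get(column, []):
            rows.append({
                "ner_chunk": ann.result,
                "begin": ann.begin,
                "end": ann.end,
                "ner_label": ann.metadata.get("entity"),
                "confidence": ann.metadata.get("confidence"),
            })
    return rows


# Stand-in for real pipeline output, so the sketch runs without Spark.
fake_result = [{
    "ner_chunk": [
        SimpleNamespace(
            result="lupeol", begin=77, end=82,
            metadata={"entity": "CHEM", "confidence": "0.999968"},
        ),
    ]
}]

rows = chunks_to_rows(fake_result)
print(rows)
```

With a real pipeline, the equivalent call would be chunks_to_rows(pipeline.fullAnnotate(text)), provided the column name matches.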

Results

|    | ner_chunk                       |   begin |   end | ner_label   |   confidence |
|---:|:--------------------------------|--------:|------:|:------------|-------------:|
|  0 | triterpenes                     |      33 |    43 | CHEM        |     0.99999  |
|  1 | alpha - amyrin                  |      46 |    59 | CHEM        |     0.999939 |
|  2 | beta - amyrin                   |      62 |    74 | CHEM        |     0.999679 |
|  3 | lupeol                          |      77 |    82 | CHEM        |     0.999968 |
|  4 | betulin                         |      85 |    91 | CHEM        |     0.999975 |
|  5 | betulinic acid                  |      94 |   107 | CHEM        |     0.999984 |
|  6 | uvaol                           |     110 |   114 | CHEM        |     0.99998  |
|  7 | erythrodiol                     |     117 |   127 | CHEM        |     0.999987 |
|  8 | oleanolic acid                  |     133 |   146 | CHEM        |     0.999984 |
|  9 | phenolic acid                   |     153 |   165 | CHEM        |     0.999985 |
| 10 | 4 - hydroxybenzoic acid         |     184 |   206 | CHEM        |     0.999973 |
| 11 | gallic and protocatechuic acids |     209 |   239 | CHEM        |     0.999984 |
| 12 | isocorilagin                    |     245 |   256 | CHEM        |     0.999985 |
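
The begin and end columns in the table read as zero-based character offsets into the input text, with end inclusive. A short sketch to check this against a few rows from the table follows; chunk_at is a hypothetical helper, and the inclusive-end interpretation is inferred from the table values rather than stated by the model card.

```python
# Input text from the "How to use" example, split across literals for readability.
text = (
    "The main isolated compounds were triterpenes (alpha - amyrin, "
    "beta - amyrin, lupeol, betulin, betulinic acid, uvaol, erythrodiol "
    "and oleanolic acid) and phenolic acid derivatives from "
    "4 - hydroxybenzoic acid (gallic and protocatechuic acids and "
    "isocorilagin)."
)

# (ner_chunk, begin, end) copied from selected rows of the results table.
rows = [
    ("triterpenes", 33, 43),
    ("alpha - amyrin", 46, 59),
    ("4 - hydroxybenzoic acid", 184, 206),
    ("isocorilagin", 245, 256),
]


def chunk_at(text, begin, end):
    """Return the substring for an inclusive [begin, end] character span."""
    return text[begin:end + 1]


# Collect any rows whose offsets do not reproduce the chunk text.
mismatches = [(c, b, e) for c, b, e in rows if chunk_at(text, b, e) != c]
print(mismatches)
```

An empty list means every listed span reproduces its chunk text, so slicing with end + 1 recovers each entity.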

Model Information

Model Name: bert_token_classifier_ner_bc4chemd_chemicals_pipeline
Type: pipeline
Compatibility: Healthcare NLP 4.4.4+
License: Licensed
Edition: Official
Language: en
Size: 404.8 MB

Included Models

  • DocumentAssembler
  • SentenceDetectorDLModel
  • TokenizerModel
  • MedicalBertForTokenClassifier
  • NerConverterInternalModel