Detect Substance Usage Entities (SUBSTANCE_USE)

Description

This pipeline extracts substance usage information from medical text. SUBSTANCE_USE: Mentions of illegal recreational drug use, as well as substances that can create dependency, including caffeine and tea. Examples: “overdose”, “cocaine”, “illicit substance intoxication”, “coffee”.

Predicted Entities

SUBSTANCE_USE


How to use

# Python, using the sparknlp package
from sparknlp.pretrained import PretrainedPipeline

ner_pipeline = PretrainedPipeline("ner_substance_use_benchmark_pipeline", "en", "clinical/models")

text = """SOCIAL HISTORY : The patient is a nonsmoker . Denies any alcohol or illicit drug use . The patient does live with his family .
SOCIAL HISTORY : The patient smokes approximately 2 packs per day times greater than 40 years . He does drink occasional alcohol approximately 5 to 6 alcoholic drinks per month . He denies any drug use . He is a retired liquor store owner .
SOCIAL HISTORY : Patient admits alcohol use , Drinking is described as heavy , Patient denies illegal drug use , Patient denies STD history , Patient denies tobacco use .
SOCIAL HISTORY : The patient is employed in the finance department . He is a nonsmoker . He does consume alcohol on the weekend as much as 3 to 4 alcoholic beverages per day on the weekends . He denies any IV drug use or abuse .
SOCIAL HISTORY : The patient is a smoker . Admits to heroin use , alcohol abuse as well . Also admits today using cocaine .
"""

result = ner_pipeline.fullAnnotate(text)

# Python, using the johnsnowlabs package
from johnsnowlabs import nlp

ner_pipeline = nlp.PretrainedPipeline("ner_substance_use_benchmark_pipeline", "en", "clinical/models")

text = """SOCIAL HISTORY : The patient is a nonsmoker . Denies any alcohol or illicit drug use . The patient does live with his family .
SOCIAL HISTORY : The patient smokes approximately 2 packs per day times greater than 40 years . He does drink occasional alcohol approximately 5 to 6 alcoholic drinks per month . He denies any drug use . He is a retired liquor store owner .
SOCIAL HISTORY : Patient admits alcohol use , Drinking is described as heavy , Patient denies illegal drug use , Patient denies STD history , Patient denies tobacco use .
SOCIAL HISTORY : The patient is employed in the finance department . He is a nonsmoker . He does consume alcohol on the weekend as much as 3 to 4 alcoholic beverages per day on the weekends . He denies any IV drug use or abuse .
SOCIAL HISTORY : The patient is a smoker . Admits to heroin use , alcohol abuse as well . Also admits today using cocaine .
"""

result = ner_pipeline.fullAnnotate(text)

// Scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val ner_pipeline = PretrainedPipeline("ner_substance_use_benchmark_pipeline", "en", "clinical/models")

val text = """SOCIAL HISTORY : The patient is a nonsmoker . Denies any alcohol or illicit drug use . The patient does live with his family .
SOCIAL HISTORY : The patient smokes approximately 2 packs per day times greater than 40 years . He does drink occasional alcohol approximately 5 to 6 alcoholic drinks per month . He denies any drug use . He is a retired liquor store owner .
SOCIAL HISTORY : Patient admits alcohol use , Drinking is described as heavy , Patient denies illegal drug use , Patient denies STD history , Patient denies tobacco use .
SOCIAL HISTORY : The patient is employed in the finance department . He is a nonsmoker . He does consume alcohol on the weekend as much as 3 to 4 alcoholic beverages per day on the weekends . He denies any IV drug use or abuse .
SOCIAL HISTORY : The patient is a smoker . Admits to heroin use , alcohol abuse as well . Also admits today using cocaine .
"""

val result = ner_pipeline.fullAnnotate(text)
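
The Python calls above return one dictionary per input text, mapping output column names to lists of annotations. A minimal sketch of flattening that structure into rows is shown here. The column name "ner_chunk" and the "entity" metadata key are assumptions about this pipeline's output, not confirmed by this page, and chunks_to_rows is a hypothetical helper. The demo uses a stand-in object so the sketch runs without a Spark session.

```python
from types import SimpleNamespace


def chunks_to_rows(annotated_doc, column="ner_chunk"):
    """Flatten one fullAnnotate() result dict into (chunk, begin, end, label) rows.

    `column` is an assumed output column name; adjust it to whatever
    keys the pipeline result actually contains.
    """
    rows = []
    for ann in annotated_doc.get(column, []):
        # Each annotation exposes result, begin, end, and a metadata dict.
        rows.append((ann.result, ann.begin, ann.end, ann.metadata.get("entity")))
    return rows


# Stand-in for result[0]; a real run would pass the pipeline output instead.
demo = {
    "ner_chunk": [
        SimpleNamespace(
            result="illicit drug use",
            begin=68,
            end=83,
            metadata={"entity": "SUBSTANCE_USE"},
        )
    ]
}

for row in chunks_to_rows(demo):
    print(row)
```

With a real pipeline result, the call would be chunks_to_rows(result[0]), after checking result[0].keys() for the actual column name.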

Results

|    | chunk            |   begin |   end | ner_label     |
|---:|:-----------------|--------:|------:|:--------------|
|  0 | illicit drug use |      68 |    83 | SUBSTANCE_USE |
|  1 | drug use         |     320 |   327 | SUBSTANCE_USE |
|  2 | illegal drug use |     462 |   477 | SUBSTANCE_USE |
|  3 | IV drug use      |     745 |   755 | SUBSTANCE_USE |
|  4 | heroin use       |     821 |   830 | SUBSTANCE_USE |
|  5 | using cocaine    |     876 |   888 | SUBSTANCE_USE |
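
The begin and end columns appear to be zero-based character offsets into the input text, with end inclusive. The following sketch checks that reading against the first two rows of the table, using only the first two notes of the example text. The helper name span is hypothetical.

```python
# First two notes of the example input, joined by a newline as in the original text.
text = (
    "SOCIAL HISTORY : The patient is a nonsmoker . Denies any alcohol or "
    "illicit drug use . The patient does live with his family .\n"
    "SOCIAL HISTORY : The patient smokes approximately 2 packs per day times "
    "greater than 40 years . He does drink occasional alcohol approximately "
    "5 to 6 alcoholic drinks per month . He denies any drug use . "
    "He is a retired liquor store owner .\n"
)

# (chunk, begin, end) taken from rows 0 and 1 of the results table.
rows = [
    ("illicit drug use", 68, 83),
    ("drug use", 320, 327),
]


def span(source, begin, end):
    """Return the substring for an inclusive [begin, end] character range."""
    return source[begin:end + 1]


for chunk, begin, end in rows:
    # Slicing with end + 1 recovers the chunk because `end` is inclusive.
    assert span(text, begin, end) == chunk
```

Because end is inclusive, a chunk's length is end - begin + 1, which is why Python slicing needs end + 1.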

Model Information

Model Name: ner_substance_use_benchmark_pipeline
Type: pipeline
Compatibility: Healthcare NLP 5.5.3+
License: Licensed
Edition: Official
Language: en
Size: 1.7 GB

Included Models

  • DocumentAssembler
  • SentenceDetector
  • TokenizerModel
  • WordEmbeddingsModel
  • TextMatcherInternalModel
  • MedicalNerModel
  • NerConverterInternalModel
  • ChunkMergeModel
  • ChunkMergeModel

Benchmarking

        label  precision    recall  f1-score   support
            O      1.000     1.000     1.000     82313
SUBSTANCE_USE      1.000     0.981     0.990       258
     accuracy      -         -         1.000     82571
    macro-avg      1.000     0.990     0.995     82571
 weighted-avg      1.000     1.000     1.000     82571