NER Pipeline Benchmark Large (Document Wise)

Description

This pipeline detects protected health information (PHI) entities in medical texts using Named Entity Recognition (NER). It identifies the following types of sensitive entities: ‘MEDICALRECORD’, ‘LOCATION’, ‘PROFESSION’, ‘DOCTOR’, ‘USERNAME’, ‘CITY’, ‘ZIP’, ‘STATE’, ‘PATIENT’, ‘COUNTRY’, ‘STREET’, ‘HOSPITAL’, ‘DLN’, ‘IDNUM’, ‘AGE’, ‘DATE’, ‘PHONE’, ‘EMAIL’, ‘ORGANIZATION’, ‘SSN’, ‘ACCOUNT’, ‘PLATE’, ‘VIN’, ‘LICENSE’, ‘URL’, ‘IP’.


How to use

from sparknlp.pretrained import PretrainedPipeline

ner_docwise = PretrainedPipeline("pp_docwise_benchmark_large_preann", "en", "clinical/models")

text = """Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024.
The patient’s medical record number is 56467890.
The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 ."""

result = ner_docwise.fullAnnotate(text)

from johnsnowlabs import nlp

ner_docwise = nlp.PretrainedPipeline("pp_docwise_benchmark_large_preann", "en", "clinical/models")

text = """Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024.
The patient’s medical record number is 56467890.
The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 ."""

result = ner_docwise.fullAnnotate(text)

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val ner_docwise = PretrainedPipeline("pp_docwise_benchmark_large_preann", "en", "clinical/models")

val text = """Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024.
The patient’s medical record number is 56467890.
The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 ."""

val result = ner_docwise.fullAnnotate(text)

Results

+--------------------+-----+---+---------+
|chunk               |begin|end|ner_label|
+--------------------+-----+---+---------+
|John Lee            |4    |11 |DOCTOR   |
|Royal Medical Clinic|19   |38 |HOSPITAL |
|Chicago             |43   |49 |CITY     |
|11/05/2024          |79   |88 |DATE     |
|56467890            |130  |137|IDNUM    |
|Emma Wilson         |153  |163|PATIENT  |
|50 years old        |169  |180|AGE      |
|444-456-7890        |203  |214|PHONE    |
+--------------------+-----+---+---------+
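
The rows above can be recovered from the Python `fullAnnotate` output by walking the chunk annotations. The following is a minimal sketch, not the pipeline's documented interface: the output column name `ner_chunk` is an assumption (this page does not state which column the final ChunkMergeModel writes to, so check `result[0].keys()`), and `chunks_to_rows` is a hypothetical helper. The stand-in objects mimic the `result`, `begin`, `end`, and `metadata` fields of Spark NLP annotations so the snippet runs without a Spark session. It also shows that `end` offsets are inclusive.

```python
from types import SimpleNamespace


def chunks_to_rows(result, chunk_col):
    """Flatten fullAnnotate-style output into (chunk, begin, end, ner_label) tuples."""
    rows = []
    for doc in result:
        for ann in doc.get(chunk_col, []):
            # Chunk annotations are assumed to carry their label under metadata["entity"].
            rows.append((ann.result, ann.begin, ann.end, ann.metadata.get("entity")))
    return rows


text = "Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024."

# Stand-in for `ner_docwise.fullAnnotate(text)`; "ner_chunk" is an assumed column name.
sample_result = [{
    "ner_chunk": [
        SimpleNamespace(result="John Lee", begin=4, end=11, metadata={"entity": "DOCTOR"}),
        SimpleNamespace(result="Chicago", begin=43, end=49, metadata={"entity": "CITY"}),
    ]
}]

rows = chunks_to_rows(sample_result, "ner_chunk")
for chunk, begin, end, label in rows:
    # `end` is inclusive, so the Python slice needs end + 1.
    assert text[begin:end + 1] == chunk
    print(f"{chunk}\t{begin}\t{end}\t{label}")
```

With a real pipeline, pass `result` from `fullAnnotate` in place of `sample_result` and substitute the actual chunk column name.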

Model Information

Model Name:	pp_docwise_benchmark_large_preann
Type:	pipeline
Compatibility:	Healthcare NLP 6.0.4+
License:	Licensed
Edition:	Official
Language:	en
Size:	3.4 GB

Included Models

DocumentAssembler
InternalDocumentSplitter
TokenizerModel
TokenizerModel
WordEmbeddingsModel
MedicalNerModel
NerConverterInternalModel
PretrainedZeroShotNER
NerConverterInternalModel
MedicalNerModel
NerConverterInternalModel
ChunkMergeModel
ContextualParserModel
ContextualParserModel
ContextualParserModel
ContextualParserModel
ContextualParserModel
ContextualParserModel
ContextualParserModel
ContextualParserModel
RegexMatcherInternalModel
ContextualParserModel
ContextualParserModel
RegexMatcherInternalModel
RegexMatcherInternalModel
RegexMatcherInternalModel
ContextualParserModel
TextMatcherInternalModel
TextMatcherInternalModel
ContextualParserModel
ContextualParserModel
ChunkMergeModel
ChunkMergeModel
