Description
This pipeline detects protected health information (PHI) entities in medical text using Named Entity Recognition (NER). It identifies the following sensitive entity types: ‘MEDICALRECORD’, ‘LOCATION’, ‘PROFESSION’, ‘DOCTOR’, ‘USERNAME’, ‘CITY’, ‘ZIP’, ‘STATE’, ‘PATIENT’, ‘COUNTRY’, ‘STREET’, ‘HOSPITAL’, ‘DLN’, ‘IDNUM’, ‘AGE’, ‘DATE’, ‘PHONE’, ‘EMAIL’, ‘ORGANIZATION’, ‘SSN’, ‘ACCOUNT’, ‘PLATE’, ‘VIN’, ‘LICENSE’, ‘URL’, ‘IP’.
How to use
# Python, using the sparknlp package
from sparknlp.pretrained import PretrainedPipeline
ner_docwise = PretrainedPipeline("pp_docwise_benchmark_large_preann", "en", "clinical/models")
text = """Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024.
The patient’s medical record number is 56467890.
The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 ."""
result = ner_docwise.fullAnnotate(text)
# Python, using the johnsnowlabs package (provides the `nlp` module used below)
from johnsnowlabs import nlp
ner_docwise = nlp.PretrainedPipeline("pp_docwise_benchmark_large_preann", "en", "clinical/models")
text = """Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024.
The patient’s medical record number is 56467890.
The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 ."""
result = ner_docwise.fullAnnotate(text)
// Scala
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
val ner_docwise = PretrainedPipeline("pp_docwise_benchmark_large_preann", "en", "clinical/models")
val text = """Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024.
The patient’s medical record number is 56467890.
The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 ."""
val result = ner_docwise.fullAnnotate(text)
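The `fullAnnotate` call returns one dictionary per input text, mapping each output column name to a list of annotation objects that expose `result`, `begin`, `end`, and `metadata`. The following is a minimal sketch of flattening that output into rows like those shown under Results. The column name `ner_chunk` is an assumption (this page does not state the pipeline's output column names; inspect `result[0].keys()` to find the right one), and `chunks_to_rows` is a hypothetical helper, not part of the library. Stand-in objects are used so the sketch runs without a Spark session.

```python
from types import SimpleNamespace


def chunks_to_rows(result, chunk_key="ner_chunk"):
    """Flatten fullAnnotate-style output into (chunk, begin, end, ner_label) tuples.

    chunk_key is an ASSUMED output column name; check result[0].keys()
    on the real pipeline output and pass the actual name.
    """
    rows = []
    for doc in result:  # one dict per input text
        for ann in doc.get(chunk_key, []):
            rows.append((ann.result, ann.begin, ann.end, ann.metadata.get("entity")))
    return rows


# Stand-in for the pipeline output (hypothetical data shaped like annotations).
demo_result = [{
    "ner_chunk": [
        SimpleNamespace(result="John Lee", begin=4, end=11, metadata={"entity": "DOCTOR"}),
        SimpleNamespace(result="Chicago", begin=43, end=49, metadata={"entity": "CITY"}),
    ]
}]

for row in chunks_to_rows(demo_result):
    print(row)
```

With the real pipeline, the same helper would be called as `chunks_to_rows(result, chunk_key=...)` once the actual column name is known.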
Results
+--------------------+-----+---+---------+
|chunk               |begin|end|ner_label|
+--------------------+-----+---+---------+
|John Lee            |4    |11 |DOCTOR   |
|Royal Medical Clinic|19   |38 |HOSPITAL |
|Chicago             |43   |49 |CITY     |
|11/05/2024          |79   |88 |DATE     |
|56467890            |130  |137|IDNUM    |
|Emma Wilson         |153  |163|PATIENT  |
|50 years old        |169  |180|AGE      |
|444-456-7890        |203  |214|PHONE    |
+--------------------+-----+---+---------+
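The `begin` and `end` values are character offsets into the input text, and `end` is inclusive (for example, `John Lee` spans 4 to 11). As an illustration of how these offsets might be used downstream, the sketch below replaces each detected chunk with its label. `mask_entities` is a hypothetical helper written for this example; it is not the library's de-identification component, and the spans are copied by hand from the table above.

```python
text = (
    "Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024.\n"
    "The patient’s medical record number is 56467890.\n"
    "The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 ."
)

# (begin, end, ner_label) copied from the Results table; end is inclusive.
spans = [
    (4, 11, "DOCTOR"), (19, 38, "HOSPITAL"), (43, 49, "CITY"), (79, 88, "DATE"),
    (130, 137, "IDNUM"), (153, 163, "PATIENT"), (169, 180, "AGE"), (203, 214, "PHONE"),
]


def mask_entities(text, spans):
    """Replace each span with <LABEL>, assuming spans do not overlap."""
    # Work from right to left so earlier offsets stay valid after each replacement.
    for begin, end, label in sorted(spans, reverse=True):
        text = text[:begin] + f"<{label}>" + text[end + 1:]
    return text


masked = mask_entities(text, spans)
print(masked)
```

Processing spans from the end of the text backwards avoids recomputing offsets after each substitution changes the string length.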
Model Information
| Model Name: | pp_docwise_benchmark_large_preann |
| Type: | pipeline |
| Compatibility: | Healthcare NLP 6.0.4+ |
| License: | Licensed |
| Edition: | Official |
| Language: | en |
| Size: | 3.4 GB |
Included Models
- DocumentAssembler
- InternalDocumentSplitter
- TokenizerModel
- TokenizerModel
- WordEmbeddingsModel
- MedicalNerModel
- NerConverterInternalModel
- PretrainedZeroShotNER
- NerConverterInternalModel
- MedicalNerModel
- NerConverterInternalModel
- ChunkMergeModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- RegexMatcherInternalModel
- ContextualParserModel
- ContextualParserModel
- RegexMatcherInternalModel
- RegexMatcherInternalModel
- RegexMatcherInternalModel
- ContextualParserModel
- TextMatcherInternalModel
- TextMatcherInternalModel
- ContextualParserModel
- ContextualParserModel
- ChunkMergeModel
- ChunkMergeModel