Description
This pipeline detects protected health information (PHI) entities in medical texts using Named Entity Recognition (NER). It identifies the following types of sensitive entities: ‘CONTACT’, ‘DATE’, ‘ID’, ‘LOCATION’, ‘PROFESSION’, ‘DOCTOR’, ‘EMAIL’, ‘PATIENT’, ‘URL’, ‘USERNAME’, ‘CITY’, ‘COUNTRY’, ‘DLN’, ‘HOSPITAL’, ‘IDNUM’, ‘LOCATION_OTHER’, ‘MEDICALRECORD’, ‘STATE’, ‘STREET’, ‘ZIP’, ‘AGE’, ‘PHONE’, ‘ORGANIZATION’, ‘SSN’, ‘ACCOUNT’, ‘PLATE’, ‘VIN’, ‘LICENSE’, and ‘IP’.
How to use
from sparknlp.pretrained import PretrainedPipeline
deid_pipeline = PretrainedPipeline("ner_docwise_benchmark_medium", "en", "clinical/models")
text = """Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024.
The patient’s medical record number is 56467890.
The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 ."""
deid_result = deid_pipeline.fullAnnotate(text)
from johnsnowlabs import nlp
deid_pipeline = nlp.PretrainedPipeline("ner_docwise_benchmark_medium", "en", "clinical/models")
text = """Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024.
The patient’s medical record number is 56467890.
The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 ."""
deid_result = deid_pipeline.fullAnnotate(text)
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
val deid_pipeline = PretrainedPipeline("ner_docwise_benchmark_medium", "en", "clinical/models")
val text = """Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024.
The patient’s medical record number is 56467890.
The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 ."""
val deid_result = deid_pipeline.fullAnnotate(text)
Results
| | text | result |
|---:|:-------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------|
| 0 | Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024. | ['John Lee', 'Royal Medical Clinic', 'Chicago', '11/05/2024', '56467890', 'Emma Wilson', '50 years old', '444-456-7890'] |
| | The patient’s medical record number is 56467890. | |
| | The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 . | |
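The result column above lists only the chunk texts. A minimal sketch of how such a list, along with entity labels and character offsets, might be pulled out of one `fullAnnotate` result is shown below. It rests on several assumptions: the output column name `ner_chunk` is hypothetical (check the keys of the returned dictionary), `extract_chunks` and `FakeAnnotation` are illustrative helpers rather than library code, and the sample labels are hand-written stand-ins, not output of this pipeline. The stand-in class only mirrors the `begin`, `end`, `result`, and `metadata` attributes of a Spark NLP annotation so that the sketch runs without a Spark session.

```python
from dataclasses import dataclass, field


@dataclass
class FakeAnnotation:
    """Stand-in exposing the attribute names used below (begin, end, result, metadata)."""
    begin: int
    end: int
    result: str
    metadata: dict = field(default_factory=dict)


def extract_chunks(annotated, column="ner_chunk"):
    """Return (text, label, begin, end) tuples for one fullAnnotate result.

    `column` is an assumed output column name; a missing column yields [].
    """
    rows = []
    for ann in annotated.get(column, []):
        label = ann.metadata.get("entity", "")
        rows.append((ann.result, label, ann.begin, ann.end))
    return rows


text = "Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024."


def fake(chunk, label):
    # Build a stand-in annotation with an inclusive end offset.
    begin = text.index(chunk)
    return FakeAnnotation(begin, begin + len(chunk) - 1, chunk, {"entity": label})


# Hand-written sample, NOT real pipeline output; labels are illustrative.
sample = {"ner_chunk": [fake("John Lee", "DOCTOR"), fake("Chicago", "CITY")]}

rows = extract_chunks(sample)
for chunk_text, label, begin, end in rows:
    print(f"{chunk_text!r:12} {label:8} {begin}-{end}")
```

With the real pipeline, the same helper could be called as `extract_chunks(deid_result[0], column=...)` once the actual column name has been read from `deid_result[0].keys()`.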
Model Information
| Model Name: | ner_docwise_benchmark_medium |
|:---|:---|
| Type: | pipeline |
| Compatibility: | Healthcare NLP 6.0.4+ |
| License: | Licensed |
| Edition: | Official |
| Language: | en |
| Size: | 2.5 GB |
Included Models
- DocumentAssembler
- InternalDocumentSplitter
- TokenizerModel
- TokenizerModel
- WordEmbeddingsModel
- MedicalNerModel
- NerConverterInternalModel
- PretrainedZeroShotNER
- NerConverterInternalModel
- MedicalNerModel
- NerConverterInternalModel
- MedicalNerModel
- NerConverterInternalModel
- ChunkMergeModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- ContextualParserModel
- RegexMatcherInternalModel
- ContextualParserModel
- ContextualParserModel
- RegexMatcherInternalModel
- RegexMatcherInternalModel
- RegexMatcherInternalModel
- ContextualParserModel
- TextMatcherInternalModel
- TextMatcherInternalModel
- ContextualParserModel
- ContextualParserModel
- ChunkMergeModel
- ChunkMergeModel
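The list above contains 35 stages, many of them repeated annotator types. As a rough way to compare it against a loaded pipeline, the sketch below counts stages by class name. `summarize_stages` is a hypothetical helper, and the two empty classes are stand-ins so the example runs without Spark; they are not the real annotators.

```python
from collections import Counter


def summarize_stages(stages):
    """Count pipeline stages by their Python class name."""
    return Counter(type(stage).__name__ for stage in stages)


# Empty stand-in classes, used only to make the sketch runnable on its own.
class ContextualParserModel:
    pass


class ChunkMergeModel:
    pass


demo_stages = [ContextualParserModel(), ContextualParserModel(), ChunkMergeModel()]
counts = summarize_stages(demo_stages)

# Print the most frequent stage types first.
for name, n in counts.most_common():
    print(f"{n:>2} x {name}")
```

With the pipeline from the usage section loaded, `summarize_stages(deid_pipeline.model.stages)` should give a comparable summary, assuming the Python class names reported at runtime match the names listed here.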