Description
This pipeline, detects PHI (Protected Health Information) entities for deidentification purposes. It is a subentity pipeline capable of detecting various PHI entities with granular labels such as PATIENT, DOCTOR, HOSPITAL, STREET, CITY, ZIP, etc.
How to use
from sparknlp.pretrained import PretrainedPipeline
pipeline = PretrainedPipeline("ner_deid_subentity_nonMedical_pipeline", "en", "clinical/models")
sample_text = """
Emily Davis, a 34-year-old Female, Dr. Michael Johnson cares with her at CarePlus Clinic, located at 456 Elm Street, NewYork, NY 10001, USA.
She can be reached at 555-642-1725 or via email emily.davis@gmail.com. Her SSN is 725-46-2729.
She works as a Nurse at City General Hospital. Her account number is 8003591.
She has an appointment scheduled for March 15, 2024 at 10:30 AM.
"""
result = pipeline.transform(spark.createDataFrame([[sample_text]]).toDF("text"))
from johnsnowlabs import nlp, medical
pipeline = nlp.PretrainedPipeline("ner_deid_subentity_nonMedical_pipeline", "en", "clinical/models")
sample_text = """
Emily Davis, a 34-year-old Female, Dr. Michael Johnson cares with her at CarePlus Clinic, located at 456 Elm Street, NewYork, NY 10001, USA.
She can be reached at 555-642-1725 or via email emily.davis@gmail.com. Her SSN is 725-46-2729.
She works as a Nurse at City General Hospital. Her account number is 8003591.
She has an appointment scheduled for March 15, 2024 at 10:30 AM.
"""
result = pipeline.transform(spark.createDataFrame([[sample_text]]).toDF("text"))
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
val pipeline = PretrainedPipeline("ner_deid_subentity_nonMedical_pipeline", "en", "clinical/models")
val sample_text = """
Emily Davis, a 34-year-old Female, Dr. Michael Johnson cares with her at CarePlus Clinic, located at 456 Elm Street, NewYork, NY 10001, USA.
She can be reached at 555-642-1725 or via email emily.davis@gmail.com. Her SSN is 725-46-2729.
She works as a Nurse at City General Hospital. Her account number is 8003591.
She has an appointment scheduled for March 15, 2024 at 10:30 AM.
"""
val result = pipeline.transform(spark.createDataFrame([[sample_text]]).toDF("text"))
Results
| chunk | begin | end | ner_label |
| :---------------------------------------- | ----: | --: | :-------- |
| Emily Davis | 1 | 11 | NAME |
| 34-year-old | 16 | 26 | AGE |
| Female | 28 | 33 | GENDER |
| Michael Johnson | 40 | 54 | DOCTOR |
| CarePlus Clinic | 74 | 88 | HOSPITAL |
| 456 Elm Street | 102 | 115 | STREET |
| NewYork | 118 | 124 | CITY |
| NY | 127 | 128 | STATE |
| 10001 | 130 | 134 | ZIP |
| USA | 137 | 139 | COUNTRY |
| 555-642-1725 | 165 | 176 | PHONE |
| [davis@gmail.com](mailto:davis@gmail.com) | 197 | 211 | EMAIL |
| 725-46-2729 | 225 | 235 | SSN |
| City General Hospital | 263 | 283 | HOSPITAL |
| March 15, 2024 | 355 | 368 | DATE |
| 10:30 AM | 373 | 380 | TIME |
Model Information
| Model Name: | ner_deid_subentity_nonMedical_pipeline |
| Type: | pipeline |
| Compatibility: | Healthcare NLP 6.3.0+ |
| License: | Licensed |
| Edition: | Official |
| Language: | en |
| Size: | 1.7 GB |
Included Models
- DocumentAssembler
- SentenceDetectorDLModel
- TokenizerModel
- WordEmbeddingsModel
- MedicalNerModel
- NerConverterInternalModel