Description
This pipeline detects Protected Health Information (PHI) entities for deidentification purposes. It is a generic pipeline that can detect PHI entity types such as DATE, NAME, LOCATION, ID, CONTACT, AGE, and PROFESSION.
How to use
# Python (sparknlp). Assumes `spark` is an active, licensed Healthcare NLP session.
from sparknlp.pretrained import PretrainedPipeline
pipeline = PretrainedPipeline("ner_deid_generic_nonMedical_pipeline", "en", "clinical/models")
sample_text = """
Mr. James Wilson is a 65-year-old male who presented to the emergency department at Boston General Hospital on 10/25/2023.
He lives at 123 Oak Street, Springfield, IL 62704. He can be contacted at 555-0199.
His SSN is 999-00-1234. Dr. Gregory House is the attending physician.
"""
result = pipeline.transform(spark.createDataFrame([[sample_text]]).toDF("text"))
# Python (johnsnowlabs). Assumes `spark` is an active, licensed session.
from johnsnowlabs import nlp, medical
pipeline = nlp.PretrainedPipeline("ner_deid_generic_nonMedical_pipeline", "en", "clinical/models")
sample_text = """
Mr. James Wilson is a 65-year-old male who presented to the emergency department at Boston General Hospital on 10/25/2023.
He lives at 123 Oak Street, Springfield, IL 62704. He can be contacted at 555-0199.
His SSN is 999-00-1234. Dr. Gregory House is the attending physician.
"""
result = pipeline.transform(spark.createDataFrame([[sample_text]]).toDF("text"))
// Scala. Assumes `spark` is an active SparkSession.
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
val pipeline = PretrainedPipeline("ner_deid_generic_nonMedical_pipeline", "en", "clinical/models")
val sample_text = """
Mr. James Wilson is a 65-year-old male who presented to the emergency department at Boston General Hospital on 10/25/2023.
He lives at 123 Oak Street, Springfield, IL 62704. He can be contacted at 555-0199.
His SSN is 999-00-1234. Dr. Gregory House is the attending physician.
"""
// Build a one-row DataFrame with a "text" column; Seq(...).toDF needs the session implicits.
import spark.implicits._
val result = pipeline.transform(Seq(sample_text).toDF("text"))
Results
| chunk | begin | end | ner_label |
| :---------------------- | ----: | --: | :-------- |
| James Wilson | 5 | 16 | NAME |
| 65-year-old | 23 | 33 | AGE |
| Boston General Hospital | 85 | 107 | LOCATION |
| 10/25/2023 | 112 | 121 | DATE |
| 123 Oak Street | 137 | 150 | LOCATION |
| Springfield | 153 | 163 | LOCATION |
| IL | 166 | 167 | LOCATION |
| 555-0199 | 199 | 206 | CONTACT |
| 999-00-1234 | 221 | 231 | ID |
| Gregory House | 238 | 250 | NAME |
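The `begin` and `end` values in the table are character offsets into the input text, and `end` is inclusive (for example, `James Wilson` spans 5 to 16). The pipeline itself only detects entities. As a minimal sketch of how the offsets could drive a downstream masking step, the plain-Python helper below replaces each span with its label. The function name `mask_entities` and the `<LABEL>` placeholder format are hypothetical and are not part of the pipeline or of Healthcare NLP. The example uses only the first sentence of the sample text, with its leading newline kept so the offsets line up, and the first four rows of the table.

```python
from typing import List, Tuple

# (chunk, begin, end, ner_label); `end` is inclusive, as in the Results table.
Entity = Tuple[str, int, int, str]


def mask_entities(text: str, entities: List[Entity]) -> str:
    """Replace each detected span with a <LABEL> placeholder (illustrative only)."""
    masked = text
    # Work from the last span to the first so earlier offsets stay valid.
    for chunk, begin, end, label in sorted(entities, key=lambda e: e[1], reverse=True):
        # Guard against offsets that do not match the text they claim to cover.
        if masked[begin:end + 1] != chunk:
            raise ValueError(f"offsets {begin}-{end} do not match chunk {chunk!r}")
        masked = masked[:begin] + f"<{label}>" + masked[end + 1:]
    return masked


# First sentence of the sample text, including the leading newline.
text = (
    "\nMr. James Wilson is a 65-year-old male who presented to the "
    "emergency department at Boston General Hospital on 10/25/2023."
)

# First four rows of the Results table.
entities = [
    ("James Wilson", 5, 16, "NAME"),
    ("65-year-old", 23, 33, "AGE"),
    ("Boston General Hospital", 85, 107, "LOCATION"),
    ("10/25/2023", 112, 121, "DATE"),
]

masked = mask_entities(text, entities)
print(masked)
```

Processing spans in reverse order avoids recomputing offsets after each replacement, and the chunk check fails loudly if the text passed in differs from the text the offsets were computed on (for example, because of stripped whitespace).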
Model Information
| Field | Value |
| :------------- | :----------------------------------- |
| Model Name: | ner_deid_generic_nonMedical_pipeline |
| Type: | pipeline |
| Compatibility: | Healthcare NLP 6.3.0+ |
| License: | Licensed |
| Edition: | Official |
| Language: | en |
| Size: | 1.7 GB |
Included Models
- DocumentAssembler
- SentenceDetectorDLModel
- TokenizerModel
- WordEmbeddingsModel
- MedicalNerModel
- NerConverterInternalModel