Detect PHI for Deidentification (Generic) Pipeline

Description

This pipeline detects PHI (Protected Health Information) entities for deidentification purposes. It is a generic pipeline that detects PHI entity types such as DATE, NAME, LOCATION, ID, CONTACT, AGE, and PROFESSION.

How to use


# Python (Spark NLP). Assumes an active Spark session named `spark` started with Healthcare NLP.
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("ner_deid_generic_nonMedical_pipeline", "en", "clinical/models")

sample_text = """ 
Mr. James Wilson is a 65-year-old male who presented to the emergency department at Boston General Hospital on 10/25/2023.
He lives at 123 Oak Street, Springfield, IL 62704. He can be contacted at 555-0199.
His SSN is 999-00-1234. Dr. Gregory House is the attending physician.
"""

result = pipeline.transform(spark.createDataFrame([[sample_text]]).toDF("text"))


# Python (johnsnowlabs library). Assumes an active Spark session named `spark`.
from johnsnowlabs import nlp, medical

pipeline = nlp.PretrainedPipeline("ner_deid_generic_nonMedical_pipeline", "en", "clinical/models")

sample_text = """ 
Mr. James Wilson is a 65-year-old male who presented to the emergency department at Boston General Hospital on 10/25/2023.
He lives at 123 Oak Street, Springfield, IL 62704. He can be contacted at 555-0199.
His SSN is 999-00-1234. Dr. Gregory House is the attending physician.
"""

result = pipeline.transform(spark.createDataFrame([[sample_text]]).toDF("text"))


// Scala. Assumes an active SparkSession named `spark`.
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = PretrainedPipeline("ner_deid_generic_nonMedical_pipeline", "en", "clinical/models")

val sample_text = """ 
Mr. James Wilson is a 65-year-old male who presented to the emergency department at Boston General Hospital on 10/25/2023.
He lives at 123 Oak Street, Springfield, IL 62704. He can be contacted at 555-0199.
His SSN is 999-00-1234. Dr. Gregory House is the attending physician.
"""

// Needed for .toDF on a Seq; requires an active SparkSession named `spark`.
import spark.implicits._

val result = pipeline.transform(Seq(sample_text).toDF("text"))

Results


| chunk                   | begin | end | ner_label |
| :---------------------- | ----: | --: | :-------- |
| James Wilson            |     5 |  16 | NAME      |
| 65-year-old             |    23 |  33 | AGE       |
| Boston General Hospital |    85 | 107 | LOCATION  |
| 10/25/2023              |   112 | 121 | DATE      |
| 123 Oak Street          |   137 | 150 | LOCATION  |
| Springfield             |   153 | 163 | LOCATION  |
| IL                      |   166 | 167 | LOCATION  |
| 555-0199                |   199 | 206 | CONTACT   |
| 999-00-1234             |   221 | 231 | ID        |
| Gregory House           |   238 | 250 | NAME      |
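The `begin` and `end` columns appear to be inclusive character offsets: for every row above, the chunk length equals `end - begin + 1`. The following is a minimal sketch, not part of the pipeline itself, that checks this property against the table and then uses the same convention to mask detected spans with their labels. The helper `mask_entities` and the short demo sentence are hypothetical and exist only for illustration; in practice the spans would come from the pipeline output.

```python
from typing import List, Tuple

# Rows copied from the Results table: (chunk, begin, end, ner_label).
ROWS: List[Tuple[str, int, int, str]] = [
    ("James Wilson", 5, 16, "NAME"),
    ("65-year-old", 23, 33, "AGE"),
    ("Boston General Hospital", 85, 107, "LOCATION"),
    ("10/25/2023", 112, 121, "DATE"),
    ("123 Oak Street", 137, 150, "LOCATION"),
    ("Springfield", 153, 163, "LOCATION"),
    ("IL", 166, 167, "LOCATION"),
    ("555-0199", 199, 206, "CONTACT"),
    ("999-00-1234", 221, 231, "ID"),
    ("Gregory House", 238, 250, "NAME"),
]

# With an inclusive end offset, a span covers end - begin + 1 characters.
offsets_are_inclusive = all(
    len(chunk) == end - begin + 1 for chunk, begin, end, _ in ROWS
)


def mask_entities(text: str, spans: List[Tuple[int, int, str]]) -> str:
    """Replace each (begin, end, label) span with <LABEL>; `end` is inclusive.

    Hypothetical helper: spans are applied from right to left so that
    earlier offsets stay valid while the text changes length.
    """
    masked = text
    for begin, end, label in sorted(spans, reverse=True):
        masked = masked[:begin] + f"<{label}>" + masked[end + 1:]
    return masked


# Hypothetical demo sentence; offsets are computed here rather than hardcoded.
demo = "Dr. Gregory House saw James Wilson."
demo_spans = []
for chunk, label in [("Gregory House", "NAME"), ("James Wilson", "NAME")]:
    begin = demo.index(chunk)
    demo_spans.append((begin, begin + len(chunk) - 1, label))

masked_demo = mask_entities(demo, demo_spans)
print(offsets_are_inclusive)
print(masked_demo)
```

Replacing spans from the end of the text backwards avoids recomputing offsets after each substitution, which matters because a label such as `<NAME>` rarely has the same length as the chunk it replaces.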

Model Information

Model Name: ner_deid_generic_nonMedical_pipeline
Type: pipeline
Compatibility: Healthcare NLP 6.3.0+
License: Licensed
Edition: Official
Language: en
Size: 1.7 GB

Included Models

  • DocumentAssembler
  • SentenceDetectorDLModel
  • TokenizerModel
  • WordEmbeddingsModel
  • MedicalNerModel
  • NerConverterInternalModel