NER Pipeline Benchmark Large (Document Wise)

Description

This pipeline detects protected health information (PHI) entities in medical texts using Named Entity Recognition (NER). It identifies the following types of sensitive entities: ‘MEDICALRECORD’, ‘LOCATION’, ‘PROFESSION’, ‘DOCTOR’, ‘USERNAME’, ‘CITY’, ‘ZIP’, ‘STATE’, ‘PATIENT’, ‘COUNTRY’, ‘STREET’, ‘HOSPITAL’, ‘DLN’, ‘IDNUM’, ‘AGE’, ‘DATE’, ‘PHONE’, ‘EMAIL’, ‘ORGANIZATION’, ‘SSN’, ‘ACCOUNT’, ‘PLATE’, ‘VIN’, ‘LICENSE’, ‘URL’, and ‘IP’.

How to use


from sparknlp.pretrained import PretrainedPipeline

ner_docwise = PretrainedPipeline("pp_docwise_benchmark_medium_preann", "en", "clinical/models")

text = """Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024.
The patient’s medical record number is 56467890.
The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 ."""

result = ner_docwise.fullAnnotate(text)



from johnsnowlabs import nlp

ner_docwise = nlp.PretrainedPipeline("pp_docwise_benchmark_medium_preann", "en", "clinical/models")

text = """Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024.
The patient’s medical record number is 56467890.
The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 ."""

result = ner_docwise.fullAnnotate(text)


import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val ner_docwise = PretrainedPipeline("pp_docwise_benchmark_medium_preann", "en", "clinical/models")

val text = """Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024.
The patient’s medical record number is 56467890.
The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 ."""

val result = ner_docwise.fullAnnotate(text)
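
In Python, `fullAnnotate` returns a list with one dictionary per input text, mapping each output column name to a list of annotations that carry `result`, `begin`, `end`, and `metadata`. The sketch below flattens such a structure into rows like the ones in the Results table. The column name `ner_chunk` is an assumption (this page does not name the pipeline's output columns, so check `result[0].keys()` first), and `StubAnnotation` is a hypothetical stand-in so the snippet runs without a Spark session.

```python
from dataclasses import dataclass, field


@dataclass
class StubAnnotation:
    """Hypothetical stand-in exposing the annotation fields read below."""
    begin: int
    end: int
    result: str
    metadata: dict = field(default_factory=dict)


def chunks_to_rows(annotated, chunk_col="ner_chunk"):
    """Flatten fullAnnotate-style output into (chunk, begin, end, label) tuples.

    chunk_col is an ASSUMED output column name, not confirmed for this pipeline.
    """
    rows = []
    for doc in annotated:
        for ann in doc.get(chunk_col, []):
            # Chunk annotations keep their label under the "entity" metadata key.
            rows.append((ann.result, ann.begin, ann.end, ann.metadata.get("entity")))
    return rows


# Stub data shaped like the list that fullAnnotate returns.
stub_result = [{"ner_chunk": [
    StubAnnotation(4, 11, "John Lee", {"entity": "DOCTOR"}),
    StubAnnotation(43, 49, "Chicago", {"entity": "CITY"}),
]}]

rows = chunks_to_rows(stub_result)
for row in rows:
    print(row)
```

With a real pipeline, `chunks_to_rows(result)` could be called on the value returned by `ner_docwise.fullAnnotate(text)`, passing whichever chunk column the pipeline actually produces.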

Results


+--------------------+-----+---+---------+
|chunk               |begin|end|ner_label|
+--------------------+-----+---+---------+
|John Lee            |4    |11 |DOCTOR   |
|Royal Medical Clinic|19   |38 |HOSPITAL |
|Chicago             |43   |49 |CITY     |
|11/05/2024          |79   |88 |DATE     |
|56467890            |130  |137|IDNUM    |
|Emma Wilson         |153  |163|PATIENT  |
|50 years old        |169  |180|AGE      |
|444-456-7890        |203  |214|PHONE    |
+--------------------+-----+---+---------+
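
In this table, `begin` and `end` are character offsets into the full input text, and `end` is inclusive, so `text[begin:end + 1]` recovers each chunk. As a minimal sketch of how these offsets might be used downstream, the snippet below replaces each detected span with its label. The `mask_entities` helper is hypothetical and not part of the pipeline; the rows are copied from the table above.

```python
text = """Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024.
The patient’s medical record number is 56467890.
The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 ."""

# (chunk, begin, end, ner_label) rows copied from the results table.
entities = [
    ("John Lee", 4, 11, "DOCTOR"),
    ("Royal Medical Clinic", 19, 38, "HOSPITAL"),
    ("Chicago", 43, 49, "CITY"),
    ("11/05/2024", 79, 88, "DATE"),
    ("56467890", 130, 137, "IDNUM"),
    ("Emma Wilson", 153, 163, "PATIENT"),
    ("50 years old", 169, 180, "AGE"),
    ("444-456-7890", 203, 214, "PHONE"),
]


def mask_entities(text, entities):
    """Hypothetical helper: replace each span with <LABEL>."""
    masked = text
    # Work from the last span to the first so earlier offsets stay valid.
    for chunk, begin, end, label in sorted(entities, key=lambda e: e[1], reverse=True):
        # end is inclusive, hence the + 1 when slicing.
        if masked[begin:end + 1] != chunk:
            raise ValueError(f"offsets {begin}-{end} do not match {chunk!r}")
        masked = masked[:begin] + f"<{label}>" + masked[end + 1:]
    return masked


masked_text = mask_entities(text, entities)
print(masked_text)
```

Replacing spans from right to left avoids recomputing offsets after each substitution, since labels rarely have the same length as the text they replace.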

Model Information

Model Name: pp_docwise_benchmark_medium_preann
Type: pipeline
Compatibility: Healthcare NLP 6.0.4+
License: Licensed
Edition: Official
Language: en
Size: 2.5 GB

Included Models

  • DocumentAssembler
  • InternalDocumentSplitter
  • TokenizerModel
  • TokenizerModel
  • WordEmbeddingsModel
  • MedicalNerModel
  • NerConverterInternalModel
  • PretrainedZeroShotNER
  • NerConverterInternalModel
  • MedicalNerModel
  • NerConverterInternalModel
  • ChunkMergeModel
  • ContextualParserModel
  • ContextualParserModel
  • ContextualParserModel
  • ContextualParserModel
  • ContextualParserModel
  • ContextualParserModel
  • ContextualParserModel
  • ContextualParserModel
  • RegexMatcherInternalModel
  • ContextualParserModel
  • ContextualParserModel
  • RegexMatcherInternalModel
  • RegexMatcherInternalModel
  • RegexMatcherInternalModel
  • ContextualParserModel
  • TextMatcherInternalModel
  • TextMatcherInternalModel
  • ContextualParserModel
  • ContextualParserModel
  • ChunkMergeModel
  • ChunkMergeModel