Description
This is a pretrained Named Entity Recognition (NER) deep learning model for clinical terminology.
It is based on the bert_token_classifier_ner_jsl model, but with more generalized entities.
Predicted Entities and Definitions:
- Death_Entity — Mentions that indicate the death of a patient.
 - Medical_Device — Mentions related to medical devices and supplies.
 - Vital_Signs_Header — Section headers corresponding to vital signs of a patient.
 - Allergen — Allergen-related mentions.
 - Drug_BrandName — Commercial name chosen by the labeler or manufacturer for a drug containing one or more active ingredients.
 - Clinical_Dept — Mentions of medical and/or surgical departments.
 - Symptom — Mentions of symptoms, either of the patient or someone else.
 - External_body_part_or_region — Mentions of external body parts or organs visible to the naked eye.
 - Admission_Discharge — Mentions indicating patient admission and/or discharge.
 - Age — Mentions of age (past or present, patient or others).
 - Birth_Entity — Mentions of childbirth.
 - Oncological — Mentions of cancer, tumors, or metastasis (patient or others).
 - Substance_Quantity — Quantitative mentions of illicit or recreational drug use.
 - Test_Result — Mentions of clinical test results.
 - Test — Mentions of laboratory, pathology, and radiological tests.
 - Procedure — Mentions of invasive medical or surgical procedures/treatments.
 - Treatment — Mentions of therapeutic or minimally invasive treatments (excluding invasive “Procedure”).
 - Disease_Syndrome_Disorder — Mentions of diseases, syndromes, or disorders (excluding those with specific labels such as Heart_Disease).
 
Predicted Entities
B-Birth_Entity, I-Vital_Sign, I-Test_Result, B-Death_Entity, I-Header, B-Vital_Sign, B-Disease_Syndrome_Disorder, B-Substance_Quantity, I-Pregnancy_Newborn, I-Clinical_Dept, I-Body_Part, B-Demographics, I-Admission_Discharge, I-Oncological, B-Test, I-Death_Entity, I-Date_Time, B-Oncological, I-Lifestyle, B-Drug, O, I-Demographics, I-Disease_Syndrome_Disorder, B-Medical_Device, B-Symptom, B-Clinical_Dept, B-Body_Part, B-Header, I-Medical_Device, I-Symptom, B-Lifestyle, B-Physical_Measurement, B-Procedure, B-Treatment, B-Age, I-Drug, I-Substance_Quantity, I-Treatment, B-Admission_Discharge, I-Physical_Measurement, B-Alergen, B-Date_Time, B-Test_Result, I-Age, I-Test, I-Procedure, B-Pregnancy_Newborn, I-Alergen, PAD
How to use
from sparknlp.base import DocumentAssembler
from sparknlp_jsl.annotator import SentenceDetectorDLModel, MedicalBertForTokenClassifier
from sparknlp.annotator import Tokenizer, NerConverter
from pyspark.ml import Pipeline
document_assembler = (
    DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")
)
sentenceDetector = (
    SentenceDetectorDLModel.pretrained("sentence_detector_dl","xx")
    .setInputCols(["document"])
    .setOutputCol("sentence")
)
tokenizer = (
    Tokenizer()
    .setInputCols(["sentence"])
    .setOutputCol("token")
)
token_classifier = (
    MedicalBertForTokenClassifier.pretrained(
        "bert_token_classifier_ner_jsl_slim_onnx",
        "en",
        "clinical/models"
    )
    .setInputCols(["token", "sentence"])
    .setOutputCol("ner")
    .setCaseSensitive(True)
)
ner_converter = (
     NerConverterInternal()
    .setInputCols(["sentence", "token", "ner"])
    .setOutputCol("ner_chunk")
)
pipeline = Pipeline(stages=[
    document_assembler,
    sentenceDetector,
    tokenizer,
    token_classifier,
    ner_converter
])
test_sentence = "HISTORY: 30-year-old female presents for digital bilateral mammography secondary to a soft tissue lump palpated by the patient in the upper right shoulder. The patient has a family history of breast cancer within her mother at age 58. Patient denies personal history of breast cancer."
data = spark.createDataFrame([[test_sentence]]).toDF("text")
model = pipeline.fit(data)
result = model.transform(data)
from johnsnowlabs import nlp, medical
document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")
sentenceDetector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl","xx")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")
tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")
token_classifier = medical.BertForTokenClassifier.pretrained(
        "bert_token_classifier_ner_jsl_slim_onnx",
        "en",
        "clinical/models"
    )\
    .setInputCols(["token", "sentence"])\
    .setOutputCol("ner")\
    .setCaseSensitive(True)
ner_converter = medical.NerConverterInternal()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")
pipeline = nlp.Pipeline(stages=[
    document_assembler,
    sentenceDetector,
    tokenizer,
    token_classifier,
    ner_converter
])
test_sentence = "HISTORY: 30-year-old female presents for digital bilateral mammography secondary to a soft tissue lump palpated by the patient in the upper right shoulder. The patient has a family history of breast cancer within her mother at age 58. Patient denies personal history of breast cancer."
data = spark.createDataFrame([[test_sentence]]).toDF("text")
model = pipeline.fit(data)
result = model.transform(data)
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.annotators.ner.NerConverter
import com.johnsnowlabs.nlp.annotators.classifier.dl.MedicalBertForTokenClassifier
import com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLApproach
import org.apache.spark.ml.Pipeline
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")
val sentenceDetector = new SentenceDetectorDLModel()
  .pretrained("sentence_detector_dl","xx")
  .setInputCols("document")
  .setOutputCol("sentence")
val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")
val tokenClassifier = MedicalBertForTokenClassifier
  .pretrained("bert_token_classifier_ner_jsl_slim_onnx", "en", "clinical/models")
  .setInputCols(Array("token", "document"))
  .setOutputCol("ner")
  .setCaseSensitive(true)
val nerConverter = new  NerConverterInternal()
  .setInputCols(Array("document", "token", "ner"))
  .setOutputCol("ner_chunk")
val pipeline = new Pipeline()
  .setStages(Array(
    documentAssembler,
    sentenceDetector,
    tokenizer,
    tokenClassifier,
    nerConverter
  ))
val testSentence = "HISTORY: 30-year-old female presents for digital bilateral mammography secondary to a soft tissue lump palpated by the patient in the upper right shoulder. The patient has a family history of breast cancer within her mother at age 58. Patient denies personal history of breast cancer."
val data = Seq(testSentence).toDF("text")
val model = pipeline.fit(data)
val result = model.transform(data)
Results
+----------------+------------+
|text            |entity      |
+----------------+------------+
|HISTORY:        |Header      |
|30-year-old     |Age         |
|female          |Demographics|
|mammography     |Test        |
|soft tissue lump|Symptom     |
|shoulder        |Body_Part   |
|breast cancer   |Oncological |
|her mother      |Demographics|
|age 58          |Age         |
|breast cancer   |Oncological |
+----------------+------------+
Model Information
| Model Name: | bert_token_classifier_ner_jsl_slim_onnx | 
| Compatibility: | Healthcare NLP 6.1.1+ | 
| License: | Licensed | 
| Edition: | Official | 
| Input Labels: | [document, token] | 
| Output Labels: | [ner] | 
| Language: | en | 
| Size: | 403.8 MB | 
| Case sensitive: | true | 
| Max sentence length: | 128 |