Description
This model is a BioBERT based sentence classification model that can determine whether the clinical sentences include terms related to biomarkers or not.
Predicted Entities
1
: Contains biomarker related terms
0
: Doesn’t contain biomarker related terms
How to use
document_assembler = DocumentAssembler() \
.setInputCol('text') \
.setOutputCol('document')
sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models") \
.setInputCols(["document"]) \
.setOutputCol("sentence")
tokenizer = Tokenizer() \
.setInputCols(['sentence']) \
.setOutputCol('token')
sequenceClassifier = MedicalBertForSequenceClassification.pretrained("bert_sequence_classifier_biomarker","en","clinical/models")\
.setInputCols(["sentence",'token'])\
.setOutputCol("prediction")
pipeline = Pipeline(stages=[
document_assembler,
sentence_detector,
tokenizer,
sequenceClassifier
])
data = spark.createDataFrame([["""In the realm of cancer research, several biomarkers have emerged as crucial indicators of disease progression and treatment response. For instance, the expression levels of HER2/neu, a protein receptor, have been linked to aggressive forms of breast cancer. Additionally, the presence of prostate-specific antigen (PSA) is often monitored to track the progression of prostate cancer. Moreover, in cardiovascular health, high-sensitivity C-reactive protein (hs-CRP) serves as a biomarker for inflammation and potential risk of heart disease. Meanwhile, elevated levels of troponin T are indicative of myocardial damage, commonly observed in acute coronary syndrome. In the field of diabetes management, glycated hemoglobin is a widely used to assess long-term blood sugar control. Its levels reflect the average blood glucose concentration over the past two to three months, offering valuable insights into disease management strategies."""]]).toDF("text")
model = pipeline.fit(data)
result = model.transform(data)
val documentAssembler = new DocumentAssembler()
.setInputCol("text")
.setOutputCol("document")
val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")
.setInputCols(Array("document"))
.setOutputCol("sentence")
val tokenizer = new Tokenizer()
.setInputCols(Array("sentence"))
.setOutputCol("token")
val sequenceClassifier = MedicalBertForSequenceClassification.pretrained("bert_sequence_classifier_biomarker","en","clinical/models")
.setInputCols(Array("sentence", "token"))
.setOutputCol("prediction")
val pipeline = new Pipeline().setStages(Array(
documentAssembler,
sentenceDetector,
tokenizer,
sequenceClassifier
))
val data = spark.createDataFrame(Seq(
("""In the realm of cancer research, several biomarkers have emerged as crucial indicators of disease progression and treatment response. For instance, the expression levels of HER2/neu, a protein receptor, have been linked to aggressive forms of breast cancer. Additionally, the presence of prostate-specific antigen (PSA) is often monitored to track the progression of prostate cancer. Moreover, in cardiovascular health, high-sensitivity C-reactive protein (hs-CRP) serves as a biomarker for inflammation and potential risk of heart disease. Meanwhile, elevated levels of troponin T are indicative of myocardial damage, commonly observed in acute coronary syndrome. In the field of diabetes management, glycated hemoglobin is a widely used to assess long-term blood sugar control. Its levels reflect the average blood glucose concentration over the past two to three months, offering valuable insights into disease management strategies.""",)
)).toDF("text")
val model = pipeline.fit(data)
val result = model.transform(data)
Results
+------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
|sentence |prediction|
+------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
|In the realm of cancer research, several biomarkers have emerged as crucial indicators of disease progression and treatment response. |0 |
|For instance, the expression levels of HER2/neu, a protein receptor, have been linked to aggressive forms of breast cancer. |1 |
|Additionally, the presence of prostate-specific antigen (PSA) is often monitored to track the progression of prostate cancer. |1 |
|Moreover, in cardiovascular health, high-sensitivity C-reactive protein (hs-CRP) serves as a biomarker for inflammation and potential risk of heart disease.|1 |
|Meanwhile, elevated levels of troponin T are indicative of myocardial damage, commonly observed in acute coronary syndrome. |0 |
|In the field of diabetes management, glycated hemoglobin is a widely used to assess long-term blood sugar control. |0 |
|Its levels reflect the average blood glucose concentration over the past two to three months, offering valuable insights into disease management strategies.|0 |
+------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
Model Information
Model Name: | bert_sequence_classifier_biomarker |
Compatibility: | Healthcare NLP 5.2.1+ |
License: | Licensed |
Edition: | Official |
Input Labels: | [document, token] |
Output Labels: | [prediction] |
Language: | en |
Size: | 406.4 MB |
Case sensitive: | false |
Max sentence length: | 512 |