Description
This model is a BioBERT-based sentence classification model that determines whether a clinical sentence contains biomarker-related terms.
Predicted Entities
1: Contains biomarker-related terms
0: Does not contain biomarker-related terms
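In Spark NLP, each prediction is stored as a string in the annotation's `result` field. The following minimal sketch translates those raw labels into the descriptions above. It assumes the model emits exactly the strings "1" and "0". The helper `describe_label` and the dictionary `BIOMARKER_LABELS` are hypothetical and are not part of the library.

```python
# Hypothetical helper, not part of Spark NLP: maps raw label strings to readable text.
# Assumes the model emits "1" and "0" as listed under Predicted Entities.
BIOMARKER_LABELS = {
    "1": "Contains biomarker-related terms",
    "0": "Does not contain biomarker-related terms",
}


def describe_label(label: str) -> str:
    """Return the description for a predicted label; raise on any unexpected label."""
    try:
        return BIOMARKER_LABELS[label.strip()]
    except KeyError:
        raise ValueError(f"unexpected label: {label!r}") from None


print(describe_label("1"))  # Contains biomarker-related terms
```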
How to use
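# Assumes an active Spark session named `spark` with a valid Healthcare NLP license
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetector, Tokenizer
from sparknlp_jsl.annotator import MedicalBertForSequenceClassification
# Wrap the raw "text" column into document annotations
document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")
# Split each document into sentences; the model labels one sentence at a time
sentence_detector = SentenceDetector() \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")
tokenizer = Tokenizer() \
    .setInputCols(["sentence"]) \
    .setOutputCol("token")
# Emits one label per sentence: "1" (biomarker-related) or "0" (not)
sequence_classifier = MedicalBertForSequenceClassification.pretrained("bert_sequence_classifier_biomarker_onnx", "en", "clinical/models") \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("class")
pipeline = Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    sequence_classifier
])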
data = spark.createDataFrame([["""In the realm of cancer research, several biomarkers have emerged as crucial indicators of disease progression and treatment response. For instance, the expression levels of HER2/neu, a protein receptor, have been linked to aggressive forms of breast cancer. Additionally, the presence of prostate-specific antigen (PSA) is often monitored to track the progression of prostate cancer. Moreover, in cardiovascular health, high-sensitivity C-reactive protein (hs-CRP) serves as a biomarker for inflammation and potential risk of heart disease. Meanwhile, elevated levels of troponin T are indicative of myocardial damage, commonly observed in acute coronary syndrome. In the field of diabetes management, glycated hemoglobin is a widely used to assess long-term blood sugar control. Its levels reflect the average blood glucose concentration over the past two to three months, offering valuable insights into disease management strategies."""]]).toDF("text")
model = pipeline.fit(data)
result = model.transform(data)
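# Equivalent pipeline written with the johnsnowlabs package (nlp and medical namespaces)
from johnsnowlabs import nlp, medical
document_assembler = nlp.DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")
# Split each document into sentences; the model labels one sentence at a time
sentence_detector = nlp.SentenceDetector() \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")
tokenizer = nlp.Tokenizer() \
    .setInputCols(["sentence"]) \
    .setOutputCol("token")
sequenceClassifier = medical.BertForSequenceClassification.pretrained("bert_sequence_classifier_biomarker_onnx", "en", "clinical/models") \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("classes")
pipeline = nlp.Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    sequenceClassifier
])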
data = spark.createDataFrame([["""In the realm of cancer research, several biomarkers have emerged as crucial indicators of disease progression and treatment response. For instance, the expression levels of HER2/neu, a protein receptor, have been linked to aggressive forms of breast cancer. Additionally, the presence of prostate-specific antigen (PSA) is often monitored to track the progression of prostate cancer. Moreover, in cardiovascular health, high-sensitivity C-reactive protein (hs-CRP) serves as a biomarker for inflammation and potential risk of heart disease. Meanwhile, elevated levels of troponin T are indicative of myocardial damage, commonly observed in acute coronary syndrome. In the field of diabetes management, glycated hemoglobin is a widely used to assess long-term blood sugar control. Its levels reflect the average blood glucose concentration over the past two to three months, offering valuable insights into disease management strategies."""]]).toDF("text")
model = pipeline.fit(data)
result = model.transform(data)
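// Scala version of the same pipeline
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import com.johnsnowlabs.nlp.annotators.classification.MedicalBertForSequenceClassification
import org.apache.spark.ml.Pipeline
import spark.implicits._ // assumes an active SparkSession named `spark`; enables Seq(...).toDF
val document_assembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")
// Split each document into sentences; the model labels one sentence at a time
val sentenceDetector = new SentenceDetector()
  .setInputCols(Array("document"))
  .setOutputCol("sentence")
val tokenizer = new Tokenizer()
  .setInputCols(Array("sentence"))
  .setOutputCol("token")
val sequenceClassifier = MedicalBertForSequenceClassification.pretrained("bert_sequence_classifier_biomarker_onnx", "en", "clinical/models")
  .setInputCols(Array("sentence", "token"))
  .setOutputCol("class")
val pipeline = new Pipeline().setStages(Array(document_assembler, sentenceDetector, tokenizer, sequenceClassifier))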
val data = Seq("""In the realm of cancer research, several biomarkers have emerged as crucial indicators of disease progression and treatment response. For instance, the expression levels of HER2/neu, a protein receptor, have been linked to aggressive forms of breast cancer. Additionally, the presence of prostate-specific antigen (PSA) is often monitored to track the progression of prostate cancer. Moreover, in cardiovascular health, high-sensitivity C-reactive protein (hs-CRP) serves as a biomarker for inflammation and potential risk of heart disease. Meanwhile, elevated levels of troponin T are indicative of myocardial damage, commonly observed in acute coronary syndrome. In the field of diabetes management, glycated hemoglobin is a widely used to assess long-term blood sugar control. Its levels reflect the average blood glucose concentration over the past two to three months, offering valuable insights into disease management strategies.""").toDF("text")
val model = pipeline.fit(data)
val result = model.transform(data)
Results
+------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
|sentence |prediction|
+------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
|In the realm of cancer research, several biomarkers have emerged as crucial indicators of disease progression and treatment response. |0 |
|For instance, the expression levels of HER2/neu, a protein receptor, have been linked to aggressive forms of breast cancer. |1 |
|Additionally, the presence of prostate-specific antigen (PSA) is often monitored to track the progression of prostate cancer. |1 |
|Moreover, in cardiovascular health, high-sensitivity C-reactive protein (hs-CRP) serves as a biomarker for inflammation and potential risk of heart disease.|1 |
|Meanwhile, elevated levels of troponin T are indicative of myocardial damage, commonly observed in acute coronary syndrome. |0 |
|In the field of diabetes management, glycated hemoglobin is a widely used to assess long-term blood sugar control. |0 |
|Its levels reflect the average blood glucose concentration over the past two to three months, offering valuable insights into disease management strategies.|0 |
+------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
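After you collect the sentence texts and their predicted labels from the pipeline output as two parallel Python lists, you can select the biomarker-related sentences in plain Python. The sketch below makes two assumptions. First, there is exactly one label string per sentence. Second, "1" marks a biomarker-related sentence. The function `flag_biomarker_sentences` is hypothetical. The usage example reuses the first three rows of the Results table above.

```python
from typing import List, Tuple


def flag_biomarker_sentences(
    sentences: List[str], labels: List[str]
) -> List[Tuple[str, bool]]:
    """Pair each sentence with True when its predicted label is "1" (biomarker-related)."""
    if len(sentences) != len(labels):
        # The classifier emits one label per sentence, so the two lists must line up.
        raise ValueError("expected exactly one label per sentence")
    return [(text, label == "1") for text, label in zip(sentences, labels)]


# Usage with the first three rows of the Results table
sentences = [
    "In the realm of cancer research, several biomarkers have emerged as crucial "
    "indicators of disease progression and treatment response.",
    "For instance, the expression levels of HER2/neu, a protein receptor, have been "
    "linked to aggressive forms of breast cancer.",
    "Additionally, the presence of prostate-specific antigen (PSA) is often monitored "
    "to track the progression of prostate cancer.",
]
labels = ["0", "1", "1"]

flagged = [text for text, hit in flag_biomarker_sentences(sentences, labels) if hit]
print(len(flagged))  # 2
```

The function returns every sentence paired with a boolean instead of only the matches. This way the caller can keep the full sentence order when it needs to highlight flagged sentences in context.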
Model Information
Model Name: bert_sequence_classifier_biomarker_onnx
Compatibility: Healthcare NLP 6.1.1+
License: Licensed
Edition: Official
Input Labels: [sentence, token]
Output Labels: [label]
Language: en
Size: 437.7 MB
Case sensitive: true
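The card lists Healthcare NLP 6.1.1 as the minimum compatible release. The following minimal sketch checks an installed version string against that minimum before you load the model. It assumes plain dotted numeric versions and does not handle suffixes such as release-candidate tags. The helpers `parse_version` and `meets_minimum` are hypothetical. How you obtain the installed version string depends on your environment.

```python
def parse_version(version: str) -> tuple:
    """Convert a dotted numeric version such as "6.1.1" into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def meets_minimum(installed: str, minimum: str = "6.1.1") -> bool:
    """True if `installed` >= `minimum`, padding the shorter version with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


print(meets_minimum("6.1.1"))  # True
print(meets_minimum("6.0.4"))  # False
```

The helper compares integer tuples instead of raw strings, so "10.0.0" correctly ranks above "6.1.1".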