Description
Trained to add sentence classifying capabilities to distinguish between Question vs Statements in clinical domain.
This model was imported from Hugging Face (https://huggingface.co/shahrukhx01/question-vs-statement-classifier), trained based on Haystack (https://github.com/deepset-ai/haystack/issues/611) and finetuned by John Snow Labs with in-house clinical annotations.
Predicted Entities
question
, statement
How to use
documentAssembler = nlp.DocumentAssembler()\
.setInputCol("text")\
.setOutputCol("document")
sentenceDetector = nlp.SentenceDetectorDLModel.pretrained() \
.setInputCols(["document"]) \
.setOutputCol("sentence")
tokenizer = nlp.Tokenizer()\
.setInputCols("sentence")\
.setOutputCol("token")
seq = nlp.BertForSequenceClassification.pretrained('bert_sequence_classifier_question_statement_clinical', 'en', 'clinical/models')\
.setInputCols(["token", "sentence"])\
.setOutputCol("label")\
.setCaseSensitive(True)
pipeline = Pipeline(stages = [
documentAssembler,
sentenceDetector,
tokenizer,
seq])
test_sentences = [["""Hello I am going to be having a baby throughand have just received my medical results before I have my tubes tested. I had the tests on day 23 of my cycle. My progresterone level is 10. What does this mean? What does progesterone level of 10 indicate?
Your progesterone report is perfectly normal. We expect this result on day 23rd of the cycle.So there's nothing to worry as it's perfectly alright"""]]
data = spark.createDataFrame(test_sentences).toDF("text")
res = pipeline.fit(data).transform(data)
val documentAssembler = new DocumentAssembler()
.setInputCol("text")
.setOutputCol("document")
val sentenceDetector = SentenceDetectorDLModel.pretrained()
.setInputCols(Array("document"))
.setOutputCol("sentence")
val tokenizer = new Tokenizer()
.setInputCols("sentence")
.setOutputCol("token")
val seq = BertForSequenceClassification.pretrained("bert_sequence_classifier_question_statement_clinical", "en", "clinical/models")
.setInputCols(Array("token", "sentence"))
.setOutputCol("label")
.setCaseSensitive(True)
val pipeline = new Pipeline().setStages(Array(
documentAssembler,
sentenceDetector,
tokenizer,
seq))
val test_sentences = """Hello I am going to be having a baby throughand have just received my medical results before I have my tubes tested. I had the tests on day 23 of my cycle. My progresterone level is 10. What does this mean? What does progesterone level of 10 indicate? Your progesterone report is perfectly normal. We expect this result on day 23rd of the cycle.So there's nothing to worry as it's perfectly alright"""
val example = Seq(test_sentences).toDS.toDF("text")
val result = pipeline.fit(example).transform(example)
import nlu
nlu.load("en.classify.bert_sequence.question_statement_clinical").predict("""Hello I am going to be having a baby throughand have just received my medical results before I have my tubes tested. I had the tests on day 23 of my cycle. My progresterone level is 10. What does this mean? What does progesterone level of 10 indicate?
Your progesterone report is perfectly normal. We expect this result on day 23rd of the cycle.So there's nothing to worry as it's perfectly alright""")
Results
+--------------------------------------------------------------------------------------------------------------------+---------+
|sentence |label |
+--------------------------------------------------------------------------------------------------------------------+---------+
|Hello I am going to be having a baby throughand have just received my medical results before I have my tubes tested.|statement|
|I had the tests on day 23 of my cycle. |statement|
|My progresterone level is 10. |statement|
|What does this mean? |question |
|What does progesterone level of 10 indicate? |question |
|Your progesterone report is perfectly normal. We expect this result on day 23rd of the cycle. |statement|
|So there's nothing to worry as it's perfectly alright |statement|
+--------------------------------------------------------------------------------------------------------------------+---------
Model Information
Model Name: | bert_sequence_classifier_question_statement_clinical |
Compatibility: | Healthcare NLP 3.3.2+ |
License: | Licensed |
Edition: | Official |
Input Labels: | [token, sentence] |
Output Labels: | [label] |
Language: | en |
Case sensitive: | true |
Data Source
For generic domain training: https://github.com/deepset-ai/haystack/issues/611
For finetuning in clinical domain, in house JSL annotations based on clinical Q&A.
Benchmarking
label precision recall f1-score support
question 0.97 0.94 0.96 243
statement 0.98 0.99 0.99 729
accuracy - - 0.98 972
macro-avg 0.98 0.97 0.97 972
weighted-avg 0.98 0.98 0.98 972