package assertion
Type Members
- class AssertionChunkConverter extends Transformer with DefaultParamsWritable with HasInputAnnotationCols with HasOutputAnnotationCol with HasOutputAnnotatorType with HasFeatures with CheckLicense
Creates a chunk column with metadata for training assertion status detection models.
In some cases, creating the chunk column from token indices can lose data that is needed to train assertion status models. The AssertionChunkConverter annotator uses both the begin and end indices of the tokens as input to add more robust metadata to the chunk column, which improves the reliability of the indices and avoids loss of data.
Notes
Chunk begin and end indices in the assertion status model training DataFrame can be populated using the new version of the ALAB (Annotation Lab) module.
Example
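import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.annotators.sbd.pragmatic.SentenceDetector
import com.johnsnowlabs.nlp.annotators.assertion.AssertionChunkConverter
import org.apache.spark.ml.Pipeline

// Define the stages of the pipeline
val document = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")
val sentenceDetector = new SentenceDetector()
  .setInputCols("document")
  .setOutputCol("sentence")
val tokenizer = new Tokenizer()
  .setInputCols("sentence")
  .setOutputCol("token")
// The converter reads the chunk text and its character offsets from columns of the input DataFrame
val converter = new AssertionChunkConverter()
  .setInputCols("token") // must match the tokenizer's output column
  .setChunkTextCol("target")
  .setChunkBeginCol("char_begin")
  .setChunkEndCol("char_end")
  .setOutputTokenBeginCol("token_begin")
  .setOutputTokenEndCol("token_end")
  .setOutputCol("chunk")

// Define the pipeline and obtain the results
// `data` must contain the columns text, target, char_begin and char_end
val pipeline = new Pipeline().setStages(Array(
  document,
  sentenceDetector,
  tokenizer,
  converter
))
val results = pipeline.fit(data).transform(data)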
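The internal algorithm of AssertionChunkConverter is not documented on this page, so the following is only a minimal plain-Scala sketch of the underlying idea, assuming tokens carry character offsets with an inclusive end (as the begin/end values of Spark NLP annotations do). It maps a chunk's character span (the role of char_begin and char_end above) to the indices of the first and last overlapping tokens (the role of token_begin and token_end). The names Token, ChunkToTokens, tokenRange and whitespaceTokens are hypothetical, and the whitespace tokenizer merely stands in for the real Tokenizer.

```scala
// Hypothetical sketch, NOT the AssertionChunkConverter implementation.
// A token with character offsets; `end` is inclusive, like Spark NLP annotation offsets.
final case class Token(text: String, begin: Int, end: Int)

object ChunkToTokens {

  /** Returns (firstTokenIndex, lastTokenIndex) of the tokens overlapping [charBegin, charEnd], or None. */
  def tokenRange(tokens: Seq[Token], charBegin: Int, charEnd: Int): Option[(Int, Int)] = {
    // A token overlaps the chunk if it ends at/after the chunk start and begins at/before the chunk end.
    val overlapping = tokens.indices.filter { i =>
      tokens(i).end >= charBegin && tokens(i).begin <= charEnd
    }
    if (overlapping.isEmpty) None else Some((overlapping.head, overlapping.last))
  }

  /** Naive whitespace tokenizer recording inclusive character offsets (illustration only). */
  def whitespaceTokens(text: String): Seq[Token] =
    "\\S+".r
      .findAllMatchIn(text)
      .map(m => Token(m.matched, m.start, m.end - 1)) // Match.end is exclusive, so subtract 1
      .toVector
}
```

Because the lookup uses overlap rather than exact boundary matches, a chunk whose char_begin falls inside a token (for example, offset 15 inside "severe") still resolves to the same token range, which is one way to tolerate slightly misaligned indices instead of dropping the row.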
- class BertAssertionClassifier extends AnnotatorModel[BertAssertionClassifier] with HasSimpleAnnotate[BertAssertionClassifier] with WhiteAndBlackListParams with HasEngine with WriteTensorflowModel with WriteOnnxModel with HasFeatures with CheckLicense
BertAssertionClassifier extracts the assertion status from text by analyzing both the extracted entities and their surrounding context.
This classifier leverages pre-trained BERT models fine-tuned on biomedical text (e.g., BioBERT) and applies a sequence classification/regression head (a linear layer on the pooled output) to support multi-class document classification.
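As a rough illustration of "a linear layer on the pooled output", the toy sketch below computes logits = W · pooled + b, applies a softmax, and returns the most probable label with its probability. The weights, bias and pooled vector are made-up values, not real model parameters; the labels present and absent are taken from the example output further down, and reading the confidence metadata value as the softmax probability of the predicted class is an assumption. ClassificationHeadSketch and its methods are hypothetical names.

```scala
// Toy sketch of a sequence-classification head; all numbers are illustrative, not model weights.
object ClassificationHeadSketch {

  /** logits(i) = sum_j weights(i)(j) * pooled(j) + bias(i) */
  def linear(weights: Array[Array[Double]], bias: Array[Double], pooled: Array[Double]): Array[Double] =
    weights.zip(bias).map { case (row, b) =>
      row.zip(pooled).map { case (w, x) => w * x }.sum + b
    }

  /** Numerically stable softmax: subtract the max logit before exponentiating. */
  def softmax(logits: Array[Double]): Array[Double] = {
    val maxLogit = logits.max
    val exps = logits.map(l => math.exp(l - maxLogit))
    val total = exps.sum
    exps.map(_ / total)
  }

  /** Returns the label with the highest probability and that probability. */
  def predict(
      labels: Seq[String],
      weights: Array[Array[Double]],
      bias: Array[Double],
      pooled: Array[Double]
  ): (String, Double) = {
    val probs = softmax(linear(weights, bias, pooled))
    val best = probs.indices.maxBy(i => probs(i))
    (labels(best), probs(best))
  }
}
```

With identity weights, a zero bias and pooled = (2.0, 0.0), the logits are (2, 0), so the sketch predicts present with probability 1 / (1 + e^-2) ≈ 0.881.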
Key features:
- Accepts DOCUMENT and CHUNK type inputs and produces ASSERTION type annotations.
- Emphasizes entity context by marking target entities with special tokens (e.g., [entity]), allowing the model to better focus on them.
- Utilizes a transformer-based architecture (BERT for Sequence Classification) to achieve accurate assertion status prediction.
Input Example:
This annotator preprocesses the input text to emphasize the target entities as follows: [CLS] Patient with [entity] severe fever [entity].
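The marking step can be sketched in plain Scala as follows, assuming the literal marker [entity] and character offsets with an inclusive end, like the begin and end values in the results table of the example. EntityMarker and mark are hypothetical names, not part of the library. The sketch does not add [CLS]: that is BERT's classification token, normally prepended by the BERT tokenizer rather than inserted into the raw text.

```scala
// Hypothetical sketch of entity marking; not the BertAssertionClassifier implementation.
object EntityMarker {
  val Marker = "[entity]"

  /** Wraps text(begin..end) (inclusive end) in entity markers. */
  def mark(text: String, begin: Int, end: Int): String = {
    require(0 <= begin && begin <= end && end < text.length, s"invalid span [$begin, $end]")
    val before = text.substring(0, begin)
    val entity = text.substring(begin, end + 1) // +1 because `end` is inclusive
    val after  = text.substring(end + 1)
    s"$before$Marker $entity $Marker$after"
  }
}
```

For instance, mark("Patient with severe fever.", 13, 24) returns "Patient with [entity] severe fever [entity].", matching the input example above once [CLS] is prepended.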
Pretrained models can be loaded with the pretrained method of the companion object:
val assertion = BertAssertionClassifier.pretrained()
  .setInputCols("sentence", "chunk")
  .setOutputCol("assertion")
Models from the HuggingFace 🤗 Transformers library are also compatible with Spark NLP 🚀. The Spark NLP Workshop example shows how to import them: https://github.com/JohnSnowLabs/spark-nlp/discussions/5669.
Example
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.annotators.sbd.pragmatic.SentenceDetector
import com.johnsnowlabs.nlp.embeddings.WordEmbeddingsModel
import com.johnsnowlabs.nlp.annotators.ner.{MedicalNerModel, NerConverterInternal}
import com.johnsnowlabs.nlp.annotators.assertion.BertAssertionClassifier
import org.apache.spark.ml.Pipeline
import spark.implicits._ // assumes a SparkSession named `spark`; needed for toDF

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")
val sentenceDetector = new SentenceDetector()
  .setInputCols("document")
  .setOutputCol("sentence")
val tokenizer = new Tokenizer()
  .setInputCols("sentence")
  .setOutputCol("token")
val wordEmbeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
  .setInputCols(Array("sentence", "token"))
  .setOutputCol("embeddings")
val clinicalNer = MedicalNerModel.pretrained("ner_clinical", "en", "clinical/models")
  .setInputCols(Array("sentence", "token", "embeddings"))
  .setOutputCol("ner")
val nerConverter = new NerConverterInternal()
  .setInputCols(Array("sentence", "token", "ner"))
  .setOutputCol("ner_chunk")
// The assertion classifier takes each sentence together with its NER chunks
val assertion = BertAssertionClassifier.pretrained()
  .setInputCols("sentence", "ner_chunk")
  .setOutputCol("assertion")

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  sentenceDetector,
  tokenizer,
  wordEmbeddings,
  clinicalNer,
  nerConverter,
  assertion
))

val text =
  """Patient with severe fever and sore throat. He shows no stomach pain and he maintained on an epidural
    |and PCA for pain control. He also became short of breath with climbing a flight of stairs. After CT, lung tumor
    |located at the right lower lobe. Father with Alzheimer.""".stripMargin
val data = Seq(text).toDF("text")

val result = pipeline.fit(data).transform(data)
result.selectExpr("explode(assertion) as assertion").show(false)
Results:
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|assertion                                                                                                                                                                  |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|{assertion, 13, 24, present, {assertion_source -> assertion, chunk -> 0, ner_chunk -> severe fever, confidence -> 0.9996883, ner_label -> PROBLEM, sentence -> 0}, []}     |
|{assertion, 30, 40, present, {assertion_source -> assertion, chunk -> 1, ner_chunk -> sore throat, confidence -> 0.999676, ner_label -> PROBLEM, sentence -> 0}, []}       |
|{assertion, 55, 66, absent, {assertion_source -> assertion, chunk -> 2, ner_chunk -> stomach pain, confidence -> 0.9989444, ner_label -> PROBLEM, sentence -> 1}, []}      |
|{assertion, 89, 99, present, {assertion_source -> assertion, chunk -> 3, ner_chunk -> an epidural, confidence -> 0.99903834, ner_label -> TREATMENT, sentence -> 1}, []}   |
|{assertion, 106, 108, present, {assertion_source -> assertion, chunk -> 4, ner_chunk -> PCA, confidence -> 0.99900436, ner_label -> TREATMENT, sentence -> 1}, []}         |
|{assertion, 114, 125, present, {assertion_source -> assertion, chunk -> 5, ner_chunk -> pain control, confidence -> 0.9993321, ner_label -> PROBLEM, sentence -> 1}, []}   |
|{assertion, 143, 157, present, {assertion_source -> assertion, chunk -> 6, ner_chunk -> short of breath, confidence -> 0.9997882, ner_label -> PROBLEM, sentence -> 2}, []}|
|{assertion, 199, 200, present, {assertion_source -> assertion, chunk -> 7, ner_chunk -> CT, confidence -> 0.9996158, ner_label -> TEST, sentence -> 3}, []}                |
|{assertion, 203, 212, present, {assertion_source -> assertion, chunk -> 8, ner_chunk -> lung tumor, confidence -> 0.9997308, ner_label -> PROBLEM, sentence -> 3}, []}     |
|{assertion, 260, 268, present, {assertion_source -> assertion, chunk -> 9, ner_chunk -> Alzheimer, confidence -> 0.98367596, ner_label -> PROBLEM, sentence -> 4}, []}     |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
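To show how the fields in this output can be read, here is a plain-Scala stand-in holding three rows copied from the table above. AssertionAnnotation and AssertionResults are hypothetical names, and only a subset of the metadata keys is kept. The second helper checks that begin and end are inclusive character offsets, which holds for the copied rows (for example, 13 to 24 spans the 12 characters of "severe fever").

```scala
// Plain-Scala stand-in for the assertion annotations above (values copied from the results table).
final case class AssertionAnnotation(begin: Int, end: Int, result: String, metadata: Map[String, String])

object AssertionResults {
  val sample: Seq[AssertionAnnotation] = Seq(
    AssertionAnnotation(13, 24, "present",
      Map("ner_chunk" -> "severe fever", "confidence" -> "0.9996883", "ner_label" -> "PROBLEM", "sentence" -> "0")),
    AssertionAnnotation(55, 66, "absent",
      Map("ner_chunk" -> "stomach pain", "confidence" -> "0.9989444", "ner_label" -> "PROBLEM", "sentence" -> "1")),
    AssertionAnnotation(260, 268, "present",
      Map("ner_chunk" -> "Alzheimer", "confidence" -> "0.98367596", "ner_label" -> "PROBLEM", "sentence" -> "4"))
  )

  /** Chunk texts with the given assertion status whose confidence is at least minConfidence. */
  def chunksWithStatus(anns: Seq[AssertionAnnotation], status: String, minConfidence: Double): Seq[String] =
    anns
      .filter(a => a.result == status && a.metadata("confidence").toDouble >= minConfidence)
      .map(_.metadata("ner_chunk"))

  /** With inclusive offsets, end - begin + 1 equals the chunk text length. */
  def spanMatchesChunk(a: AssertionAnnotation): Boolean =
    a.end - a.begin + 1 == a.metadata("ner_chunk").length
}
```

On the actual DataFrame, the same filter could be expressed in Spark SQL over the exploded column, for example with a condition such as assertion.result = 'absent', assuming the standard Spark NLP annotation struct with result and metadata fields.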
See also: com.johnsnowlabs.nlp.annotators.assertion.dl.AssertionDLModel, MedicalBertForSequenceClassification, and the Annotators Main Page for a list of transformer-based classifiers and assertion annotators.
- case class Datapoint(sentence: String, target: String, label: String, start: Int, end: Int) extends Product with Serializable
Created by jose on 19/03/18.
- class FewShotAssertionSentenceConverter extends AnnotatorModel[FewShotAssertionSentenceConverter] with HasSimpleAnnotate[FewShotAssertionSentenceConverter] with CheckLicense
- trait ReadBertAssertionClassifier extends ReadTensorflowModel with ReadOnnxModel
- trait ReadablePretrainedBertAssertionClassifier extends ParamsAndFeaturesReadable[BertAssertionClassifier] with HasPretrained[BertAssertionClassifier]
Value Members
- object AssertionChunkConverter extends ParamsAndFeaturesReadable[AssertionChunkConverter] with Serializable
This is the companion object of AssertionChunkConverter. Please refer to that class for the documentation.
- object BertAssertionClassifier extends ReadablePretrainedBertAssertionClassifier with ReadBertAssertionClassifier with Serializable
- object FewShotAssertionSentenceConverter extends ParamsAndFeaturesReadable[FewShotAssertionSentenceConverter] with Serializable