assertion

package assertion

Type Members

class AssertionChunkConverter extends Transformer with DefaultParamsWritable with HasInputAnnotationCols with HasOutputAnnotationCol with HasOutputAnnotatorType with HasFeatures with CheckLicense
Creates a chunk column with metadata for training assertion status detection models.
In some cases, creating the chunk column from token indices can fail, which leads to a loss of training data for assertion status models. The AssertionChunkConverter annotator uses both the begin and end indices of the tokens as input to add more robust metadata to the chunk column, which improves the reliability of the indices and avoids loss of data.
Notes
Chunk begin and end indices in the assertion status model training dataframe can be populated using the new version of the ALAB module.
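To make the relationship between character offsets and token indices concrete, the following is a minimal, library-independent sketch. It is not the annotator's implementation: the names Tok, ChunkIndexSketch, tokenize and tokenSpan are hypothetical, the tokenizer is a plain whitespace split, and the offsets are assumed to be inclusive.

```scala
// Hypothetical helper type, not part of Spark NLP.
// `begin` and `end` are inclusive character offsets (an assumption of this sketch).
final case class Tok(text: String, begin: Int, end: Int)

object ChunkIndexSketch {
  // Whitespace tokenizer that records the character offsets of every token.
  def tokenize(text: String): Vector[Tok] =
    "\\S+".r.findAllMatchIn(text)
      .map(m => Tok(m.matched, m.start, m.end - 1))
      .toVector

  // Maps a chunk given by inclusive character offsets to the indices of the
  // first and last tokens it overlaps, or None when no token overlaps it.
  def tokenSpan(tokens: Vector[Tok], charBegin: Int, charEnd: Int): Option[(Int, Int)] = {
    val first = tokens.indexWhere(_.end >= charBegin)
    val last = tokens.lastIndexWhere(_.begin <= charEnd)
    if (first >= 0 && last >= first) Some((first, last)) else None
  }
}
```

For the text "Patient with severe fever and sore throat." and the chunk "severe fever" at characters 13 to 24, tokenSpan yields the token indices 2 and 3.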
Example
Define the stages of the pipeline
```
val document = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")
val sentenceDetector = new SentenceDetector()
  .setInputCols("document")
  .setOutputCol("sentence")
val tokenizer = new Tokenizer()
  .setInputCols("sentence")
  .setOutputCol("token")
val converter = new AssertionChunkConverter()
  .setInputCols("token") // must match the tokenizer's output column
  .setChunkTextCol("target")
  .setChunkBeginCol("char_begin")
  .setChunkEndCol("char_end")
  .setOutputTokenBeginCol("token_begin")
  .setOutputTokenEndCol("token_end")
  .setOutputCol("chunk")
```
Define the pipeline and obtain the results
```
val pipeline = new Pipeline().setStages(Array(
  document,
  sentenceDetector,
  tokenizer,
  converter
))

// data is a DataFrame with the "text", "target", "char_begin" and "char_end" columns
val results = pipeline.fit(data).transform(data)
```

class BertForAssertionClassification extends AnnotatorModel[BertForAssertionClassification] with HasBatchedAnnotate[BertForAssertionClassification] with WhiteAndBlackListParams with HasEngine with WriteTensorflowModel with WriteOnnxModel with HasFeatures with CheckLicense

BertForAssertionClassification extracts the assertion status from text by analyzing both the extracted entities and their surrounding context.

This classifier leverages pre-trained BERT models fine-tuned on biomedical text (e.g., BioBERT) and applies a sequence classification/regression head (a linear layer on the pooled output) to support multi-class document classification.
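As an illustration of what a linear classification head on a pooled output computes, here is a small self-contained sketch. It is not the model's actual code: ClassificationHeadSketch, its methods, and the toy weights and labels are all hypothetical, and a real model would use learned weights over a much larger pooled vector.

```scala
// Hypothetical sketch of a linear classification head, not Spark NLP code.
object ClassificationHeadSketch {
  // One logit per class: bias(k) + sum over j of weights(k)(j) * pooled(j).
  def logits(pooled: Array[Double], weights: Array[Array[Double]], bias: Array[Double]): Array[Double] =
    weights.zip(bias).map { case (row, b) =>
      b + row.zip(pooled).map { case (w, x) => w * x }.sum
    }

  // Numerically stable softmax that turns logits into class probabilities.
  def softmax(z: Array[Double]): Array[Double] = {
    val m = z.max
    val e = z.map(v => math.exp(v - m))
    val s = e.sum
    e.map(_ / s)
  }

  // Returns the most probable label together with its probability.
  def predict(pooled: Array[Double], weights: Array[Array[Double]],
              bias: Array[Double], labels: Array[String]): (String, Double) = {
    val p = softmax(logits(pooled, weights, bias))
    val k = p.indexOf(p.max)
    (labels(k), p(k))
  }
}
```

The probability returned by predict plays the same role as the confidence value reported in the assertion metadata, although how the library derives that value is not specified here.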

Key features:

Accepts DOCUMENT and CHUNK type inputs and produces ASSERTION type annotations.
Emphasizes entity context by marking target entities with special tokens (e.g., [entity]), allowing the model to better focus on them.
Utilizes a transformer-based architecture (BERT for Sequence Classification) to achieve accurate assertion status prediction.

Input Example:

This annotator preprocesses the input text to emphasize the target entities as follows: [CLS] Patient with [entity] severe fever [entity].
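The marking step can be sketched in plain Scala as follows. This is an assumption-laden illustration rather than the annotator's preprocessing code: EntityMarkerSketch and markEntity are hypothetical names, the offsets are assumed to be inclusive, and the [CLS] token that the BERT tokenizer would prepend is left out.

```scala
// Hypothetical sketch of entity marking, not Spark NLP code.
object EntityMarkerSketch {
  // Wraps the span [begin, end] (inclusive character offsets) in marker tokens.
  def markEntity(sentence: String, begin: Int, end: Int, marker: String = "[entity]"): String = {
    require(0 <= begin && begin <= end && end < sentence.length, "span out of bounds")
    val before = sentence.substring(0, begin)
    val target = sentence.substring(begin, end + 1)
    val after = sentence.substring(end + 1)
    s"$before$marker $target $marker$after"
  }
}
```

Calling markEntity("Patient with severe fever.", 13, 24) returns "Patient with [entity] severe fever [entity].", which matches the marked form shown above without the leading [CLS].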

Pretrained models can be loaded with the pretrained method of the companion object:

```
val assertion = BertForAssertionClassification.pretrained()
  .setInputCols("sentence", "chunk")
  .setOutputCol("assertion")
```

Models from the HuggingFace 🤗 Transformers library are also compatible with Spark NLP 🚀. For how to import them, see the Spark NLP Workshop example: https://github.com/JohnSnowLabs/spark-nlp/discussions/5669.

Example

```
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDetector = new SentenceDetector()
  .setInputCols("document")
  .setOutputCol("sentence")

val tokenizer = new Tokenizer()
  .setInputCols("sentence")
  .setOutputCol("token")

val wordEmbeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
  .setInputCols(Array("sentence", "token"))
  .setOutputCol("embeddings")

val clinicalNer = MedicalNerModel.pretrained("ner_clinical", "en", "clinical/models")
  .setInputCols(Array("sentence", "token", "embeddings"))
  .setOutputCol("ner")

val nerConverter = new NerConverterInternal()
  .setInputCols(Array("sentence", "token", "ner"))
  .setOutputCol("ner_chunk")

val assertion = BertForAssertionClassification.pretrained()
  .setInputCols("sentence", "ner_chunk")
  .setOutputCol("assertion")

val pipeline = new Pipeline().setStages(Array(
  documentAssembler, sentenceDetector, tokenizer, wordEmbeddings, clinicalNer, nerConverter, assertion
))

val text = """Patient with severe fever and sore throat. He shows no stomach pain and he maintained on an epidural
|and PCA for pain control. He also became short of breath with climbing a flight of stairs. After CT, lung tumor
|located at the right lower lobe. Father with Alzheimer.""".stripMargin

// toDF requires import spark.implicits._ (assuming a SparkSession named spark)
val data = Seq(text).toDF("text")
val result = pipeline.fit(data).transform(data)

// One row per assertion annotation
result.selectExpr("explode(assertion) as assertion").show(false)
```

Results:

+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|assertion                                                                                                                                                                  |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|{assertion, 13, 24, present, {assertion_source -> assertion, chunk -> 0, ner_chunk -> severe fever, confidence -> 0.9996883, ner_label -> PROBLEM, sentence -> 0}, []}     |
|{assertion, 30, 40, present, {assertion_source -> assertion, chunk -> 1, ner_chunk -> sore throat, confidence -> 0.999676, ner_label -> PROBLEM, sentence -> 0}, []}       |
|{assertion, 55, 66, absent, {assertion_source -> assertion, chunk -> 2, ner_chunk -> stomach pain, confidence -> 0.9989444, ner_label -> PROBLEM, sentence -> 1}, []}      |
|{assertion, 89, 99, present, {assertion_source -> assertion, chunk -> 3, ner_chunk -> an epidural, confidence -> 0.99903834, ner_label -> TREATMENT, sentence -> 1}, []}   |
|{assertion, 106, 108, present, {assertion_source -> assertion, chunk -> 4, ner_chunk -> PCA, confidence -> 0.99900436, ner_label -> TREATMENT, sentence -> 1}, []}         |
|{assertion, 114, 125, present, {assertion_source -> assertion, chunk -> 5, ner_chunk -> pain control, confidence -> 0.9993321, ner_label -> PROBLEM, sentence -> 1}, []}   |
|{assertion, 143, 157, present, {assertion_source -> assertion, chunk -> 6, ner_chunk -> short of breath, confidence -> 0.9997882, ner_label -> PROBLEM, sentence -> 2}, []}|
|{assertion, 199, 200, present, {assertion_source -> assertion, chunk -> 7, ner_chunk -> CT, confidence -> 0.9996158, ner_label -> TEST, sentence -> 3}, []}                |
|{assertion, 203, 212, present, {assertion_source -> assertion, chunk -> 8, ner_chunk -> lung tumor, confidence -> 0.9997308, ner_label -> PROBLEM, sentence -> 3}, []}     |
|{assertion, 260, 268, present, {assertion_source -> assertion, chunk -> 9, ner_chunk -> Alzheimer, confidence -> 0.98367596, ner_label -> PROBLEM, sentence -> 4}, []}     |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

See also: BertForAssertionClassification
com.johnsnowlabs.nlp.annotators.assertion.dl.AssertionDLModel
MedicalBertForSequenceClassification
Annotators Main Page for a list of transformer based classifiers and assertion annotators

case class Datapoint(sentence: String, target: String, label: String, start: Int, end: Int) extends Product with Serializable
Created by jose on 19/03/18.
class FewShotAssertionSentenceConverter extends AnnotatorModel[FewShotAssertionSentenceConverter] with HasSimpleAnnotate[FewShotAssertionSentenceConverter] with CheckLicense
trait ReadBertAssertionClassifier extends ReadTensorflowModel with ReadOnnxModel
trait ReadablePretrainedBertAssertionClassifier extends ParamsAndFeaturesReadable[BertForAssertionClassification] with HasPretrained[BertForAssertionClassification]

Value Members

object AssertionChunkConverter extends ParamsAndFeaturesReadable[AssertionChunkConverter] with Serializable
This is the companion object of AssertionChunkConverter. Please refer to that class for the documentation.
object BertForAssertionClassification extends ReadablePretrainedBertAssertionClassifier with ReadBertAssertionClassifier with Serializable
object FewShotAssertionSentenceConverter extends ParamsAndFeaturesReadable[FewShotAssertionSentenceConverter] with Serializable
