sparknlp_jsl.annotator.assertion.contextual_assertion#

Module Contents#

Classes#

ContextualAssertion

An annotator model for contextual assertion analysis.

class ContextualAssertion(classname='com.johnsnowlabs.nlp.annotators.assertion.context.ContextualAssertion', java_model=None)#

Bases: sparknlp_jsl.common.AnnotatorModelInternal, sparknlp_jsl.annotator.handle_exception_params.HandleExceptionParams

An annotator model for contextual assertion analysis.

This model identifies contextual cues within text data, such as negation, uncertainty, and assertion. It is used for tasks such as clinical assertion detection. It annotates text chunks with assertions based on configurable rules, prefix and suffix patterns, and exception patterns.

Input Annotation types

Output Annotation type

DOCUMENT, TOKEN, CHUNK

ASSERTION

Parameters:
  • caseSensitive – Whether matching is case sensitive

  • prefixAndSuffixMatch – Whether to match both prefix and suffix to annotate the hit

  • prefixKeywords – Prefix keywords to match

  • suffixKeywords – Suffix keywords to match

  • exceptionKeywords – Exception keywords not to match

  • prefixRegexPatterns – Prefix regex patterns to match

  • suffixRegexPatterns – Suffix regex patterns to match

  • exceptionRegexPatterns – Exception regex patterns not to match

  • scopeWindow – The scope window of the assertion expression

  • assertion – Assertion to match

  • includeChunkToScope – Whether to include the chunk in the scope when matching values

  • scopeWindowDelimiters – Delimiters used to limit the scope window.

  • confidenceCalculationDirection – Direction of confidence calculation. Accepted values are “left”, “right”, “both”. Default is “left”

Examples

>>> import sparknlp
>>> from sparknlp.base import *
>>> from sparknlp.annotator import *
>>> import sparknlp_jsl
>>> from sparknlp_jsl.annotator import *
>>> from pyspark.ml import Pipeline
>>> documentAssembler = DocumentAssembler() \
...     .setInputCol("text") \
...     .setOutputCol("document")
>>> sentenceDetector = SentenceDetector() \
...     .setInputCols(["document"]) \
...     .setOutputCol("sentence")
>>> tokenizer = Tokenizer() \
...     .setInputCols(["sentence"]) \
...     .setOutputCol("token")
>>> word_embeddings = WordEmbeddingsModel \
...     .pretrained("embeddings_clinical", "en", "clinical/models") \
...     .setInputCols(["sentence", "token"]) \
...     .setOutputCol("embeddings")
>>> clinical_ner = MedicalNerModel \
...     .pretrained("ner_clinical", "en", "clinical/models") \
...     .setInputCols(["sentence", "token", "embeddings"]) \
...     .setOutputCol("ner")
>>> ner_converter = NerConverter() \
...     .setInputCols(["sentence", "token", "ner"]) \
...     .setOutputCol("ner_chunk")

Define the ContextualAssertion model:

>>> data = spark.createDataFrame([["No kidney injury reported. No abnormal rashes or ulcers. Patient might not have liver disease."]]).toDF("text")
>>> contextual_assertion = ContextualAssertion() \
...     .setInputCols(["sentence", "token", "ner_chunk"]) \
...     .setOutputCol("assertion") \
...     .setPrefixKeywords(["no", "not"]) \
...     .setSuffixKeywords(["unlikely", "negative"]) \
...     .setPrefixRegexPatterns([r"\b(no|without|denies|never|none|free of|not include)\b"]) \
...     .setSuffixRegexPatterns([r"\b(free of|negative for|absence of|not|rule out)\b"]) \
...     .setExceptionKeywords(["without"]) \
...     .setExceptionRegexPatterns([r"\b(not clearly)\b"]) \
...     .addPrefixKeywords(["negative for", "negative"]) \
...     .addSuffixKeywords(["absent", "neither"]) \
...     .setCaseSensitive(False) \
...     .setPrefixAndSuffixMatch(False) \
...     .setAssertion("absent") \
...     .setScopeWindow([2, 2])
>>> flattener = Flattener() \
...     .setInputCols("assertion") \
...     .setExplodeSelectedFields({"assertion": ["result",
...                                              "metadata.ner_chunk as ner_chunk",
...                                              "metadata.ner_label as ner_label"]})
>>> pipeline = Pipeline(stages=[
...     documentAssembler,
...     sentenceDetector,
...     tokenizer,
...     word_embeddings,
...     clinical_ner,
...     ner_converter,
...     contextual_assertion,
...     flattener
... ])
>>> result = pipeline.fit(data).transform(data)
>>> result.show(truncate=False)

+----------------+---------------+---------+
|assertion_result|ner_chunk      |ner_label|
+----------------+---------------+---------+
|absent          |kidney injury  |PROBLEM  |
|absent          |abnormal rashes|PROBLEM  |
|absent          |liver disease  |PROBLEM  |
+----------------+---------------+---------+

assertion#
caseSensitive#
confidenceCalculationDirection#
doExceptionHandling#
getter_attrs = []#
includeChunkToScope#
inputAnnotatorTypes#
inputCols#
lazyAnnotator#
name = 'ContextualAssertion'#
optionalInputAnnotatorTypes = []#
outputAnnotatorType#
outputCol#
prefixAndSuffixMatch#
scopeWindow#
scopeWindowDelimiters#
skipLPInputColsValidation = True#
uid#
addPrefixKeywords(value: list)#

Adds prefix keywords to match

Parameters:

value (list) – Prefix keywords to match

addSuffixKeywords(value: list)#

Adds suffix keywords to match

Parameters:

value (list) – Suffix keywords to match
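
For example, a minimal sketch of how the add methods extend an existing keyword configuration (the keyword values here are illustrative; the set methods replace the current lists, while the add methods append to them):

>>> contextual_assertion = ContextualAssertion() \
...     .setInputCols(["sentence", "token", "ner_chunk"]) \
...     .setOutputCol("assertion") \
...     .setPrefixKeywords(["no", "not"]) \
...     .addPrefixKeywords(["denies", "free of"]) \
...     .addSuffixKeywords(["ruled out", "unlikely"])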

clear(param: pyspark.ml.param.Param) None#

Clears a param from the param map if it has been explicitly set.

copy(extra: pyspark.ml._typing.ParamMap | None = None) JP#

Creates a copy of this instance with the same uid and some extra params. This implementation first calls Params.copy and then makes a copy of the companion Java pipeline component with extra params, so both the Python wrapper and the Java pipeline component get copied.

Parameters:

extra (dict, optional) – Extra parameters to copy to the new instance

Returns:

Copy of this instance

Return type:

JavaParams

explainParam(param: str | Param) str#

Explains a single param and returns its name, doc, and optional default value and user-supplied value in a string.

explainParams() str#

Returns the documentation of all params with their optionally default values and user-supplied values.

extractParamMap(extra: pyspark.ml._typing.ParamMap | None = None) pyspark.ml._typing.ParamMap#

Extracts the embedded default param values and user-supplied values, and then merges them with extra values from input into a flat param map, where the latter value is used if there exist conflicts, i.e., with ordering: default param values < user-supplied values < extra.

Parameters:

extra (dict, optional) – extra param values

Returns:

merged param map

Return type:

dict

getInputCols()#

Gets current column names of input annotations.

getLazyAnnotator()#

Gets whether Annotator should be evaluated lazily in a RecursivePipeline.

getOrDefault(param: str) Any#
getOrDefault(param: Param[T]) T

Gets the value of a param in the user-supplied param map or its default value. Raises an error if neither is set.

getOutputCol()#

Gets output column name of annotations.

getParam(paramName: str) Param#

Gets a param by its name.

getParamValue(paramName)#

Gets the value of a parameter.

Parameters:

paramName (str) – Name of the parameter

hasDefault(param: str | Param[Any]) bool#

Checks whether a param has a default value.

hasParam(paramName: str) bool#

Tests whether this instance contains a param with a given (string) name.

inputColsValidation(value)#
isDefined(param: str | Param[Any]) bool#

Checks whether a param is explicitly set by user or has a default value.

isSet(param: str | Param[Any]) bool#

Checks whether a param is explicitly set by user.

classmethod load(path: str) RL#

Reads an ML instance from the input path, a shortcut of read().load(path).

static pretrained(name='contextual_assertion_absent', lang='en', remote_loc='clinical/models')#

Download a pre-trained ContextualAssertion.

Parameters:
  • name (str) – Name of the pre-trained model, by default “contextual_assertion_absent”

  • lang (str) – Language of the pre-trained model, by default “en”

  • remote_loc (str) – Remote location of the pre-trained model. If None, use the open-source location. Other values are “clinical/models”, “finance/models”, or “legal/models”.

Returns:

A pre-trained ContextualAssertion

Return type:

ContextualAssertion
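
For example, a pretrained model can be placed in a pipeline in the same way as a manually configured instance (the column names follow the Examples section above):

>>> pretrained_assertion = ContextualAssertion.pretrained("contextual_assertion_absent", "en", "clinical/models") \
...     .setInputCols(["sentence", "token", "ner_chunk"]) \
...     .setOutputCol("assertion")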

classmethod read()#

Returns an MLReader instance for this class.

save(path: str) None#

Save this ML instance to the given path, a shortcut of ‘write().save(path)’.

set(param: Param, value: Any) None#

Sets a parameter in the embedded param map.

setAssertion(value)#
Sets the assertion to match.

Default is “absent”

Parameters:

value (str) – Assertion to match
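
As an illustrative sketch, the assertion label is not limited to “absent”; a possibility-style label could be configured together with matching keywords (the label and keyword values below are hypothetical, not pretrained defaults):

>>> possible_assertion = ContextualAssertion() \
...     .setInputCols(["sentence", "token", "ner_chunk"]) \
...     .setOutputCol("assertion_possible") \
...     .setPrefixKeywords(["might", "may", "possibly"]) \
...     .setAssertion("possible")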

setCaseSensitive(value)#
Sets whether matching is case sensitive.

Default is False

Parameters:

value (bool) – Whether matching is case sensitive

setConfidenceCalculationDirection(value)#
Sets Direction of confidence calculation.

If “left”, the confidence is calculated based on the distance of the matched regex or keyword on the left side of the sentence from the chunk. If “right”, it is calculated based on the distance of the match on the right side of the sentence from the chunk. If “both”, it is calculated based on the minimum distance of matches on either side of the sentence from the chunk. Default is “left”.

Parameters:

value (str) – Direction of confidence calculation.
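
A short sketch, assuming contextual_assertion is configured as in the Examples section above; the value only changes which side of the chunk is used for the distance-based confidence:

>>> contextual_assertion = contextual_assertion.setConfidenceCalculationDirection("both")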

setDoExceptionHandling(value: bool)#

If True, exceptions are handled. If exception-causing data is passed to the model, an error annotation containing the exception message is emitted and processing continues with the next one. This comes with a performance penalty.

Parameters:

value (bool) – If True, exceptions are handled.
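
A short sketch, assuming contextual_assertion is configured as in the Examples section above; with exception handling enabled, rows that raise an exception produce an error annotation instead of failing the whole transform:

>>> contextual_assertion = contextual_assertion.setDoExceptionHandling(True)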

setExceptionKeywords(value: list)#
Sets the exception keywords not to be matched.

Defaults are “not only”, “not necessarily”, “not need”, “not certain if”, “not clearly”, “not likely”, “not cause”, “not extend”, “not always”, “not only”, “not yet”, “not otherwise”, “not exclude”

Parameters:

value (list) – Exception keywords not to match

setExceptionRegexPatterns(value: list)#
Sets the exception regex patterns not to match

Default is empty list

Parameters:

value (list) – Exception regex patterns not to match
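
For example, exception keywords and regex patterns can be combined to keep hedged phrases such as “not clearly” from triggering the assertion (the values below are illustrative):

>>> contextual_assertion = ContextualAssertion() \
...     .setInputCols(["sentence", "token", "ner_chunk"]) \
...     .setOutputCol("assertion") \
...     .setPrefixKeywords(["no", "not"]) \
...     .setExceptionKeywords(["not only", "not necessarily"]) \
...     .setExceptionRegexPatterns([r"\b(not clearly)\b"])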

setForceInputTypeValidation(etfm)#
setIncludeChunkToScope(value)#
Sets whether to include the chunk in the scope when matching values

Default is False

Parameters:

value (bool) – Whether to include the chunk in the scope when matching values

setInputCols(*value)#

Sets column names of input annotations.

Parameters:

*value (List[str]) – Input columns for the annotator

setLazyAnnotator(value)#

Sets whether Annotator should be evaluated lazily in a RecursivePipeline.

Parameters:

value (bool) – Whether Annotator should be evaluated lazily in a RecursivePipeline

setOutputCol(value)#

Sets output column name of annotations.

Parameters:

value (str) – Name of output column

setParamValue(paramName)#

Sets the value of a parameter.

Parameters:

paramName (str) – Name of the parameter

setParams()#
setPrefixAndSuffixMatch(value)#
Sets whether to match both prefix and suffix to annotate the hit.

Default is False

Parameters:

value (bool) – Whether to match both prefix and suffix to annotate the hit

setPrefixKeywords(value: list)#
Sets the prefix keywords to look for before the chunk.

Defaults are “no”, “not”, “never”, “without”, “absent”, “neither”, “nor”, “denies”, “free of”, “lack of”, “unremarkable for”, “ruled out”, “rule out”, “declined”, “denied”

Parameters:

value (list) – Prefix keywords to match

setPrefixRegexPatterns(value: list)#
Sets the prefix regex patterns to match

Default is empty list.

Parameters:

value (list) – Prefix regex patterns to match

setScopeWindow(value)#
Sets the scope window of the assertion. The scope window is defined by two integers: the number of tokens to the left of the chunk and the number of tokens to the right of the chunk.

Both values must be non-negative, except for the default (-1, -1), which means the whole sentence is used.

Parameters:

value ([int, int]) – Left and right offsets of the scope window. Offsets must be non-negative values
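
For example, assuming contextual_assertion is configured as in the Examples section above, the following restricts matching to three tokens before and two tokens after the chunk:

>>> contextual_assertion = contextual_assertion.setScopeWindow([3, 2])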

setScopeWindowDelimiters(value: list)#

Set delimiters used to limit the scope window.

Parameters:

value (List[str]) – Delimiters used to limit the scope window.
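
For example, assuming contextual_assertion is configured as in the Examples section above, the delimiter values below (illustrative) limit the scope window at commas and semicolons:

>>> contextual_assertion = contextual_assertion.setScopeWindowDelimiters([",", ";"])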

setSuffixKeywords(value: list)#
Sets the suffix keywords to look for after the chunk.

Defaults are “not detected”, “not demonstrate”, “not appear”, “not had”, “was ruled out”, “were ruled out”, “are ruled out”, “is ruled out”, “unlikely”, “not developed”, “not present”, “not associated with”, “not had”, “free from”, “resolved”

Parameters:

value (list) – Suffix keywords to match

setSuffixRegexPatterns(value: list)#
Sets the suffix regex patterns to match

Default is empty list

Parameters:

value (list) – Suffix regex patterns to match
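
Since the patterns are plain Python strings, raw string literals avoid accidentally escaping sequences such as \b. A short sketch with illustrative patterns, assuming contextual_assertion is configured as in the Examples section above:

>>> contextual_assertion = contextual_assertion \
...     .setPrefixRegexPatterns([r"\b(no|without|denies)\b"]) \
...     .setSuffixRegexPatterns([r"\b(ruled out|negative for)\b"])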

transform(dataset: pyspark.sql.dataframe.DataFrame, params: pyspark.ml._typing.ParamMap | None = None) pyspark.sql.dataframe.DataFrame#

Transforms the input dataset with optional parameters.

New in version 1.3.0.

Parameters:
  • dataset (pyspark.sql.DataFrame) – input dataset

  • params (dict, optional) – an optional param map that overrides embedded params.

Returns:

transformed dataset

Return type:

pyspark.sql.DataFrame

write() JavaMLWriter#

Returns an MLWriter instance for this ML instance.