sparknlp_jsl.annotator.assertion.contextual_assertion#
Module Contents#
Classes#
ContextualAssertion: An annotator model for contextual assertion analysis.
- class ContextualAssertion(classname='com.johnsnowlabs.nlp.annotators.assertion.context.ContextualAssertion', java_model=None)#
Bases: sparknlp_jsl.common.AnnotatorModelInternal, sparknlp_jsl.annotator.handle_exception_params.HandleExceptionParams
An annotator model for contextual assertion analysis.
This model identifies contextual cues within text data, such as negation, uncertainty, and assertion, and is typically used for clinical assertion detection. It annotates text chunks with assertions based on configurable rules, prefix and suffix patterns, and exception patterns.
Input Annotation types: DOCUMENT, TOKEN, CHUNK
Output Annotation type: ASSERTION
- Parameters:
caseSensitive – Whether matching is case sensitive
prefixAndSuffixMatch – Whether to match both prefix and suffix to annotate the hit
prefixKeywords – Prefix keywords to match
suffixKeywords – Suffix keywords to match
exceptionKeywords – Exception keywords not to match
prefixRegexPatterns – Prefix regex patterns to match
suffixRegexPatterns – Suffix regex patterns to match
exceptionRegexPatterns – Exception regex patterns not to match
scopeWindow – The scope window of the assertion expression
assertion – Assertion to match
includeChunkToScope – Whether to include chunk to scope when matching values
scopeWindowDelimiters – Delimiters used to limit the scope window.
confidenceCalculationDirection – Direction of confidence calculation. Accepted values are “left”, “right”, “both”. Default is “left”
Examples
>>> import sparknlp
>>> from sparknlp.base import *
>>> from sparknlp.annotator import *
>>> import sparknlp_jsl
>>> from sparknlp_jsl.annotator import *
>>> from pyspark.ml import Pipeline
>>> documentAssembler = DocumentAssembler() \
...     .setInputCol("text") \
...     .setOutputCol("document")
>>> sentenceDetector = SentenceDetector() \
...     .setInputCols(["document"]) \
...     .setOutputCol("sentence")
>>> tokenizer = Tokenizer() \
...     .setInputCols(["sentence"]) \
...     .setOutputCol("token")
>>> word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models") \
...     .setInputCols(["sentence", "token"]) \
...     .setOutputCol("embeddings")
>>> clinical_ner = MedicalNerModel.pretrained("ner_clinical", "en", "clinical/models") \
...     .setInputCols(["sentence", "token", "embeddings"]) \
...     .setOutputCol("ner")
>>> ner_converter = NerConverter() \
...     .setInputCols(["sentence", "token", "ner"]) \
...     .setOutputCol("ner_chunk")
>>> data = spark.createDataFrame([["No kidney injury reported. No abnormal rashes or ulcers. Patient might not have liver disease."]]).toDF("text")
Define the ContextualAssertion model:
>>> contextual_assertion = ContextualAssertion() \
...     .setInputCols(["sentence", "token", "ner_chunk"]) \
...     .setOutputCol("assertion") \
...     .setPrefixKeywords(["no", "not"]) \
...     .setSuffixKeywords(["unlikely", "negative"]) \
...     .setPrefixRegexPatterns(["\\b(no|without|denies|never|none|free of|not include)\\b"]) \
...     .setSuffixRegexPatterns(["\\b(free of|negative for|absence of|not|rule out)\\b"]) \
...     .setExceptionKeywords(["without"]) \
...     .setExceptionRegexPatterns(["\\b(not clearly)\\b"]) \
...     .addPrefixKeywords(["negative for", "negative"]) \
...     .addSuffixKeywords(["absent", "neither"]) \
...     .setCaseSensitive(False) \
...     .setPrefixAndSuffixMatch(False) \
...     .setAssertion("absent") \
...     .setScopeWindow([2, 2])
>>> flattener = Flattener() \
...     .setInputCols("assertion") \
...     .setExplodeSelectedFields({"assertion": ["result",
...                                              "metadata.ner_chunk as ner_chunk",
...                                              "metadata.ner_label as ner_label"]})
>>> pipeline = Pipeline(stages=[
...     documentAssembler,
...     sentenceDetector,
...     tokenizer,
...     word_embeddings,
...     clinical_ner,
...     ner_converter,
...     contextual_assertion,
...     flattener
... ])
>>> result = pipeline.fit(data).transform(data)
>>> result.show(truncate=False)
+----------------+---------------+---------+
|assertion_result|ner_chunk      |ner_label|
+----------------+---------------+---------+
|absent          |kidney injury  |PROBLEM  |
|absent          |abnormal rashes|PROBLEM  |
|absent          |liver disease  |PROBLEM  |
+----------------+---------------+---------+
- assertion#
- caseSensitive#
- confidenceCalculationDirection#
- doExceptionHandling#
- getter_attrs = []#
- includeChunkToScope#
- inputAnnotatorTypes#
- inputCols#
- lazyAnnotator#
- name = 'ContextualAssertion'#
- optionalInputAnnotatorTypes = []#
- outputAnnotatorType#
- outputCol#
- prefixAndSuffixMatch#
- scopeWindow#
- scopeWindowDelimiters#
- skipLPInputColsValidation = True#
- uid#
- addPrefixKeywords(value: list)#
Adds prefix keywords to the list of prefix keywords to match.
- Parameters:
value (list) – Prefix keywords to match
- addSuffixKeywords(value: list)#
Adds suffix keywords to the list of suffix keywords to match.
- Parameters:
value (list) – Suffix keywords to match
- clear(param: pyspark.ml.param.Param) None #
Clears a param from the param map if it has been explicitly set.
- copy(extra: pyspark.ml._typing.ParamMap | None = None) JP #
Creates a copy of this instance with the same uid and some extra params. This implementation first calls Params.copy and then makes a copy of the companion Java pipeline component with extra params, so both the Python wrapper and the Java pipeline component get copied.
- Parameters:
extra (dict, optional) – Extra parameters to copy to the new instance
- Returns:
Copy of this instance
- Return type:
JavaParams
- explainParam(param: str | Param) str #
Explains a single param and returns its name, doc, and optional default value and user-supplied value in a string.
- explainParams() str #
Returns the documentation of all params with their optionally default values and user-supplied values.
- extractParamMap(extra: pyspark.ml._typing.ParamMap | None = None) pyspark.ml._typing.ParamMap #
Extracts the embedded default param values and user-supplied values, and then merges them with extra values from input into a flat param map, where the latter value is used if there exist conflicts, i.e., with ordering: default param values < user-supplied values < extra.
- Parameters:
extra (dict, optional) – extra param values
- Returns:
merged param map
- Return type:
dict
- getInputCols()#
Gets current column names of input annotations.
- getLazyAnnotator()#
Gets whether Annotator should be evaluated lazily in a RecursivePipeline.
- getOrDefault(param: str) Any #
- getOrDefault(param: Param[T]) T
Gets the value of a param in the user-supplied param map or its default value. Raises an error if neither is set.
- getOutputCol()#
Gets output column name of annotations.
- getParam(paramName: str) Param #
Gets a param by its name.
- getParamValue(paramName)#
Gets the value of a parameter.
- Parameters:
paramName (str) – Name of the parameter
- hasDefault(param: str | Param[Any]) bool #
Checks whether a param has a default value.
- hasParam(paramName: str) bool #
Tests whether this instance contains a param with a given (string) name.
- inputColsValidation(value)#
- isDefined(param: str | Param[Any]) bool #
Checks whether a param is explicitly set by user or has a default value.
- isSet(param: str | Param[Any]) bool #
Checks whether a param is explicitly set by user.
- classmethod load(path: str) RL #
Reads an ML instance from the input path, a shortcut of read().load(path).
- static pretrained(name='contextual_assertion_absent', lang='en', remote_loc='clinical/models')#
Download a pre-trained ContextualAssertion.
- Parameters:
name (str) – Name of the pre-trained model, by default “contextual_assertion_absent”
lang (str) – Language of the pre-trained model, by default “en”
remote_loc (str) – Remote location of the pre-trained model. If None, use the open-source location. Other values are “clinical/models”, “finance/models”, or “legal/models”.
- Returns:
A pre-trained ContextualAssertion
- Return type:
ContextualAssertion
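For example, a minimal sketch of plugging the pre-trained model into the pipeline from the Examples section (assuming documentAssembler, sentenceDetector, tokenizer, word_embeddings, clinical_ner, and ner_converter are defined as shown there):
>>> contextual_assertion_pretrained = ContextualAssertion.pretrained("contextual_assertion_absent", "en", "clinical/models") \
...     .setInputCols(["sentence", "token", "ner_chunk"]) \
...     .setOutputCol("assertion")
>>> pipeline = Pipeline(stages=[
...     documentAssembler,
...     sentenceDetector,
...     tokenizer,
...     word_embeddings,
...     clinical_ner,
...     ner_converter,
...     contextual_assertion_pretrained
... ])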
- classmethod read()#
Returns an MLReader instance for this class.
- save(path: str) None #
Save this ML instance to the given path, a shortcut of ‘write().save(path)’.
- set(param: Param, value: Any) None #
Sets a parameter in the embedded param map.
- setAssertion(value)#
- Sets the assertion to match.
Default is “absent”
- Parameters:
value (str) – Assertion to match
- setCaseSensitive(value)#
- Sets whether matching is case sensitive.
Default is False
- Parameters:
value (bool) – Whether matching is case sensitive
- setConfidenceCalculationDirection(value)#
- Sets the direction of confidence calculation.
If “left”, the confidence is calculated from the distance between the chunk and the matched regex or keyword on its left within the sentence; if “right”, from the distance on its right; if “both”, from the minimum of the two distances. Default is “left”.
- Parameters:
value (str) – Direction of confidence calculation.
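As an illustration, a minimal sketch of switching the direction (assuming contextual_assertion is the annotator built in the Examples section; setters return the annotator itself, so calls can be chained):
>>> # use the closer of the left-hand and right-hand cue distances
>>> contextual_assertion = contextual_assertion \
...     .setConfidenceCalculationDirection("both")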
- setDoExceptionHandling(value: bool)#
If True, exceptions are handled: when exception-causing data is passed to the model, an error annotation containing the exception message is emitted and processing continues with the next record. This comes with a performance penalty.
- Parameters:
value (bool) – If True, exceptions are handled.
- setExceptionKeywords(value: list)#
- Sets the exception keywords not to match.
Defaults are “not only”, “not necessarily”, “not need”, “not certain if”, “not clearly”, “not likely”, “not cause”, “not extend”, “not always”, “not only”, “not yet”, “not otherwise”, “not exclude”
- Parameters:
value (list) – Exception keywords not to match
- setExceptionRegexPatterns(value: list)#
- Sets the exception regex patterns not to match
Default is empty list
- Parameters:
value (list) – Exception regex patterns not to match
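As an illustration, a minimal sketch combining exception keywords and exception regex patterns (assuming contextual_assertion from the Examples section); a cue covered by an exception is ignored, so phrases such as “not clearly” do not trigger the assertion:
>>> contextual_assertion = contextual_assertion \
...     .setExceptionKeywords(["not likely", "not certain if"]) \
...     .setExceptionRegexPatterns(["\\b(not clearly|not necessarily)\\b"])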
- setForceInputTypeValidation(etfm)#
- setIncludeChunkToScope(value)#
- Sets whether to include the chunk itself in the scope when matching values
Default is False
- Parameters:
value (bool) – Whether to include the chunk itself in the scope when matching values
- setInputCols(*value)#
Sets column names of input annotations.
- Parameters:
*value (List[str]) – Input columns for the annotator
- setLazyAnnotator(value)#
Sets whether Annotator should be evaluated lazily in a RecursivePipeline.
- Parameters:
value (bool) – Whether Annotator should be evaluated lazily in a RecursivePipeline
- setOutputCol(value)#
Sets output column name of annotations.
- Parameters:
value (str) – Name of output column
- setParamValue(paramName)#
Sets the value of a parameter.
- Parameters:
paramName (str) – Name of the parameter
- setParams()#
- setPrefixAndSuffixMatch(value)#
- Sets whether to match both prefix and suffix to annotate the hit.
Default is False
- Parameters:
value (bool) – Whether to match both prefix and suffix to annotate the hit
- setPrefixKeywords(value: list)#
- Sets the prefix keywords to look for before the chunk.
Defaults are “no”, “not”, “never”, “without”, “absent”, “neither”, “nor”, “denies”, “free of”, “lack of”, “unremarkable for”, “ruled out”, “rule out”, “declined”, “denied”
- Parameters:
value (list) – Prefix keywords to match
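For illustration, a minimal sketch contrasting setPrefixKeywords with addPrefixKeywords (assuming contextual_assertion from the Examples section; the set call presumably replaces the current keyword list, while the add call extends it):
>>> # replace the prefix keyword list, then extend it with further keywords
>>> contextual_assertion = contextual_assertion \
...     .setPrefixKeywords(["no", "not", "denies"]) \
...     .addPrefixKeywords(["ruled out", "free of"])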
- setPrefixRegexPatterns(value: list)#
- Sets the prefix regex patterns to match
Default is empty list.
- Parameters:
value (list) – Prefix regex patterns to match
- setScopeWindow(value)#
- Sets the scope window of the assertion. The scope window is given as two integers: the first is the number of tokens considered to the left of the chunk, the second the number of tokens to the right.
Both values must be non-negative, except for the special value (-1, -1), which means the whole sentence is used as the scope. Default is (-1, -1).
- Parameters:
value ([int, int]) – Left and right offsets of the scope window. Offsets must be non-negative, or (-1, -1) for the whole sentence
- setScopeWindowDelimiters(value: list)#
Set delimiters used to limit the scope window.
- Parameters:
value (List[str]) – Delimiters used to limit the scope window.
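A minimal sketch of restricting the scope (assuming contextual_assertion from the Examples section): cues are only searched within two tokens on either side of the chunk, and the scope is additionally cut at commas (the comma delimiter is an illustrative assumption):
>>> contextual_assertion = contextual_assertion \
...     .setScopeWindow([2, 2]) \
...     .setScopeWindowDelimiters([","])  # "," chosen for illustration only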
- setSuffixKeywords(value: list)#
- Sets the suffix keywords to look for after the chunk.
Defaults are “not detected”, “not demonstrate”, “not appear”, “not had”, “was ruled out”, “were ruled out”, “are ruled out”, “is ruled out”, “unlikely”, “not developed”, “not present”, “not associated with”, “not had”, “free from”, “resolved”
- Parameters:
value (list) – Suffix keywords to match
- setSuffixRegexPatterns(value: list)#
- Sets the suffix regex patterns to match
Default is empty list
- Parameters:
value (list) – Suffix regex patterns to match
- transform(dataset: pyspark.sql.dataframe.DataFrame, params: pyspark.ml._typing.ParamMap | None = None) pyspark.sql.dataframe.DataFrame #
Transforms the input dataset with optional parameters.
New in version 1.3.0.
- Parameters:
dataset (pyspark.sql.DataFrame) – input dataset
params (dict, optional) – an optional param map that overrides embedded params
- Returns:
transformed dataset
- Return type:
pyspark.sql.DataFrame
- write() JavaMLWriter #
Returns an MLWriter instance for this ML instance.