sparknlp_jsl.training.AnnotationToolJsonReader#

class sparknlp_jsl.training.AnnotationToolJsonReader(pipeline_model=None, assertion_labels=None, excluded_labels=None, cleanup_mode='disabled', split_chars=None, context_chars=None, scheme='IOB', min_chars_tol=2, align_chars_tol=1, merge_overlapping=True, SDDLPath='')[source]#

Bases: ExtendedJavaWrapper

A reader that generates an assertion training set from the JSON exports of the annotation tool (Annotation Lab).

Examples

>>> from sparknlp_jsl.training import AnnotationToolJsonReader
>>> assertion_labels = ["AsPresent","Absent"]
>>> excluded_labels = ["Treatment"]
>>> split_chars = [" ", "\\-"]
>>> context_chars = [".", ",", ";"]
>>> SDDLPath = ""
>>> rdr = AnnotationToolJsonReader(assertion_labels = assertion_labels, excluded_labels = excluded_labels, split_chars = split_chars, context_chars = context_chars, SDDLPath = SDDLPath)
>>> json_path = "src/test/resources/anc-pos-corpus-small/test-training.txt"
>>> df = rdr.readDataset(spark, json_path)
>>> assertion_df = rdr.generateAssertionTrainSet(df)
>>> assertion_df.show()
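
The same reader can also write the annotated documents out as a CoNLL file for NER training via generateConll (listed under Methods below). A minimal sketch, assuming the df produced by readDataset above; the output file name is hypothetical:

>>> conll_path = "ner_train.conll"
>>> rdr.generateConll(df, conll_path)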

Methods

__init__([pipeline_model, assertion_labels, ...])


apply()

generateAssertionTrainSet(df[, sentenceCol, ...])

generateConll(df, path[, taskColumn, ...])

generatePlainAssertionTrainSet(df[, ...])

new_java_array(pylist, java_class)

ToDo: Inspired by Spark 2.0.

new_java_array_integer(pylist)

new_java_array_string(pylist)

new_java_obj(java_class, *args)

readDataset(spark, path)

new_java_array(pylist, java_class)#

ToDo: Inspired by Spark 2.0. Review if Spark changes.
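
These array helpers are mainly used internally to pass Python lists across the Py4J bridge, so they are rarely called directly. A minimal sketch of the string-typed variant, assuming the rdr instance created in the example above:

>>> java_labels = rdr.new_java_array_string(["AsPresent", "Absent"])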