sparknlp_jsl.training.AnnotationToolJsonReader#

class sparknlp_jsl.training.AnnotationToolJsonReader(pipeline_model=None, assertion_labels=None, excluded_labels=None, cleanup_mode='disabled', split_chars=None, context_chars=None, scheme='IOB', min_chars_tol=2, align_chars_tol=1, merge_overlapping=True, SDDLPath='')[source]#

Bases: ExtendedJavaWrapper

A reader that generates an assertion training set from the JSON exports of the annotation tool (Annotation Lab).

Examples

>>> from sparknlp_jsl.training import AnnotationToolJsonReader
>>> assertion_labels = ["AsPresent","Absent"]
>>> excluded_labels = ["Treatment"]
>>> split_chars = [" ", "\\-"]
>>> context_chars = [".", ",", ";"]
>>> SDDLPath = ""
>>> rdr = AnnotationToolJsonReader(assertion_labels = assertion_labels, excluded_labels = excluded_labels, split_chars = split_chars, context_chars = context_chars, SDDLPath = SDDLPath)
>>> json_path = "src/test/resources/anc-pos-corpus-small/test-training.txt"
>>> df = rdr.readDataset(spark, json_path)
>>> assertion_df = rdr.generateAssertionTrainSet(df)
>>> assertion_df.show()
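
The same reader can also write the annotated documents out as a CoNLL file for NER training via generateConll (listed under Methods below). A minimal sketch, assuming the df produced by readDataset above; the output file name is hypothetical:

>>> conll_path = "ner_train.conll"
>>> rdr.generateConll(df, conll_path)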

Methods

__init__([pipeline_model, assertion_labels, ...])


apply()

generateAssertionTrainSet(df[, sentenceCol, ...])

generateConll(df, path[, taskColumn, ...])

generatePlainAssertionTrainSet(df[, ...])

new_java_array(pylist, java_class)

ToDo: Inspired by Spark 2.0.

new_java_array_integer(pylist)

new_java_array_string(pylist)

new_java_obj(java_class, *args)

readDataset(spark, path)

new_java_array(pylist, java_class)#

ToDo: Inspired by Spark 2.0. Review if Spark changes.
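
These array helpers are mainly used internally to pass Python lists across the Py4J bridge, so they are rarely called directly. A minimal sketch of the string-typed variant, assuming the rdr instance created in the example above:

>>> java_labels = rdr.new_java_array_string(["AsPresent", "Absent"])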