sparknlp_jsl.training.AnnotationToolJsonReader
- class sparknlp_jsl.training.AnnotationToolJsonReader(pipeline_model=None, assertion_labels=None, excluded_labels=None, cleanup_mode='disabled', split_chars=None, context_chars=None, scheme='IOB', min_chars_tol=2, align_chars_tol=1, merge_overlapping=True, SDDLPath='')
Bases: ExtendedJavaWrapper
Reader that generates an assertion training set from the JSON files exported by Annotation Lab.
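The signature exposes scheme='IOB' and split_chars, but this page does not describe how character-offset annotations become token-level tags. The following is a hypothetical, pure-Python sketch of that general idea, not the library's implementation. It splits text on single literal separator characters (the split_chars value in the example on this page uses a regex-style escape instead), then tags each token lying entirely inside an annotated span with B-/I- and every other token with O. The (start, end, label) span format, the helper names tokenize and iob_tags, and the Symptom label are all assumptions. The tolerance parameters (min_chars_tol, align_chars_tol) and merge_overlapping are ignored, so a span that does not align with token boundaries leaves the partially covered token tagged O.

```python
from typing import List, Tuple

# Hypothetical span format (an assumption, not the export schema):
# (start, end_exclusive, label) character offsets into the text.
Span = Tuple[int, int, str]
Token = Tuple[str, int, int]


def tokenize(text: str, split_chars: List[str]) -> List[Token]:
    """Split text on literal separator characters; return (token, start, end)."""
    tokens: List[Token] = []
    start = None
    for i, ch in enumerate(text):
        if ch in split_chars:
            # A separator closes the current token; separators themselves are dropped.
            if start is not None:
                tokens.append((text[start:i], start, i))
                start = None
        elif start is None:
            start = i  # first character of a new token
    if start is not None:
        tokens.append((text[start:], start, len(text)))
    return tokens


def iob_tags(tokens: List[Token], spans: List[Span]) -> List[str]:
    """Tag tokens fully contained in a span with B-/I-; all others get O.

    No alignment tolerance is applied (unlike the reader's *_tol parameters).
    """
    tags: List[str] = []
    started = set()  # indices of spans that have already emitted their B- tag
    for _, t_start, t_end in tokens:
        tag = "O"
        for idx, (s_start, s_end, label) in enumerate(spans):
            if s_start <= t_start and t_end <= s_end:
                tag = ("I-" if idx in started else "B-") + label
                started.add(idx)
                break
        tags.append(tag)
    return tags


text = "No chest-pain today."
tokens = tokenize(text, [" ", "-"])
print(list(zip([t[0] for t in tokens], iob_tags(tokens, [(3, 13, "Symptom")]))))
# → [('No', 'O'), ('chest', 'B-Symptom'), ('pain', 'I-Symptom'), ('today.', 'O')]
```

Because "." is not in the separator list here, it stays attached to "today."; the reader's separate context_chars parameter suggests punctuation gets its own handling there, which this sketch does not model.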
Examples
>>> from sparknlp_jsl.training import AnnotationToolJsonReader
>>> assertion_labels = ["AsPresent", "Absent"]
>>> excluded_labels = ["Treatment"]
>>> split_chars = [" ", "\\-"]
>>> context_chars = [".", ",", ";"]
>>> SDDLPath = ""
>>> rdr = AnnotationToolJsonReader(
...     assertion_labels=assertion_labels,
...     excluded_labels=excluded_labels,
...     split_chars=split_chars,
...     context_chars=context_chars,
...     SDDLPath=SDDLPath,
... )
>>> json_path = "path/to/annotation_lab_export.json"  # placeholder: path to a JSON export from Annotation Lab
>>> df = rdr.readDataset(spark, json_path)  # assumes an active SparkSession named `spark`
>>> assertion_df = rdr.generateAssertionTrainSet(df)
>>> assertion_df.show()
Methods
__init__([pipeline_model, assertion_labels, ...])
apply()
generateAssertionTrainSet(df[, sentenceCol, ...])
generateConll(df, path[, taskColumn, ...])
generatePlainAssertionTrainSet(df[, ...])
new_java_array(pylist, java_class)    ToDo: Inspired from Spark 2.0.
new_java_array_integer(pylist)
new_java_array_string(pylist)
new_java_obj(java_class, *args)
readDataset(spark, path)
- new_java_array(pylist, java_class)
ToDo: Inspired from Spark 2.0. Review if Spark changes.
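The method table lists generateConll(df, path[, taskColumn, ...]) without describing the file it writes. As a rough, hypothetical illustration of the CoNLL-style layout such output commonly follows, this sketch renders sentences of (token, IOB tag) pairs in the familiar four-column CoNLL-2003 shape, with -X- placeholders for the POS and chunk columns. The column layout, the placeholders, and the to_conll helper are assumptions, not a description of what generateConll actually produces.

```python
from typing import List, Tuple


def to_conll(sentences: List[List[Tuple[str, str]]]) -> str:
    """Render sentences of (token, iob_tag) pairs as CoNLL-2003-style text.

    Assumed layout (hypothetical): a -DOCSTART- header line, then one token per
    line with four space-separated columns (token, POS, chunk, NER), using "-X-"
    as a placeholder for POS and chunk. A blank line ends each sentence.
    """
    lines = ["-DOCSTART- -X- -X- O", ""]
    for sentence in sentences:
        for token, tag in sentence:
            lines.append(f"{token} -X- -X- {tag}")
        lines.append("")  # sentence boundary
    return "\n".join(lines)


print(to_conll([[("No", "O"), ("chest", "B-Symptom"), ("pain", "I-Symptom")]]))
```

Keeping the NER tag in the last column mirrors the usual CoNLL-2003 convention, so a reader that only needs tokens and tags can ignore the placeholder columns.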