class AnnotationToolJsonReader extends CheckLicense
Reads and processes the exported JSON file from NLP Lab.
Reader class that parses relevant information exported from NLP Lab into different
formats. The reader can be used to create a training dataset for training assertion
status models (using the generateAssertionTrainSet method) or NER models (in the
CoNLL format, using the generateConll method).
To generate the assertion data, the following attributes need to be specified when
instantiating the class:
- assertion_labels
: The assertion labels to use.
- excluded_labels
: The assertion labels that are excluded from the training dataset creation (can be an empty list).
Linear Supertypes
- CheckLicense
- AnyRef
- Any
Instance Constructors
- new AnnotationToolJsonReader(assertionLabels: Array[String])
- new AnnotationToolJsonReader(pipelineModel: PipelineModel, assertionLabels: List[String])
- new AnnotationToolJsonReader(assertionLabels: Array[String], excludedLabels: Array[String])
- new AnnotationToolJsonReader(pipelineModel: PipelineModel, assertionLabels: Array[String], excludedLabels: Array[String], scheme: String, minCharsTol: Int, alignCharsTol: Int, mergeOverlapping: Boolean)
- new AnnotationToolJsonReader(pipelineModel: PipelineModel, assertionLabels: List[String], excludedLabels: List[String], scheme: String, minCharsTol: Int, alignCharsTol: Int, mergeOverlapping: Boolean)
- pipelineModel
The pipeline model that is used to create the documents, the sentences, and the tokens. An example of such a pipeline could be the following:

    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")

    val sentenceDetector = new SentenceDetector()
      .setInputCols(Array("document"))
      .setOutputCol("sentence")

    val tokenizer = new Tokenizer()
      .setInputCols(Array("sentence"))
      .setOutputCol("token")

    val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer))
- assertionLabels
The labels that will be interpreted as Assertion labels.
- excludedLabels
The labels that will be excluded from the NER / Assertion Annotations.
- scheme
The scheme used to create the IOB tags (IOB or BIOES).
- minCharsTol
The minimum length of a token for it to be considered a label.
- alignCharsTol
The maximum number of characters of difference tolerated when aligning start / end tokens with the Assertion Annotations.
- mergeOverlapping
'true' to merge overlapping annotations, prioritizing the longest and most diverse annotations; 'false' to keep both chunks in the same position.
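The mergeOverlapping behavior can be pictured with a short pure-Scala sketch of the "keep the longest among overlapping annotations" idea. The names and logic here are hypothetical illustrations, not the reader's internal code:

```scala
// Hypothetical sketch: among overlapping annotations, keep only the longest one.
case class Ann(begin: Int, end: Int, label: String)

object MergeOverlapping {
  def merge(anns: Seq[Ann]): Seq[Ann] =
    anns
      .sortBy(a => -(a.end - a.begin))          // longest first
      .foldLeft(List.empty[Ann]) { (kept, a) =>
        // Drop `a` if it overlaps an already-kept (longer) annotation
        val overlaps = kept.exists(k => a.begin <= k.end && k.begin <= a.end)
        if (overlaps) kept else a :: kept
      }
      .sortBy(_.begin)
}
```

For example, `MergeOverlapping.merge(Seq(Ann(0, 10, "Problem"), Ann(5, 8, "Test")))` keeps only the longer `Ann(0, 10, "Problem")`, while non-overlapping annotations survive untouched.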
- new AnnotationToolJsonReader(assertionLabels: List[String] = List.empty[String].asJava, excludedLabels: List[String] = List.empty[String].asJava, cleanupMode: String = "disabled", splitChars: List[String] = List().asJava, contextChars: List[String] = List().asJava, scheme: String = "IOB", minCharsTol: Int = 2, alignCharsTol: Int = 1, mergeOverlapping: Boolean = true, SDDLPath: String = "")
The annotation tool JSON reader generates an assertion training set from the JSON exported from NLP Lab (Annotation Lab).
- assertionLabels
The assertion labels used for the training dataset creation.
- excludedLabels
The NER or Assertion labels to be excluded from the training dataset creation.
- cleanupMode
The clean mode that is used in the DocumentAssembler transformer.
- splitChars
The split chars that are used in the default tokenizer. See com.johnsnowlabs.nlp.annotators.Tokenizer.setSplitChars().
- contextChars
The context chars that are used in the default tokenizer. See com.johnsnowlabs.nlp.annotators.Tokenizer.setContextChars().
- scheme
The scheme used to create the IOB tags (IOB or BIOES).
- minCharsTol
The minimum token length for the alignment tolerance feature to apply.
- alignCharsTol
The maximum number of characters of difference tolerated when aligning start / end tokens with the assertion annotations.
- mergeOverlapping
'true' to merge the overlapping annotations, 'false' to keep both chunks in the same position.
- SDDLPath
The path used to load a pretrained sentence detector model via com.johnsnowlabs.nlp.annotators.sbd.pragmatic.SentenceDetector.load().
- Annotations
- @deprecated
- Deprecated
- new AnnotationToolJsonReader(assertionLabels: Array[String], excludedLabels: Array[String], cleanupMode: String, splitChars: Array[String], contextChars: Array[String], scheme: String, minCharsTol: Int, alignCharsTol: Int, mergeOverlapping: Boolean, SDDLPath: String)
- Annotations
- @deprecated
- Deprecated
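The scheme parameter selects between the two tagging formats. A pure-Scala sketch of how IOB vs BIOES tags a single labeled chunk (an illustration of the formats only, not the reader's internal code; names are hypothetical):

```scala
// Illustrative sketch: tag the tokens of one labeled chunk in IOB or BIOES format.
object TaggingSchemes {
  def tag(tokens: Seq[String], label: String, scheme: String): Seq[String] =
    scheme match {
      case "IOB" =>
        // First token gets B- (begin), the rest I- (inside)
        tokens.zipWithIndex.map { case (_, i) =>
          if (i == 0) s"B-$label" else s"I-$label"
        }
      case "BIOES" =>
        // Single-token chunks get S-; multi-token chunks get B- ... E- with I- between
        tokens.zipWithIndex.map { case (_, i) =>
          if (tokens.size == 1) s"S-$label"
          else if (i == 0) s"B-$label"
          else if (i == tokens.size - 1) s"E-$label"
          else s"I-$label"
        }
    }
}
```

For example, `TaggingSchemes.tag(Seq("left", "atrial", "dilatation"), "Problem", "BIOES")` yields `Seq("B-Problem", "I-Problem", "E-Problem")`, while the same chunk under "IOB" yields `Seq("B-Problem", "I-Problem", "I-Problem")`.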
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##(): Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- val alignCharsTol: Int
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- val assertionLabels: List[String]
- def checkValidEnvironment(spark: Option[SparkSession], scopes: Seq[String]): Unit
- Definition Classes
- CheckLicense
- def checkValidScope(scope: String): Unit
- Definition Classes
- CheckLicense
- def checkValidScopeAndEnvironment(scope: String, spark: Option[SparkSession], checkLp: Boolean): Unit
- Definition Classes
- CheckLicense
- def checkValidScopesAndEnvironment(scopes: Seq[String], spark: Option[SparkSession], checkLp: Boolean): Unit
- Definition Classes
- CheckLicense
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- val excludedLabels: List[String]
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
- val fs: FileSystem
- def generateAssertionTrainSet(df: DataFrame, sentenceCol: String = "sentence", assertionCol: String = "assertion_label"): DataFrame
Generates assertion training data at token level.
Using information from the sentence and assertion labels, this method generates a training dataset with the following columns:
- text: sentence text
- target: the token text
- label: the assertion label
- start: start position of the token
- end: end position of the token
The tokens are identified internally with the constraints from the minCharsTol and alignCharsTol parameters.
- df
DataFrame with sentence and assertion annotations like the one returned by readDataset method
- sentenceCol
Name of the column having Sentence Annotations
- assertionCol
Name of the column having Assertion Annotations
- returns
A dataframe to train an AssertionDL Model
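The minCharsTol and alignCharsTol constraints can be pictured with a small sketch. This is hypothetical logic inferred from the parameter descriptions above, not the actual internals: a token is matched to an assertion annotation when its boundaries fall within alignCharsTol characters of the annotation's and the token is at least minCharsTol characters long.

```scala
// Hypothetical illustration of the minCharsTol / alignCharsTol constraints.
object TokenAlignment {
  def aligns(tokBegin: Int, tokEnd: Int, annBegin: Int, annEnd: Int,
             minCharsTol: Int = 2, alignCharsTol: Int = 1): Boolean = {
    // Token must be long enough to participate in alignment at all
    val longEnough = (tokEnd - tokBegin + 1) >= minCharsTol
    // Both boundaries must be within the character tolerance of the annotation's
    longEnough &&
      math.abs(tokBegin - annBegin) <= alignCharsTol &&
      math.abs(tokEnd - annEnd) <= alignCharsTol
  }
}
```

With the defaults, a token at characters 10-15 aligns with an annotation at 11-16 (both boundaries are off by one), but not with one at 13-18.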
- final def getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- val mergeOverlapping: Boolean
- val minCharsTol: Int
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- val pipelineModel: PipelineModel
- def readDataset(spark: SparkSession, path: String): DataFrame
The readDataset method uses the Spark session to access a path to a JSON exported from the Annotation Tool and provides a DataFrame with proper annotations to train Spark NLP and Spark NLP Enterprise annotators.
- spark
SparkSession
- path
Path to a JSON exported from the Annotation Tool
- returns
A dataframe with NER, Assertion and RE annotations
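Putting readDataset and generateAssertionTrainSet together, a typical usage sketch might look like the following. It requires a licensed Spark NLP for Healthcare environment; the import path, the labels, and the file path are placeholders to adjust for your setup:

```scala
import scala.jdk.CollectionConverters._
// Package path assumed; adjust to your Spark NLP for Healthcare version.
import com.johnsnowlabs.nlp.training.AnnotationToolJsonReader

val reader = new AnnotationToolJsonReader(
  assertionLabels = List("present", "absent").asJava,
  excludedLabels = List.empty[String].asJava
)

// spark: an active SparkSession; the path points at an NLP Lab JSON export
val df = reader.readDataset(spark, "annotations/export.json")

// Token-level assertion training set, ready to train an AssertionDL model
val assertionDf = reader.generateAssertionTrainSet(df)
```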
- val scheme: String
- final def synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- AnyRef → Any
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
Deprecated Value Members
- def generateConll(df: DataFrame, path: String, taskColumn: String = "task_id", tokenCol: String = "token", nerLabel: String = "ner_label"): Unit
Saves a CoNLL file from the exported annotations.
- Annotations
- @deprecated
- Deprecated
- def generatePlainAssertionTrainSet(df: DataFrame, taskColumn: String = "task_id", tokenCol: String = "token", nerLabel: String = "ner_label", assertion_label: String = "assertion_label"): DataFrame
Generates assertion training data at chunk level.
Using information from the sentence, task id (from NLP Lab), NER label, and assertion labels, this method generates a training dataset with the following columns:
- sentence: sentence text
- begin: start position of the token
- end: end position of the token
- ner: the NER chunk
- assertion: the assertion label
Internally uses the NerConverterInternal to identify the NER chunks.
- Annotations
- @deprecated
- Deprecated
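The chunk-level view can be illustrated with a small sketch of how B-/I- tagged tokens collapse into labeled chunks. This is a simplified stand-in for what NerConverterInternal does, not its actual implementation:

```scala
// Simplified sketch: collapse IOB-tagged (token, tag) pairs into (chunkText, label) pairs.
object ChunkFromIob {
  def chunks(tokens: Seq[(String, String)]): Seq[(String, String)] = {
    val out = scala.collection.mutable.ListBuffer.empty[(String, String)]
    var current: Option[(StringBuilder, String)] = None
    def flush(): Unit = current.foreach { case (sb, lab) => out += ((sb.toString, lab)) }
    for ((tok, tag) <- tokens) {
      if (tag.startsWith("B-")) {
        // B- always starts a new chunk
        flush(); current = Some((new StringBuilder(tok), tag.drop(2)))
      } else if (tag.startsWith("I-") && current.exists(_._2 == tag.drop(2))) {
        // I- with a matching label continues the current chunk
        current.foreach(_._1.append(" ").append(tok))
      } else {
        // O (or a mismatched I-) closes any open chunk
        flush(); current = None
      }
    }
    flush()
    out.toList
  }
}
```

For example, tokens tagged `("chest", "B-Problem"), ("pain", "I-Problem"), ("denies", "O")` collapse into the single chunk `("chest pain", "Problem")`.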