training

package training

Type Members

case class AnnotationDefinition(from_name: Option[String], id: Option[String], source: Option[String], to_name: Option[String], type: String, value: Option[AnnotationValue], direction: Option[String], from_id: Option[String], to_id: Option[String], labels: Option[List[String]]) extends Serializable with Product
class AnnotationToolJsonReader extends CheckLicense
Reads and processes the exported JSON file from NLP Lab.
Reader class that parses the information exported from NLP Lab into different formats. It can be used to create a training dataset for assertion status models (using the generateAssertionTrainSet method) or for NER models (in the CoNLL format, using the generateConll method).
To generate the assertion data, specify the following attributes when instantiating the class:
- assertion_labels: The assertion labels to use.
- excluded_labels: The assertion labels to exclude when creating the training dataset (can be an empty list).
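The interplay of the two attributes can be pictured with a small, library-free sketch. It is only one plausible reading of the description above, not the reader's actual implementation: the helper `assertionRows` and the object name are hypothetical, and the case class is a local copy of the `AnnotationValueLabel` shape listed on this page.

```scala
object AssertionLabelFilterSketch {
  // Local copy of the AnnotationValueLabel shape from this package listing,
  // so the sketch runs without the library on the classpath.
  case class AnnotationValueLabel(start: Int, end: Int, text: String, labels: List[String])

  // Hypothetical helper (an assumption, not a library method): keeps one
  // (text, start, end, label) row per label that is listed as an assertion
  // label and is not listed as excluded.
  def assertionRows(
      annotations: Seq[AnnotationValueLabel],
      assertionLabels: Set[String],
      excludedLabels: Set[String]
  ): Seq[(String, Int, Int, String)] =
    for {
      annotation <- annotations
      label <- annotation.labels
      if assertionLabels.contains(label) && !excludedLabels.contains(label)
    } yield (annotation.text, annotation.start, annotation.end, label)

  def main(args: Array[String]): Unit = {
    val annotations = Seq(
      AnnotationValueLabel(15, 19, "fever", List("Absent")),
      AnnotationValueLabel(30, 37, "diabetes", List("Family")),
      AnnotationValueLabel(45, 49, "cough", List("PROBLEM")) // entity label, not an assertion label
    )
    assertionRows(annotations, Set("Absent", "Present", "Family"), Set("Family"))
      .foreach(println)
  }
}
```

Under this reading, passing an empty `excludedLabels` set keeps every annotation that carries one of the assertion labels.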
case class AnnotationValue(start: Option[Int], end: Option[Int], text: Option[String], labels: Option[List[String]], choices: Option[List[String]]) extends Serializable with Product
case class AnnotationValueChoices(choices: List[String]) extends Serializable with Product
case class AnnotationValueLabel(start: Int, end: Int, text: String, labels: List[String]) extends Serializable with Product
class CantemistReader extends AnyRef
class CodiEspReader extends AnyRef
case class CompletionDefinition(id: Long, created_username: String, result: Seq[AnnotationDefinition], created_ago: Option[String], lead_time: Option[Double], honeypot: Option[Boolean]) extends Serializable with Product
case class NerAnnotationDefinition(from_name: String, id: String, to_name: String, type: String, value: AnnotationValue) extends Serializable with Product
case class RelAnnotationDefinition(from_id: String, to_id: String, direction: String, type: String) extends Serializable with Product
case class SynonymAugmentationUMLS(sparkSession: SparkSession, umlsMetaPath: String = "", codeCol: String = "code", descriptionCol: String = "description", caseSensitive: Boolean = false) extends CheckLicense with Product with Serializable
Contains all methods to augment a given DataFrame through combinatorial NER synonym matching, using UMLS or SentenceEntityResolvers. The augment function takes a DataFrame and an NER PipelineModel and augments the data by exactly matching named entities through UMLS synonym relations or through the output metadata of any SentenceEntityResolver. If UMLS is used as the SynonymSource, the UMLS META directory is expected to be in the file system; if RESOLUTIONS is used as the SynonymSource, the umlsMetaPath parameter is ignored. The DataFrame is expected to have two columns: an 'identification' column (ideally unique) and an 'information' text column. If the augment function is called with augmentationMode == "chunk", the 'information' column should be the output of a chunk AnnotatorType. The pipeline is expected to have a single stage for each Annotator.
Example
Augmenting a simple sentence
Define or load an NER pipeline including a chunk AnnotatorType for your source data:
```
val doc = new DocumentAssembler().setInputCol("text").setOutputCol("document")
val tkn = new Tokenizer().setInputCols("document").setOutputCol("token")
val embs = WordEmbeddingsModel.pretrained("embeddings_clinical", "en" , "clinical/models")
          .setInputCols("document","token").setOutputCol("embeddings")
val ner = MedicalNerModel.pretrained("ner_clinical", "en", "clinical/models")
          .setInputCols("document","token","embeddings").setOutputCol("ner")
val conv = new NerConverterInternal().setInputCols("document","token","ner").setOutputCol("ner_chunk")
val edf = ResourceHelper.spark.createDataFrame(Array(Tuple1(""))).toDF("text")
val plModel = new Pipeline().setStages(Array(doc,tkn,embs,ner,conv)).fit(edf)
```
Then we can create the augmenter object and call the augment function like the following:
```
val augmenter = SynonymAugmentationUMLS(ResourceHelper.spark, "src/test/resources/synonym-augmentation/mini_umls_meta", "id", "text")
val synsSimple = augmenter.augmentCsv(edf, plModel, "ENG", false, AugmentationModes.PLAIN_TEXT).cache()
synsSimple.orderBy("code").show(1000, false)
print(synsSimple.count()) // Will most likely exceed the original number of unique rows due to augmentation
```
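To make the combinatorial matching idea concrete without Spark or a UMLS directory, the following is a minimal sketch under stated assumptions. `Chunk`, `chunkOf`, `augment`, and the in-memory synonym map are all hypothetical stand-ins, not part of SynonymAugmentationUMLS; the sketch assumes non-overlapping chunks with inclusive end offsets and exact (optionally case-insensitive) matching of the chunk text.

```scala
object SynonymAugmentationSketch {
  // Hypothetical stand-in for an NER chunk: inclusive character offsets plus the matched text.
  case class Chunk(begin: Int, end: Int, text: String)

  // Locates the first occurrence of a phrase and wraps it as a chunk
  // (assumes the phrase occurs in the text).
  def chunkOf(text: String, phrase: String): Chunk = {
    val begin = text.indexOf(phrase)
    Chunk(begin, begin + phrase.length - 1, phrase)
  }

  // Produces every combination of "keep the original chunk or substitute one
  // of its synonyms" across all chunks. The original text is part of the result.
  def augment(
      text: String,
      chunks: Seq[Chunk],
      synonyms: Map[String, Seq[String]],
      caseSensitive: Boolean = false
  ): Seq[String] = {
    def key(s: String): String = if (caseSensitive) s else s.toLowerCase
    val table = synonyms.map { case (term, alternatives) => key(term) -> alternatives }
    // Process chunks from right to left so the offsets of the remaining
    // (leftward) chunks stay valid after each replacement.
    val ordered = chunks.sortBy(chunk => -chunk.begin)
    val variants = ordered.foldLeft(Seq(text)) { (current, chunk) =>
      val options = chunk.text +: table.getOrElse(key(chunk.text), Seq.empty)
      for {
        variant <- current
        option <- options
      } yield variant.substring(0, chunk.begin) + option + variant.substring(chunk.end + 1)
    }
    variants.distinct
  }

  def main(args: Array[String]): Unit = {
    val text = "history of heart attack and hypertension"
    val chunks = Seq(chunkOf(text, "heart attack"), chunkOf(text, "hypertension"))
    val synonyms = Map(
      "Heart Attack" -> Seq("myocardial infarction"),
      "hypertension" -> Seq("high blood pressure")
    )
    augment(text, chunks, synonyms).foreach(println)
  }
}
```

With two chunks that each have one synonym, the sketch yields four variants (two choices per chunk), which illustrates why the augmented row count usually exceeds the original one.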
case class TaskDataDefinition(text: String, title: Option[String]) extends Product with Serializable
case class TaskDefinition(id: Long, data: TaskDataDefinition, completions: Seq[CompletionDefinition], created_at: Option[String], created_by: Option[String], title: Option[String]) extends Serializable with Product
case class TaskReader() extends Product with Serializable
case class TokenRow(task_id: Long, begin: Int, end: Int, token: String, label: String, sentence: String) extends Product with Serializable
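As an illustration of how rows shaped like TokenRow could map onto CoNLL-style lines, here is a minimal, library-free sketch. The helper `toConllLines` is hypothetical, and the four-column `token -X- -X- label` layout with a `-DOCSTART-` header is an assumption based on the common CoNLL-2003 convention; this page does not specify the exact output of generateConll. The sketch also assumes that the sentence text identifies a sentence within a task.

```scala
object ConllSketch {
  // Local copy of the TokenRow shape listed above, so the sketch runs without the library.
  case class TokenRow(task_id: Long, begin: Int, end: Int, token: String, label: String, sentence: String)

  // Hypothetical helper (not a library method): renders rows as CoNLL-2003
  // style lines, one token per line and a blank line after each sentence.
  def toConllLines(rows: Seq[TokenRow]): Seq[String] = {
    val header = Seq("-DOCSTART- -X- -X- O", "")
    // Sentences in first-seen order, identified by (task id, sentence text).
    val sentenceKeys = rows.map(row => (row.task_id, row.sentence)).distinct
    val body = sentenceKeys.flatMap { sentenceKey =>
      val sentenceRows = rows
        .filter(row => (row.task_id, row.sentence) == sentenceKey)
        .sortBy(_.begin)
      sentenceRows.map(row => s"${row.token} -X- -X- ${row.label}") :+ ""
    }
    header ++ body
  }

  def main(args: Array[String]): Unit = {
    val sentence = "Patient denies fever"
    val rows = Seq(
      TokenRow(1L, 0, 6, "Patient", "O", sentence),
      TokenRow(1L, 8, 13, "denies", "O", sentence),
      TokenRow(1L, 15, 19, "fever", "B-PROBLEM", sentence)
    )
    toConllLines(rows).foreach(println)
  }
}
```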

Annotations
@deprecated
Deprecated
