package training


Type Members

  1. case class AnnotationDefinition(from_name: Option[String], id: Option[String], source: Option[String], to_name: Option[String], type: String, value: Option[AnnotationValue], direction: Option[String], from_id: Option[String], to_id: Option[String], labels: Option[List[String]]) extends Serializable with Product
  2. class AnnotationToolJsonReader extends CheckLicense

    Reads and processes the exported JSON file from NLP Lab.

    Reader class that parses relevant information exported from NLP Lab into different formats. The reader can be used to create training datasets for assertion status models (using the generateAssertionTrainSet method) or NER models (in the CoNLL format, using the generateConll method).

    To generate the assertion data, the following attributes need to be specified when instantiating the class:

    - assertion_labels: The assertion labels to use.
    - excluded_labels: The assertion labels that are excluded from the training dataset creation (can be an empty list).
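    A minimal sketch of the reader workflow described above. The generateAssertionTrainSet and generateConll method names come from this page; the constructor parameter names and the readDataset helper are assumptions for illustration, so check the exact API of your Spark NLP for Healthcare version:

    // Sketch only: parameter names and readDataset are assumed, not confirmed by this page.
    val reader = new AnnotationToolJsonReader(
      assertionLabels = List("Present", "Absent"),
      excludedLabels = List.empty[String]
    )

    // Parse the JSON export from NLP Lab into a DataFrame of tasks and annotations
    val df = reader.readDataset(ResourceHelper.spark, "path/to/nlp_lab_export.json")

    // Training data for an assertion status model
    val assertionDf = reader.generateAssertionTrainSet(df)

    // CoNLL-formatted training data for an NER model
    reader.generateConll(df, "path/to/output.conll")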

  3. case class AnnotationValue(start: Option[Int], end: Option[Int], text: Option[String], labels: Option[List[String]], choices: Option[List[String]]) extends Serializable with Product
  4. case class AnnotationValueChoices(choices: List[String]) extends Serializable with Product
  5. case class AnnotationValueLabel(start: Int, end: Int, text: String, labels: List[String]) extends Serializable with Product
  6. class CantemistReader extends AnyRef
  7. class CodiEspReader extends AnyRef
  8. case class CompletionDefinition(id: Long, created_username: String, result: Seq[AnnotationDefinition], created_ago: Option[String], lead_time: Option[Double], honeypot: Option[Boolean]) extends Serializable with Product
  9. case class NerAnnotationDefinition(from_name: String, id: String, to_name: String, type: String, value: AnnotationValue) extends Serializable with Product
  10. case class RelAnnotationDefinition(from_id: String, to_id: String, direction: String, type: String) extends Serializable with Product
  11. case class SynonymAugmentationUMLS(sparkSession: SparkSession, umlsMetaPath: String = "", codeCol: String = "code", descriptionCol: String = "description", caseSensitive: Boolean = false) extends CheckLicense with Product with Serializable

    Contains all methods to augment a given DataFrame via combinatorial NER synonym matching using UMLS or SentenceEntityResolvers.

    The augment function takes a DataFrame and a NER PipelineModel and augments the data by exactly matching named entities through UMLS synonym relations, or by using any SentenceEntityResolver's output metadata. If UMLS is used as the SynonymSource, the UMLS META directory is expected to be on the file system; if RESOLUTIONS is used as the SynonymSource, the umlsMetaPath parameter is ignored. The DataFrame is expected to have two columns: an 'identification' column (ideally unique) and an 'information' text column. If the augment function is called with augmentationMode == "chunk", the 'information' column should be the output of a chunk AnnotatorType. The pipeline is expected to have exactly one stage per Annotator.


    Augmenting a simple sentence

    Define or load an NER pipeline including a chunk AnnotatorType for your source data:

    val doc = new DocumentAssembler().setInputCol("text").setOutputCol("document")
    val tkn = new Tokenizer().setInputCols("document").setOutputCol("token")
    val embs = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
    val ner = MedicalNerModel.pretrained("ner_clinical", "en", "clinical/models")
    val conv = new NerConverterInternal().setInputCols("document","token","ner").setOutputCol("ner_chunk")
    val edf = ResourceHelper.spark.createDataFrame(Array(Tuple1(""))).toDF("text")
    val plModel = new Pipeline().setStages(Array(doc,tkn,embs,ner,conv)).fit(edf)

    Then we can create the augmenter object and call the augment function as follows:

    val augmenter = SynonymAugmentationUMLS(ResourceHelper.spark, "src/test/resources/synonym-augmentation/mini_umls_meta", "id", "text")
    val syns = augmenter.augmentCsv(edf, plModel, "ENG", false, AugmentationModes.PLAIN_TEXT).cache()
    syns.orderBy("code").show(1000, false)
    print(syns.count()) // Will most probably exceed the original number of unique rows due to augmentation
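    For the chunk augmentation mode mentioned above, the 'information' column must contain chunk annotations, such as the ner_chunk output of the pipeline. A hedged sketch, reusing the augmenter and pipeline from the example; the AugmentationModes.CHUNK member name is an assumption, so check the AugmentationModes object for the exact value:

    // Sketch only: AugmentationModes.CHUNK is an assumed member name.
    // Produce a DataFrame whose 'information' column holds chunk annotations.
    val chunkDf = plModel.transform(edf).select("text", "ner_chunk")
    val synsChunk = augmenter.augmentCsv(chunkDf, plModel, "ENG", false, AugmentationModes.CHUNK).cache()
    synsChunk.show(false)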
  12. case class TaskDataDefinition(text: String, title: Option[String]) extends Product with Serializable
  13. case class TaskDefinition(id: Long, data: TaskDataDefinition, completions: Seq[CompletionDefinition], created_at: Option[String], created_by: Option[String], title: Option[String]) extends Serializable with Product
  14. case class TaskReader() extends Product with Serializable
  15. case class TokenRow(task_id: Long, begin: Int, end: Int, token: String, label: String, sentence: String) extends Product with Serializable

Value Members

  1. object AugmentationModes

    Possible AugmentationModes

  2. object CasingFunctions
  3. object CreatorPipeline
  4. object SynonymSources

Deprecated Value Members

  1. object UDFHelper