
package assertion

Type Members

  1. class AssertionChunkConverter extends Transformer with DefaultParamsWritable with HasInputAnnotationCols with HasOutputAnnotationCol with HasOutputAnnotatorType with CheckLicense

    Creates a chunk column with metadata for training assertion status detection models.


    In some cases, creating the chunk column from token indices can cause problems that lead to loss of data when training assertion status models. The AssertionChunkConverter annotator takes both the begin and end indices of the tokens as input and adds more robust metadata to the chunk column. This improves the reliability of the indices and avoids data loss.
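    The idea of resolving a chunk from both its begin and its end index can be illustrated with a small, self-contained sketch. This is not the annotator's implementation. The `Token` case class, the `tokenize` and `tokenSpan` helpers, whitespace tokenization, and inclusive character offsets are all assumptions made for illustration. The sketch maps a chunk's character span to the first and last token indices that overlap it:

```scala
// Hypothetical token representation: text plus inclusive character offsets (assumption).
final case class Token(text: String, begin: Int, end: Int)

object ChunkToTokenSketch {
  // Hypothetical whitespace tokenizer that records inclusive begin/end character offsets.
  def tokenize(text: String): Vector[Token] =
    "\\S+".r.findAllMatchIn(text).map(m => Token(m.matched, m.start, m.end - 1)).toVector

  // Returns (tokenBegin, tokenEnd): the indices of the first and last tokens that
  // overlap the inclusive character span [charBegin, charEnd], or None if none overlap.
  def tokenSpan(tokens: Vector[Token], charBegin: Int, charEnd: Int): Option[(Int, Int)] = {
    val overlapping = tokens.indices.filter { i =>
      tokens(i).end >= charBegin && tokens(i).begin <= charEnd
    }
    if (overlapping.isEmpty) None else Some((overlapping.head, overlapping.last))
  }

  def main(args: Array[String]): Unit = {
    val text = "Patient denies chest pain today."
    val tokens = tokenize(text)
    val begin = text.indexOf("chest pain")
    val end = begin + "chest pain".length - 1
    println(tokenSpan(tokens, begin, end)) // prints Some((2,3))
  }
}
```

    Using an overlap test rather than exact offset equality means that a chunk whose character span starts or ends inside a token still maps to that token instead of being dropped. Whether AssertionChunkConverter uses this exact rule is not stated on this page.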

    Notes

    Chunk begin and end indices in the assertion status model training dataframe can be populated using the new version of the ALAB module.

    Example

    Define the stages of the pipeline

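    import com.johnsnowlabs.nlp.DocumentAssembler
    import com.johnsnowlabs.nlp.annotators.Tokenizer
    import com.johnsnowlabs.nlp.annotators.sbd.pragmatic.SentenceDetector
    // Package path assumed from this page ("package assertion"); check your library version
    import com.johnsnowlabs.nlp.annotators.assertion.AssertionChunkConverter
    import org.apache.spark.ml.Pipeline

    val document = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    val sentenceDetector = new SentenceDetector()
      .setInputCols("document")
      .setOutputCol("sentence")
    val tokenizer = new Tokenizer()
      .setInputCols("sentence")
      .setOutputCol("token")
    // Builds the chunk column from the chunk text and its character offsets
    val converter = new AssertionChunkConverter()
      .setInputCols("token") // must match the Tokenizer's output column
      .setChunkTextCol("target")
      .setChunkBeginCol("char_begin")
      .setChunkEndCol("char_end")
      .setOutputTokenBeginCol("token_begin")
      .setOutputTokenEndCol("token_end")
      .setOutputCol("chunk")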

    Define the pipeline and obtain the results

    val pipeline = new Pipeline().setStages(Array(
      document,
      sentenceDetector,
      tokenizer,
      converter
    ))

    // data: a DataFrame with the columns text, target, char_begin and char_end
    val results = pipeline.fit(data).transform(data)
  2. case class Datapoint(sentence: String, target: String, label: String, start: Int, end: Int) extends Product with Serializable

    Created by jose on 19/03/18.
