package chunker

Type Members

  1. class AssertionFilterer extends AnnotatorModel[AssertionFilterer] with HasSimpleAnnotate[AssertionFilterer] with Licensed

    Filters entities coming from ASSERTION type annotations and returns the CHUNKS. The filter can be a white list applied to the assertion or to the extracted chunk, or a regular expression. The white list on the assertion is enabled by default. To use the chunk white list, set criteria to "isin". To use a regular expression, set criteria to "regex".

    Example

    To see how the assertions are extracted, see the example for AssertionDLModel.

    Define an extra step in which the assertions are filtered:
    val assertionFilterer = new AssertionFilterer()
      .setInputCols("sentence","ner_chunk","assertion")
      .setOutputCol("filtered")
      .setCriteria("assertion")
      .setWhiteList("present")
    
    val assertionPipeline = new Pipeline().setStages(Array(
      documentAssembler,
      sentenceDetector,
      tokenizer,
      embeddings,
      nerModel,
      nerConverter,
      clinicalAssertion,
      assertionFilterer
    ))
    
    val assertionModel = assertionPipeline.fit(data)
    val result = assertionModel.transform(data)

    Show results:

    result.selectExpr("ner_chunk.result", "assertion.result").show(3, truncate=false)
    +--------------------------------+--------------------------------+
    |result                          |result                          |
    +--------------------------------+--------------------------------+
    |[severe fever, sore throat]     |[present, present]              |
    |[stomach pain]                  |[absent]                        |
    |[an epidural, PCA, pain control]|[present, present, hypothetical]|
    +--------------------------------+--------------------------------+
    
    result.select("filtered.result").show(3, truncate=false)
    +---------------------------+
    |result                     |
    +---------------------------+
    |[severe fever, sore throat]|
    |[]                         |
    |[an epidural, PCA]         |
    +---------------------------+
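
    The relationship between the two result tables above can be sketched in plain Scala, without Spark NLP. This is only an illustration of the "assertion" criteria with a white list of "present", not the library's implementation. The names AssertionFilterSketch, AssertedChunk and filterByAssertion are hypothetical and do not exist in the library.

```scala
// Plain-Scala sketch with no Spark NLP dependency. All names are hypothetical.
object AssertionFilterSketch {
  // One extracted chunk paired with the assertion label assigned to it.
  final case class AssertedChunk(text: String, assertion: String)

  // Keep the texts of the chunks whose assertion label is on the white list.
  def filterByAssertion(chunks: Seq[AssertedChunk], whiteList: Set[String]): Seq[String] =
    chunks.filter(c => whiteList.contains(c.assertion)).map(_.text)

  def main(args: Array[String]): Unit = {
    // The three rows of the ner_chunk / assertion table shown in the example.
    val rows = Seq(
      Seq(AssertedChunk("severe fever", "present"), AssertedChunk("sore throat", "present")),
      Seq(AssertedChunk("stomach pain", "absent")),
      Seq(
        AssertedChunk("an epidural", "present"),
        AssertedChunk("PCA", "present"),
        AssertedChunk("pain control", "hypothetical")
      )
    )
    // Print one filtered row per line, in the same bracketed form as the table.
    rows.foreach { row =>
      println(filterByAssertion(row, Set("present")).mkString("[", ", ", "]"))
    }
  }
}
```

    A row whose chunks all carry a label outside the white list, such as "stomach pain" with "absent", produces an empty result rather than being dropped, which matches the empty second row of the filtered table.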

    See also

    AssertionDLModel to extract the assertions

  2. class ChunkFilterer extends AnnotatorModel[ChunkFilterer] with HasSimpleAnnotate[ChunkFilterer] with Licensed

    Filters entities coming from CHUNK annotations. The filter can be a white list of terms or a regular expression. The white list criteria is enabled by default. To use a regular expression, set criteria to "regex".

    Example

    Filtering POS tags

    First, the pipeline stages that extract the POS tags are defined:

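    // Assumes an active SparkSession named `spark` (needed for toDF).
    import spark.implicits._
    import com.johnsnowlabs.nlp.base._
    import com.johnsnowlabs.nlp.annotator._
    import org.apache.spark.ml.Pipeline

    val data = Seq("Has a past history of gastroenteritis and stomach pain, however patient ...").toDF("text")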
    val docAssembler = new DocumentAssembler().setInputCol("text").setOutputCol("document")
    val sentenceDetector = new SentenceDetector().setInputCols("document").setOutputCol("sentence")
    val tokenizer = new Tokenizer().setInputCols("sentence").setOutputCol("token")
    
    val posTagger = PerceptronModel.pretrained()
      .setInputCols("sentence", "token")
      .setOutputCol("pos")
    
    val chunker = new Chunker()
      .setInputCols("pos", "sentence")
      .setOutputCol("chunk")
      .setRegexParsers(Array("(<NN>)+"))

    Then the chunks can be filtered via a white list. Here, only the chunks matching "gastroenteritis" remain.

    val chunkerFilter = new ChunkFilterer()
      .setInputCols("sentence","chunk")
      .setOutputCol("filtered")
      .setCriteria("isin")
      .setWhiteList("gastroenteritis")
    
    val pipeline = new Pipeline().setStages(Array(
      docAssembler,
      sentenceDetector,
      tokenizer,
      posTagger,
      chunker,
      chunkerFilter))
    
    val result = pipeline.fit(data).transform(data)
    result.selectExpr("explode(chunk)").show(truncate=false)
    +---------------------------------------------------------------------------------+
    |col                                                                              |
    +---------------------------------------------------------------------------------+
    |{chunk, 11, 17, history, {sentence -> 0, chunk -> 0}, []}                        |
    |{chunk, 22, 36, gastroenteritis, {sentence -> 0, chunk -> 1}, []}                |
    |{chunk, 42, 53, stomach pain, {sentence -> 0, chunk -> 2}, []}                   |
    |{chunk, 64, 70, patient, {sentence -> 0, chunk -> 3}, []}                        |
    |{chunk, 81, 110, stomach pain now.We don't care, {sentence -> 0, chunk -> 4}, []}|
    |{chunk, 118, 132, gastroenteritis, {sentence -> 0, chunk -> 5}, []}              |
    +---------------------------------------------------------------------------------+
    
    result.selectExpr("explode(filtered)").show(truncate=false)
    +-------------------------------------------------------------------+
    |col                                                                |
    +-------------------------------------------------------------------+
    |{chunk, 22, 36, gastroenteritis, {sentence -> 0, chunk -> 1}, []}  |
    |{chunk, 118, 132, gastroenteritis, {sentence -> 0, chunk -> 5}, []}|
    +-------------------------------------------------------------------+
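
    As a rough illustration of how the "isin" and "regex" criteria differ, the sketch below applies both to the chunk texts listed above, in plain Scala without Spark NLP. It rests on two assumptions that the description does not confirm: that "isin" compares the whole chunk text against the white list, and that "regex" keeps a chunk when the pattern is found anywhere in it. ChunkCriteriaSketch, filterIsIn and filterRegex are hypothetical names, and the pattern "pain" is chosen only for illustration.

```scala
// Plain-Scala sketch with no Spark NLP dependency. All names are hypothetical.
object ChunkCriteriaSketch {
  // "isin" (assumed semantics): keep chunks whose full text is a white-listed term.
  def filterIsIn(chunks: Seq[String], whiteList: Set[String]): Seq[String] =
    chunks.filter(whiteList.contains)

  // "regex" (assumed semantics): keep chunks in which the pattern occurs anywhere.
  def filterRegex(chunks: Seq[String], pattern: String): Seq[String] = {
    val compiled = pattern.r
    chunks.filter(c => compiled.findFirstIn(c).isDefined)
  }

  def main(args: Array[String]): Unit = {
    // Chunk texts taken from the exploded chunk column in the example.
    val chunks = Seq(
      "history",
      "gastroenteritis",
      "stomach pain",
      "patient",
      "stomach pain now.We don't care",
      "gastroenteritis"
    )
    println(filterIsIn(chunks, Set("gastroenteritis")))
    println(filterRegex(chunks, "pain"))
  }
}
```

    Under these assumptions the white list keeps the two "gastroenteritis" chunks, as in the filtered table, while the pattern "pain" would keep both chunks that contain "stomach pain".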

Value Members

  1. object AssertionFilterer extends ParamsAndFeaturesReadable[AssertionFilterer] with Serializable
  2. object ChunkFilterer extends ParamsAndFeaturesReadable[ChunkFilterer] with Serializable
