Packages

package context


Type Members

  1. trait ContextRules[T] extends Serializable
  2. class ContextualEntityFilterer extends AnnotatorModel[ContextualEntityFilterer] with HasSimpleAnnotate[ContextualEntityFilterer] with HandleExceptionParams with HasSafeAnnotate[ContextualEntityFilterer] with CheckLicense

    ContextualEntityFilterer can filter chunks coming from CHUNK annotations based on entity (identifier, field) info in the metadata. Filtering can be done via white list entities, black list entities, black list words, and white list words. The filter can be applied to the scope of the sentence or the document.

    Example

    Define pipeline stages to extract entities

    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    
    val sentenceDetector = new SentenceDetector()
      .setInputCols(Array("document"))
      .setOutputCol("sentences")
    
    val tokenizer = new Tokenizer()
      .setInputCols(Array("sentences"))
      .setOutputCol("tokens")
    
    val embedder = WordEmbeddingsModel
      .pretrained("embeddings_clinical", "en", "clinical/models")
      .setInputCols(Array("sentences", "tokens"))
      .setOutputCol("embeddings")
    
    val nerTagger = MedicalNerModel
      .pretrained("ner_deid_generic_augmented", "en", "clinical/models")
      .setInputCols(Array("sentences", "tokens", "embeddings"))
      .setOutputCol("nerTags")
    
    val nerConverter = new NerConverterInternal()
      .setInputCols(Array("sentences", "tokens", "nerTags"))
      .setOutputCol("nerChunks")

    Define ContextualEntityFilterer and set the rules

    val jsonRules =
         """
             |[{
             | "entity" : "LOCATION",
             | "scopeWindow" : [2,2],
             | "whiteListEntities" : ["AGE","DATE"],
             | "blackListEntities" : ["ID","NAME"],
             | "scopeWindowLevel"  : "token",
             | "blackListWords" : ["beautiful"]
             | },
             | {
             |  "entity" : "DATE",
             |  "scopeWindow" : [2,2],
             |  "whiteListEntities" : ["AGE","DATE"],
             |  "blackListEntities" : ["ID","NAME"],
             |  "scopeWindowLevel"  : "chunk",
             |  "confidenceThreshold" : 0,50
             | }
             | ]
             |
             |""".stripMargin
    
     val contextualEntityFilter = new ContextualEntityFilterer()
       .setInputCols(Array("sentences", "tokens", "nerChunks"))
       .setOutputCol("filtered_chunks")
       .setRulesAsStr(jsonRules)
       .setRuleScope("document")
    
    
    
    val pipeline = new Pipeline().setStages(Array(
         documentAssembler,
         sentenceDetector,
         tokenizer,
         embedder,
         nerTagger,
         nerConverter,
         contextualEntityFilter
       ))
    
     val testText = "California, known for its beautiful beaches,and he is 36 years. " +
         "The Grand Canyon in Arizona, where the age is 37, is a stunning natural landmark. " +
         "It was founded on September 9, 1850, and Arizona on February 14, 1912."
     val testDataSet = Seq(testText).toDS.toDF("text")
    
     val result = pipeline.fit(testDataSet).transform(testDataSet)

    Show results

    result.selectExpr("explode(filtered_chunks) as filtered").show(100,truncate = false)
    
        +-----------------+-----+-----+------+
        |result           |begin|end  |entity|
        +-----------------+-----+-----+------+
        |36               |54   |55   |AGE   |
        |37               |110  |111  |AGE   |
        |September 9, 1850|164  |180  |DATE  |
        |February 14, 1912|198  |214  |DATE  |
        +-----------------+-----+-----+------+
  3. case class ContextualFilteringRules(entity: String, scopeWindowLevel: String, whiteListEntities: Option[Array[String]], blackListEntities: Option[Array[String]], scopeWindow: (Int, Int), blackListWords: Option[Array[String]], whiteListWords: Option[Array[String]], confidenceThreshold: Option[Double]) extends Serializable with Product

    ContextualFilteringRules is a case class that represents the rules used to filter chunks. The fields are described below, followed by a minimal construction sketch.

    entity

    The field of the entity to filter.

    scopeWindowLevel

    The level of the scope window. It can be either 'token' or 'chunk'.

    whiteListEntities

    The white list entities to filter. One element of the white list is enough to keep the chunk.

    blackListEntities

    The black list entities to filter. All elements of the black list must be absent to keep the chunk.

    scopeWindow

    The scope window around the chunk to consider when filtering. The scope can be calculated over tokens or chunks, as determined by scopeWindowLevel.

    blackListWords

    The black list words to filter. All elements of the black list must be absent to keep the chunk.

    whiteListWords

    The white list words to filter. One element of the white list is enough to keep the chunk.

    confidenceThreshold

    The confidence threshold to filter the chunks. Filtering is only applied if the confidence of the chunk is below the threshold.
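
    For illustration, the first JSON rule from the ContextualEntityFilterer example above can also be expressed by constructing the case class directly. This is a minimal sketch (the documented route is a JSON string via setRulesAsStr); unused optional fields are left as None:

    val locationRule = ContextualFilteringRules(
      entity = "LOCATION",                            // entity label this rule applies to
      scopeWindowLevel = "token",                     // measure the scope window in tokens
      whiteListEntities = Some(Array("AGE", "DATE")), // any one nearby is enough to keep the chunk
      blackListEntities = Some(Array("ID", "NAME")),  // all of these must be absent to keep it
      scopeWindow = (2, 2),                           // two tokens to the left, two to the right
      blackListWords = Some(Array("beautiful")),      // these words must be absent in scope
      whiteListWords = None,
      confidenceThreshold = None
    )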

  4. class ContextualParserApproach extends AnnotatorApproach[ContextualParserModel] with HandleExceptionParams with CheckLicense

    Creates a model that extracts entities from a document based on user-defined rules.

    Rule matching is based on a RegexMatcher defined in a JSON file, which is set through the parameter setJsonPath(). In this JSON file, the regex you want to match is defined along with the information that will be output in the metadata field. Additionally, a dictionary can be provided with setDictionary to map extracted entities to a unified representation. The first column of the dictionary file should contain the unified representation, with the following columns listing the possible matches.
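
    For illustration only, assuming a comma-delimited file, a hypothetical dictionary for normalizing gender mentions might look as follows, with the unified representation in the first column and the possible matches after it:

    female,she,her,woman,girl,lady
    male,he,him,man,boy,gentleman

    Such a file would then be supplied via setDictionary.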

    Example

    An example JSON file regex_token.json can look like this:

    {
      "entity": "Stage",
      "ruleScope": "sentence",
      "regex": "[cpyrau]?[T][0-9X?][a-z^cpyrau]*",
      "matchScope": "token"
    }

    This means the stage code is extracted on a sentence level. An example pipeline could then be defined like this:

    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    
    val sentenceDetector = new SentenceDetector()
      .setInputCols("document")
      .setOutputCol("sentence")
    
    val tokenizer = new Tokenizer()
      .setInputCols("sentence")
      .setOutputCol("token")

    Define the parser (the JSON rules file needs to be provided)

    val data = Seq("A patient has liver metastases pT1bN0M0 and the T5 primary site may be colon or... ").toDF("text")
    val contextualParser = new ContextualParserApproach()
      .setInputCols(Array("sentence", "token"))
      .setOutputCol("entity")
      .setJsonPath("/path/to/regex_token.json")
      .setCaseSensitive(true)
    val pipeline = new Pipeline().setStages(Array(
        documentAssembler,
        sentenceDetector,
        tokenizer,
        contextualParser
      ))
    
    val result = pipeline.fit(data).transform(data)

    Show results

    result.selectExpr("explode(entity)").show(5, truncate=false)
    +-----------------------------------------------------------------------------------------------------+
    |col                                                                                                  |
    +-----------------------------------------------------------------------------------------------------+
    |{chunk, 32, 39, pT1bN0M0, {field -> Stage, normalized -> , confidence -> 1.00, sentence -> 0}, []}   |
    |{chunk, 49, 50, T5, {field -> Stage, normalized -> , confidence -> 1.00, sentence -> 0}, []}         |
    |{chunk, 148, 156, cT4bcN2M1, {field -> Stage, normalized -> , confidence -> 1.00, sentence -> 1}, []}|
    |{chunk, 189, 194, T?N3M1, {field -> Stage, normalized -> , confidence -> 1.00, sentence -> 2}, []}   |
    |{chunk, 316, 323, pT1bN0M0, {field -> Stage, normalized -> , confidence -> 1.00, sentence -> 3}, []} |
    +-----------------------------------------------------------------------------------------------------+

    See also

    ContextualParserModel for the trained model

  5. class ContextualParserModel extends AnnotatorModel[ContextualParserModel] with HasSimpleAnnotate[ContextualParserModel] with HandleExceptionParams with HasSafeAnnotate[ContextualParserModel] with CheckLicense

    Extracts entities from a document based on user-defined rules.

    Rule matching is based on a RegexMatcher defined in a JSON file, in which the regex you want to match is defined along with the information that will be output in the metadata field. To instantiate a model, see ContextualParserApproach and its accompanying example.
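
    Once fitted via ContextualParserApproach, the stage can be saved and restored with the standard Spark ML persistence API. A minimal sketch, reusing the pipeline and data from the ContextualParserApproach example above (the save path is hypothetical):

    import org.apache.spark.ml.PipelineModel

    // Fit once and persist the whole pipeline, including the ContextualParserModel stage.
    val fitted = pipeline.fit(data)
    fitted.write.overwrite().save("/path/to/saved_parser_pipeline")

    // Later: reload and reuse without refitting.
    val restored = PipelineModel.load("/path/to/saved_parser_pipeline")
    val reloadedResult = restored.transform(data)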

    See also

    ContextualParserApproach to create your own model

  6. case class Dictionary(dictionary: Map[String, String]) extends Product with Serializable
    Attributes
    protected
  7. case class EntityDefinition(entity: String, ruleScope: String, regex: Option[String], contextLength: Option[Double], prefix: Option[List[String]], regexPrefix: Option[String], suffix: Option[List[String]], regexSuffix: Option[String], contextException: Option[List[String]], exceptionDistance: Option[Double], regexContextException: Option[String], matchScope: Option[String], completeMatchRegex: Option[String]) extends Product with Serializable
    Attributes
    protected
  8. class MatchExceptions extends ContextRules[Boolean] with Serializable
  9. class MatchPrefixSuffix extends ContextRules[(Boolean, Map[String, Double])] with Serializable
  10. class MatchRegex extends ContextRules[(Boolean, Map[String, Double])] with Serializable
  11. class MatchRegexPerSentence extends ContextRules[List[(Boolean, Map[String, Double])]] with Serializable
  12. case class MatchedToken(token: String, begin: Int, end: Int, valueMatch: String, regexMatch: String, sentenceIndex: Int, confidenceValue: Double, normalizedValue: String, tokenIndex: Int) extends Product with Serializable
  13. trait ReadablePretrainedContextualParser extends ParamsAndFeaturesReadable[ContextualParserModel] with HasPretrained[ContextualParserModel]

Value Members

  1. object ContextualEntityFilterer extends ParamsAndFeaturesReadable[ContextualEntityFilterer] with Serializable
  2. object ContextualFilteringRules extends Serializable

    Companion object for ContextualFilteringRules.

  3. object ContextualParserApproach extends DefaultParamsReadable[ContextualParserApproach] with Serializable
  4. object ContextualParserModel extends ReadablePretrainedContextualParser with Serializable
