package context

Type Members

  1. class ContextualParserApproach extends AnnotatorApproach[ContextualParserModel] with HasRecursiveFit[ContextualParserModel] with CheckLicense

    Creates a model that extracts entities from a document based on user-defined rules. Rule matching is based on a RegexMatcher defined in a JSON file, whose path is set with setJsonPath(). The JSON file defines the regex to match along with the information to output in the metadata field. Additionally, a dictionary can be provided with setDictionary to map extracted entities to a unified representation. The first column of the dictionary file should hold the unified representation, and the following columns the possible matches.
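
The dictionary layout described above (representation first, possible matches after it) can be illustrated with a small standalone sketch. This is plain Scala, not the library's loader: the object name `DictionarySketch`, the helper `buildLookup`, the tab delimiter, and the sample rows are all hypothetical and serve only to show how each possible match maps back to its first-column representation.

```scala
object DictionarySketch {
  // Assumption: columns are tab-separated. The delimiter the library expects
  // may differ. Note that String.split treats the delimiter as a regex.
  def buildLookup(lines: Seq[String], delimiter: String = "\t"): Map[String, String] =
    lines.flatMap { line =>
      line.split(delimiter).map(_.trim).toList match {
        // First column is the unified representation, the rest are possible matches.
        case key :: matches => matches.filter(_.nonEmpty).map(m => m -> key)
        case Nil            => Nil
      }
    }.toMap

  def main(args: Array[String]): Unit = {
    // Hypothetical dictionary rows, made up for illustration.
    val lines = Seq("female\twoman\tgirl\tlady", "male\tman\tboy")
    val lookup = buildLookup(lines)
    println(lookup.get("girl"))    // Some(female)
    println(lookup.get("unknown")) // None
  }
}
```

Inverting the file into a match-to-representation map keeps each lookup to a single hash access, which is one plausible way to normalize an extracted chunk.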

    Example

    An example JSON file regex_token.json can look like this:

    {
      "entity": "Stage",
      "ruleScope": "sentence",
      "regex": "[cpyrau]?[T][0-9X?][a-z^cpyrau]*",
      "matchScope": "token"
    }

    This rule extracts the stage code at the sentence level. An example pipeline could then be defined like this:

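    // Imports for the open-source Spark NLP annotators and Spark ML used below
    import com.johnsnowlabs.nlp.DocumentAssembler
    import com.johnsnowlabs.nlp.annotators.Tokenizer
    import com.johnsnowlabs.nlp.annotators.sbd.pragmatic.SentenceDetector
    import org.apache.spark.ml.Pipeline

    val documentAssembler = new DocumentAssembler()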
      .setInputCol("text")
      .setOutputCol("document")
    
    val sentenceDetector = new SentenceDetector()
      .setInputCols("document")
      .setOutputCol("sentence")
    
    val tokenizer = new Tokenizer()
      .setInputCols("sentence")
      .setOutputCol("token")

    Define the parser (the JSON file must be provided):

    val data = Seq("A patient has liver metastases pT1bN0M0 and the T5 primary site may be colon or... ").toDF("text")
    val contextualParser = new ContextualParserApproach()
      .setInputCols(Array("sentence", "token"))
      .setOutputCol("entity")
      .setJsonPath("/path/to/regex_token.json")
      .setCaseSensitive(true)
      .setContextMatch(false)
    val pipeline = new Pipeline().setStages(Array(
        documentAssembler,
        sentenceDetector,
        tokenizer,
        contextualParser
      ))
    
    val result = pipeline.fit(data).transform(data)

    Show Results

    result.selectExpr("explode(entity)").show(5, truncate=false)
    +-------------------------------------------------------------------------------------------------------------------------+
    |col                                                                                                                      |
    +-------------------------------------------------------------------------------------------------------------------------+
    |{chunk, 32, 39, pT1bN0M0, {field -> Stage, normalized -> , confidenceValue -> 0.13, hits -> regex, sentence -> 0}, []}   |
    |{chunk, 49, 50, T5, {field -> Stage, normalized -> , confidenceValue -> 0.13, hits -> regex, sentence -> 0}, []}         |
    |{chunk, 148, 156, cT4bcN2M1, {field -> Stage, normalized -> , confidenceValue -> 0.13, hits -> regex, sentence -> 1}, []}|
    |{chunk, 189, 194, T?N3M1, {field -> Stage, normalized -> , confidenceValue -> 0.13, hits -> regex, sentence -> 2}, []}   |
    |{chunk, 316, 323, pT1bN0M0, {field -> Stage, normalized -> , confidenceValue -> 0.13, hits -> regex, sentence -> 3}, []} |
    +-------------------------------------------------------------------------------------------------------------------------+
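
To see why tokens such as pT1bN0M0 and T5 appear in the results, the regex from regex_token.json can be tried on its own. The sketch below is plain Scala and not the library's matcher. It assumes that a matchScope of "token" returns the whole token whenever the regex matches anywhere inside it, and it uses whitespace splitting as a stand-in for the pipeline's Tokenizer. The object name `StageRegexSketch` and the helper `matchTokens` are hypothetical, and the sentence is a shortened version of the example input.

```scala
object StageRegexSketch {
  // Regex copied from the regex_token.json example.
  private val stageRegex = "[cpyrau]?[T][0-9X?][a-z^cpyrau]*".r

  // Assumption: whitespace tokenization approximates the Tokenizer stage, and a
  // token is kept in full when the regex matches any part of it.
  def matchTokens(sentence: String): List[String] =
    sentence.split("\\s+").toList.filter(tok => stageRegex.findFirstIn(tok).isDefined)

  def main(args: Array[String]): Unit = {
    val sentence = "A patient has liver metastases pT1bN0M0 and the T5 primary site"
    println(matchTokens(sentence)) // List(pT1bN0M0, T5)
  }
}
```

In pT1bN0M0 the regex itself only covers the leading pT1b, so the full token in the output is consistent with matching at token granularity.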

    See also

    ContextualParserModel for the trained model

  2. class ContextualParserModel extends AnnotatorModel[ContextualParserModel] with HasSimpleAnnotate[ContextualParserModel] with CheckLicense

    Extracts entities from a document based on user-defined rules. Rule matching is based on a RegexMatcher defined in a JSON file. This file defines the regex to match along with the information to output in the metadata field. To instantiate a model, see ContextualParserApproach and its accompanying example.

    See also

    ContextualParserApproach to create your own model

  3. case class Dictionary(key: String, values: List[String], regex: String) extends Product with Serializable

    Attributes: protected

  4. case class EntityDefinition(entity: String, ruleScope: String, regex: Option[String], contextLength: Option[Double], prefix: Option[List[String]], regexPrefix: Option[String], suffix: Option[List[String]], regexSuffix: Option[String], context: Option[List[String]], contextException: Option[List[String]], exceptionDistance: Option[Double], regexContextException: Option[String], matchScope: Option[String], completeMatchRegex: Option[String]) extends Product with Serializable

    Attributes: protected
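
The EntityDefinition signature shows that only entity and ruleScope are plain strings, while every other field is an Option. Because the class is protected, the sketch below uses a hypothetical stand-in, `RuleSketch`, that mirrors a subset of those fields to show how the Stage rule from the JSON example lines up with them. The `None` defaults are an assumption made for brevity and are not taken from the signature.

```scala
// Hypothetical stand-in mirroring a subset of EntityDefinition's fields.
// The None defaults are an illustration-only assumption.
final case class RuleSketch(
  entity: String,
  ruleScope: String,
  regex: Option[String] = None,
  matchScope: Option[String] = None,
  prefix: Option[List[String]] = None,
  suffix: Option[List[String]] = None
)

object RuleSketchDemo {
  // The Stage rule from the regex_token.json example, field by field.
  val stageRule: RuleSketch = RuleSketch(
    entity = "Stage",
    ruleScope = "sentence",
    regex = Some("[cpyrau]?[T][0-9X?][a-z^cpyrau]*"),
    matchScope = Some("token")
  )

  def main(args: Array[String]): Unit = {
    println(stageRule.entity)         // Stage
    println(stageRule.prefix.isEmpty) // true: no prefix list was given
  }
}
```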

Value Members

  1. object ContextualParserApproach extends DefaultParamsReadable[ContextualParserApproach] with Serializable
  2. object ContextualParserModel extends ParamsAndFeaturesReadable[ContextualParserModel] with Serializable