package disambiguation

Type Members

  1. class NerDisambiguator extends AnnotatorApproach[NerDisambiguatorModel] with KvKnowledgeExtractor with DisambiguatorModelParams

    Links words of interest, such as names of persons, locations and companies, from an input text document to a corresponding unique entity in a target Knowledge Base (KB). Words of interest are called Named Entities (NEs), mentions, or surface forms. The annotator requires CHUNK input (e.g. from NerConverter) and SENTENCE_EMBEDDINGS input (e.g. from SentenceEmbeddings).

    Example

    Extracting Person identities

    First, define the pipeline stages that extract entities and embeddings. The NerConverter whitelist keeps only entities of type PER.

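    // Assumes an active SparkSession named `spark`; needed for .toDF
    import spark.implicits._
    import org.apache.spark.ml.Pipeline
    import com.johnsnowlabs.nlp.DocumentAssembler
    import com.johnsnowlabs.nlp.annotators.Tokenizer
    import com.johnsnowlabs.nlp.annotators.sbd.pragmatic.SentenceDetector
    import com.johnsnowlabs.nlp.embeddings.{SentenceEmbeddings, WordEmbeddingsModel}
    import com.johnsnowlabs.nlp.annotators.ner.NerConverter
    import com.johnsnowlabs.nlp.annotators.ner.dl.NerDLModel
    import com.johnsnowlabs.nlp.annotators.disambiguation.NerDisambiguator

    val data = Seq("The show also had a contestant named Donald Trump who later defeated Christina Aguilera ...")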
      .toDF("text")
    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    val sentenceDetector = new SentenceDetector()
      .setInputCols("document")
      .setOutputCol("sentence")
    val tokenizer = new Tokenizer()
      .setInputCols("sentence")
      .setOutputCol("token")
    val word_embeddings = WordEmbeddingsModel.pretrained()
      .setInputCols("sentence", "token")
      .setOutputCol("embeddings")
    val sentence_embeddings = new SentenceEmbeddings()
      .setInputCols("sentence","embeddings")
      .setOutputCol("sentence_embeddings")
    val ner_model = NerDLModel.pretrained()
      .setInputCols("sentence", "token", "embeddings")
      .setOutputCol("ner")
    val ner_converter = new NerConverter()
      .setInputCols("sentence", "token", "ner")
      .setOutputCol("ner_chunk")
      .setWhiteList("PER")

    Then the extracted entities can be disambiguated.

    val disambiguator = new NerDisambiguator()
      .setS3KnowledgeBaseName("i-per")
      .setInputCols("ner_chunk", "sentence_embeddings")
      .setOutputCol("disambiguation")
      .setNumFirstChars(5)
    
    val nlpPipeline = new Pipeline().setStages(Array(
      documentAssembler,
      sentenceDetector,
      tokenizer,
      word_embeddings,
      sentence_embeddings,
      ner_model,
      ner_converter,
      disambiguator))
    
    val model = nlpPipeline.fit(data)
    val result = model.transform(data)

    Show results

    result.selectExpr("explode(disambiguation)")
      .selectExpr("col.metadata.chunk as chunk", "col.result as result").show(5, false)
    +------------------+------------------------------------------------------------------------------------------------------------------------+
    |chunk             |result                                                                                                                  |
    +------------------+------------------------------------------------------------------------------------------------------------------------+
    |Donald Trump      |http://en.wikipedia.org/?curid=4848272, http://en.wikipedia.org/?curid=31698421, http://en.wikipedia.org/?curid=55907961|
    |Christina Aguilera|http://en.wikipedia.org/?curid=144171, http://en.wikipedia.org/?curid=6636454                                           |
    +------------------+------------------------------------------------------------------------------------------------------------------------+
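
    Each result value above lists the candidate KB links for one chunk as a single comma-separated string. The following is a minimal sketch of splitting such a string into individual URLs and page ids. It assumes the ", " separator and the curid= URL form shown in the table; DisambiguationResult, candidateUrls and pageId are hypothetical helper names, not part of the library.

    ```scala
    // Hypothetical helpers for post-processing one `result` string.
    object DisambiguationResult {
      // Matches the numeric page id in URLs of the form ...?curid=12345
      private val CurId = """curid=(\d+)""".r

      // Splits the comma-separated candidates and drops empty pieces.
      def candidateUrls(result: String): Seq[String] =
        result.split(",").map(_.trim).filter(_.nonEmpty).toSeq

      // Returns the page id of a candidate URL, or None if it has no curid.
      def pageId(url: String): Option[Long] =
        CurId.findFirstMatchIn(url).map(_.group(1).toLong)
    }

    // Value copied from the "Christina Aguilera" row above.
    val resultString =
      "http://en.wikipedia.org/?curid=144171, http://en.wikipedia.org/?curid=6636454"
    val urls = DisambiguationResult.candidateUrls(resultString)
    val ids = urls.flatMap(url => DisambiguationResult.pageId(url))
    ```

    The helpers work on plain strings, so they can be applied to collected rows or wrapped in a Spark UDF if the split should happen inside the DataFrame.
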
  2. class NerDisambiguatorModel extends AnnotatorModel[NerDisambiguatorModel] with AnnotationLogic with PoolingLogicBase with DisambiguatorModelParams with SwitchableEmbeddingsExtractor with HasSimpleAnnotate[NerDisambiguatorModel]

    Instantiated / pretrained model of the NerDisambiguator. Links words of interest, such as names of persons, locations and companies, from an input text document to a corresponding unique entity in a target Knowledge Base (KB). Words of interest are called Named Entities (NEs), mentions, or surface forms.

    See also

    NerDisambiguator for how to use the model

  3. class SimpleDisambiguationPipeline extends Serializable

Value Members

  1. object NerDisambiguator extends DefaultParamsReadable[NerDisambiguator] with Serializable
  2. object NerDisambiguatorModel extends ParamsAndFeaturesReadable[NerDisambiguatorModel] with Serializable
