
package disambiguation

Type Members

  1. class NerDisambiguator extends AnnotatorApproach[NerDisambiguatorModel] with DisambiguatorModelParams

    Links words of interest, such as names of persons, locations and companies, from an input text document to a corresponding unique entity in a target Knowledge Base (KB). Words of interest are called Named Entities (NEs), mentions, or surface forms. The annotator expects inputs of type CHUNK and SENTENCE_EMBEDDINGS, produced for example by NerConverter and SentenceEmbeddings respectively.
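
    To make the idea of linking a mention to a unique KB entity concrete, here is a toy sketch of generic entity linking. It is not NerDisambiguator's actual algorithm, which this page does not describe. Candidates are looked up by the first few characters of the mention (assuming, as the name of the numFirstChars parameter used in the example suggests, that a prefix lookup is involved), then ranked by cosine similarity between a context embedding and each candidate's embedding. ToyEntityLinker, KbEntry, and all ids and vectors are hypothetical.

```scala
// Toy illustration of generic entity linking: prefix-based candidate lookup
// followed by cosine-similarity ranking. Hypothetical; NOT NerDisambiguator's algorithm.
object ToyEntityLinker {
  // A hypothetical knowledge-base entry: an entity id, its name, and an embedding.
  final case class KbEntry(id: String, name: String, embedding: Array[Double])

  // Cosine similarity between two vectors of equal length (0.0 if either is all zeros).
  def cosine(a: Array[Double], b: Array[Double]): Double = {
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    val na = math.sqrt(a.map(x => x * x).sum)
    val nb = math.sqrt(b.map(x => x * x).sum)
    if (na == 0.0 || nb == 0.0) 0.0 else dot / (na * nb)
  }

  // Index entries by the lowercased first `numFirstChars` characters of their name.
  def buildIndex(kb: Seq[KbEntry], numFirstChars: Int): Map[String, Seq[KbEntry]] =
    kb.groupBy(_.name.toLowerCase.take(numFirstChars))

  // Return candidate ids for a mention, most similar to the context embedding first.
  def link(
      mention: String,
      context: Array[Double],
      index: Map[String, Seq[KbEntry]],
      numFirstChars: Int
  ): Seq[String] =
    index
      .getOrElse(mention.toLowerCase.take(numFirstChars), Seq.empty)
      .sortBy(e => -cosine(context, e.embedding))
      .map(_.id)
}
```

    With entries "Donald Trump" and "Donald Trump Jr." sharing the prefix "donal", both are returned as candidates for the mention "Donald Trump", ordered by how closely each embedding matches the sentence context.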

    Example

    Extracting Person identities

    First, define the pipeline stages that extract entities and embeddings. The NerConverter keeps only entities of type PER (persons).

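    // Assumes an active SparkSession named `spark` (needed for .toDF)
    import spark.implicits._
    import com.johnsnowlabs.nlp.base.DocumentAssembler
    import com.johnsnowlabs.nlp.annotator._
    // Package path assumed from this page's `disambiguation` package
    import com.johnsnowlabs.nlp.annotators.disambiguation.NerDisambiguator
    import org.apache.spark.ml.Pipeline

    val data = Seq("The show also had a contestant named Donald Trump who later defeated Christina Aguilera ...")
      .toDF("text")
    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    val sentenceDetector = new SentenceDetector()
      .setInputCols("document")
      .setOutputCol("sentence")
    val tokenizer = new Tokenizer()
      .setInputCols("sentence")
      .setOutputCol("token")
    val word_embeddings = WordEmbeddingsModel.pretrained()
      .setInputCols("sentence", "token")
      .setOutputCol("embeddings")
    // Pools word embeddings into one vector per sentence (SENTENCE_EMBEDDINGS input)
    val sentence_embeddings = new SentenceEmbeddings()
      .setInputCols("sentence", "embeddings")
      .setOutputCol("sentence_embeddings")
    val ner_model = NerDLModel.pretrained()
      .setInputCols("sentence", "token", "embeddings")
      .setOutputCol("ner")
    // Converts NER tags into CHUNK annotations, keeping only PER entities
    val ner_converter = new NerConverter()
      .setInputCols("sentence", "token", "ner")
      .setOutputCol("ner_chunk")
      .setWhiteList("PER")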

    Then the extracted entities can be disambiguated.

    val disambiguator = new NerDisambiguator()
      .setS3KnowledgeBaseName("i-per")
      .setInputCols("ner_chunk", "sentence_embeddings")
      .setOutputCol("disambiguation")
      .setNumFirstChars(5)
    
    val nlpPipeline = new Pipeline().setStages(Array(
      documentAssembler,
      sentenceDetector,
      tokenizer,
      word_embeddings,
      sentence_embeddings,
      ner_model,
      ner_converter,
      disambiguator))
    
    val model = nlpPipeline.fit(data)
    val result = model.transform(data)

    Show results

    result.selectExpr("explode(disambiguation)")
      .selectExpr("col.metadata.chunk as chunk", "col.result as result").show(5, false)
    +------------------+------------------------------------------------------------------------------------------------------------------------+
    |chunk             |result                                                                                                                  |
    +------------------+------------------------------------------------------------------------------------------------------------------------+
    |Donald Trump      |http://en.wikipedia.org/?curid=4848272, http://en.wikipedia.org/?curid=31698421, http://en.wikipedia.org/?curid=55907961|
    |Christina Aguilera|http://en.wikipedia.org/?curid=144171, http://en.wikipedia.org/?curid=6636454                                           |
    +------------------+------------------------------------------------------------------------------------------------------------------------+
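
    Each result value above is a comma-separated list of candidate Wikipedia URLs. If you need the page ids for further processing, a small helper can split and parse them. This is a sketch that assumes the result string always has exactly the format shown in the table; DisambiguationResultParser and its methods are hypothetical, and this page does not state whether the candidates are ordered by confidence.

```scala
// Hypothetical helper for post-processing a NerDisambiguator result string,
// assuming the comma-separated Wikipedia URL format shown in the table above.
object DisambiguationResultParser {
  // Matches the numeric page id in URLs such as http://en.wikipedia.org/?curid=4848272
  private val CurId = """[?&]curid=(\d+)""".r

  // Split one result value into its trimmed, non-empty candidate URLs.
  def candidateUrls(result: String): List[String] =
    result.split(",").map(_.trim).filter(_.nonEmpty).toList

  // Extract Wikipedia page ids, skipping any URL without a curid parameter.
  def curIds(result: String): List[Long] =
    candidateUrls(result).flatMap(url => CurId.findFirstMatchIn(url).map(_.group(1).toLong))
}
```

    For the Christina Aguilera row, curIds returns List(144171, 6636454), preserving the order in which the candidates appear.
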
  2. class NerDisambiguatorModel extends AnnotatorModel[NerDisambiguatorModel] with AnnotationLogic with PoolingLogicBase with KvKnowledgeExtractor with DisambiguatorModelParams with SwitchableEmbeddingsExtractor with RocksDbReader with HasSimpleAnnotate[NerDisambiguatorModel] with CheckLicense

    Instantiated or pretrained model of the NerDisambiguator. It links words of interest, such as names of persons, locations and companies, from an input text document to a corresponding unique entity in a target Knowledge Base (KB). Words of interest are called Named Entities (NEs), mentions, or surface forms.

    See also

    NerDisambiguator for how to use the model

  3. trait ReadDisambiguator extends ParamsAndFeaturesReadable[NerDisambiguatorModel]
  4. trait ReadPretrainedNerDisambiguator extends ParamsAndFeaturesReadable[NerDisambiguatorModel] with HasPretrained[NerDisambiguatorModel]
  5. class SimpleDisambiguationPipeline extends Serializable

Value Members

  1. object NerDisambiguator extends DefaultParamsReadable[NerDisambiguator] with Serializable
  2. object NerDisambiguatorModel extends ParamsAndFeaturesReadable[NerDisambiguatorModel] with ReadDisambiguator with ReadPretrainedNerDisambiguator with Serializable