package disambiguation
Type Members
- class NerDisambiguator extends AnnotatorApproach[NerDisambiguatorModel] with DisambiguatorModelParams
Links words of interest, such as names of persons, locations and companies, from an input text document to a corresponding unique entity in a target Knowledge Base (KB). Words of interest are called Named Entities (NEs), mentions, or surface forms. The annotator requires CHUNK and SENTENCE_EMBEDDINGS type inputs, produced for example by NerConverter and SentenceEmbeddings respectively.
Example
Extracting Person identities
First define the pipeline stages that extract entities and embeddings. The NerConverter whitelist keeps only entities of type PER.
val data = Seq("The show also had a contestant named Donald Trump who later defeated Christina Aguilera ...")
  .toDF("text")
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")
val sentenceDetector = new SentenceDetector()
  .setInputCols("document")
  .setOutputCol("sentence")
val tokenizer = new Tokenizer()
  .setInputCols("sentence")
  .setOutputCol("token")
val word_embeddings = WordEmbeddingsModel.pretrained()
  .setInputCols("sentence", "token")
  .setOutputCol("embeddings")
val sentence_embeddings = new SentenceEmbeddings()
  .setInputCols("sentence", "embeddings")
  .setOutputCol("sentence_embeddings")
val ner_model = NerDLModel.pretrained()
  .setInputCols("sentence", "token", "embeddings")
  .setOutputCol("ner")
val ner_converter = new NerConverter()
  .setInputCols("sentence", "token", "ner")
  .setOutputCol("ner_chunk")
  .setWhiteList("PER")
Then the extracted entities can be disambiguated.
val disambiguator = new NerDisambiguator()
  .setS3KnowledgeBaseName("i-per")
  .setInputCols("ner_chunk", "sentence_embeddings")
  .setOutputCol("disambiguation")
  .setNumFirstChars(5)
val nlpPipeline = new Pipeline().setStages(Array(
  documentAssembler,
  sentenceDetector,
  tokenizer,
  word_embeddings,
  sentence_embeddings,
  ner_model,
  ner_converter,
  disambiguator))
val model = nlpPipeline.fit(data)
val result = model.transform(data)
Show results
result.selectExpr("explode(disambiguation)")
  .selectExpr("col.metadata.chunk as chunk", "col.result as result").show(5, false)
+------------------+------------------------------------------------------------------------------------------------------------------------+
|chunk             |result                                                                                                                  |
+------------------+------------------------------------------------------------------------------------------------------------------------+
|Donald Trump      |http://en.wikipedia.org/?curid=4848272, http://en.wikipedia.org/?curid=31698421, http://en.wikipedia.org/?curid=55907961|
|Christina Aguilera|http://en.wikipedia.org/?curid=144171, http://en.wikipedia.org/?curid=6636454                                           |
+------------------+------------------------------------------------------------------------------------------------------------------------+
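Each result value in the example output above holds several candidate KB entries in a single string. The following is a minimal sketch of how such a string could be split into individual candidates in plain Scala. The object DisambiguationResult and its methods are hypothetical helpers, not part of the library, and the sketch assumes candidates are separated by a comma and that each URL carries a numeric curid parameter, as in the output shown.

```scala
// Hypothetical helper, not part of the library: post-processes one `result`
// string taken from the disambiguation output.
object DisambiguationResult {

  // Splits a result string into candidate URLs.
  // Assumption: candidates are comma-separated, as in the example output.
  def candidates(result: String): Seq[String] =
    result.split(",").map(_.trim).filter(_.nonEmpty).toSeq

  // Extracts the numeric page ID that follows "curid=" in a URL, if present.
  def curid(url: String): Option[Long] = {
    val marker = "curid="
    val idx = url.indexOf(marker)
    if (idx < 0) None
    else {
      val digits = url.substring(idx + marker.length).takeWhile(_.isDigit)
      if (digits.isEmpty) None else Some(digits.toLong)
    }
  }
}
```

For instance, DisambiguationResult.candidates applied to the Christina Aguilera row yields two URLs, and mapping DisambiguationResult.curid over them gives the page IDs 144171 and 6636454.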
- class NerDisambiguatorModel extends AnnotatorModel[NerDisambiguatorModel] with AnnotationLogic with PoolingLogicBase with KvKnowledgeExtractor with DisambiguatorModelParams with SwitchableEmbeddingsExtractor with RocksDbReader with HasSimpleAnnotate[NerDisambiguatorModel] with CheckLicense
Instantiated / pretrained model of the NerDisambiguator. Links words of interest, such as names of persons, locations and companies, from an input text document to a corresponding unique entity in a target Knowledge Base (KB). Words of interest are called Named Entities (NEs), mentions, or surface forms.
- See also
NerDisambiguator for how to use the model
- trait ReadDisambiguator extends ParamsAndFeaturesReadable[NerDisambiguatorModel]
- trait ReadPretrainedNerDisambiguator extends ParamsAndFeaturesReadable[NerDisambiguatorModel] with HasPretrained[NerDisambiguatorModel]
- class SimpleDisambiguationPipeline extends Serializable
Value Members
- object NerDisambiguator extends DefaultParamsReadable[NerDisambiguator] with Serializable
- object NerDisambiguatorModel extends ParamsAndFeaturesReadable[NerDisambiguatorModel] with ReadDisambiguator with ReadPretrainedNerDisambiguator with Serializable