Explain Document DL

Description

explain_document_dl is a pretrained pipeline that runs basic text-processing steps on raw text: sentence detection, tokenization, spell checking, lemmatization, stemming, and part-of-speech tagging.


How to use


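import sparknlp
from sparknlp.pretrained import PretrainedPipeline

# Start a Spark session with Spark NLP loaded
spark = sparknlp.start()

# Download (or load from cache) the pretrained English pipeline
pipeline = PretrainedPipeline('explain_document_dl', lang = 'en')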

annotations =  pipeline.fullAnnotate("""French author who helped pioner the science-fiction genre. Verne wrate about space, air, and underwater travel before navigable aircrast and practical submarines were invented, and before any means of space travel had been devised.""")[0]

annotations.keys()


import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("explain_document_dl", lang = "en")

// For a single String, fullAnnotate returns one Map of output column name to annotations
val result = pipeline.fullAnnotate("French author who helped pioner the science-fiction genre. Verne wrate about space, air, and underwater travel before navigable aircrast and practical submarines were invented, and before any means of space travel had been devised.")

Results

+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+
|                text|            document|            sentence|               token|               spell|              lemmas|               stems|                 pos|
+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+
|French author who...|[[document, 0, 23...|[[document, 0, 57...|[[token, 0, 5, Fr...|[[token, 0, 5, Fr...|[[token, 0, 5, Fr...|[[token, 0, 5, fr...|[[pos, 0, 5, JJ, ...|
+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+
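
Each cell in the table above is a list of annotations printed as [annotator type, begin, end, result, ...]. The following is a minimal sketch of how the result strings could be pulled out of such a structure. FakeAnnotation is a hypothetical stand-in that only mirrors the field names of a Spark NLP Annotation (annotator_type, begin, end, result) so that the snippet runs without Spark; results_by_column is likewise an illustrative helper, not part of the library.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FakeAnnotation:
    """Hypothetical stand-in mirroring the fields of a Spark NLP Annotation."""
    annotator_type: str
    begin: int
    end: int
    result: str


def results_by_column(annotations: Dict[str, List[FakeAnnotation]]) -> Dict[str, List[str]]:
    """Keep only the `result` string of every annotation, column by column."""
    return {col: [a.result for a in anns] for col, anns in annotations.items()}


# The first token and the first POS tag are the ones visible in the table above;
# the second token's offsets follow from the input sentence.
sample = {
    "token": [
        FakeAnnotation("token", 0, 5, "French"),
        FakeAnnotation("token", 7, 12, "author"),
    ],
    "pos": [FakeAnnotation("pos", 0, 5, "JJ")],
}

flat = results_by_column(sample)
print(flat["token"])
print(flat["pos"])
```

Because the objects returned by fullAnnotate expose a result attribute as well, the same comprehension should apply to the annotations dictionary from the Python example, with the column names listed by annotations.keys().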

Model Information

Model Name: explain_document_dl
Type: pipeline
Compatibility: Spark NLP 2.5.5+
License: Open Source
Edition: Community
Language: [en]

Included Models

The explain_document_dl pipeline has one Transformer and six annotators:

  • DocumentAssembler - A Transformer that creates a column that contains documents.
  • Sentence Segmenter - An annotator that produces the sentences of the document.
  • Tokenizer - An annotator that produces the tokens of the sentences.
  • SpellChecker - An annotator that produces the spelling-corrected tokens.
  • Stemmer - An annotator that produces the stems of the tokens.
  • Lemmatizer - An annotator that produces the lemmas of the tokens.
  • POS Tagger - An annotator that produces the parts of speech of the associated tokens.
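
The list above implies an ordering: each stage reads a column that an earlier stage produced. A rough sketch of that dependency chain, in plain Python, is given below. The output column names come from the Results table; the input wiring is an assumption read off the stage descriptions (for example, that the Stemmer and Lemmatizer read the raw tokens), and the real pipeline may be wired differently. STAGES and check_order are illustrative names, not Spark NLP APIs.

```python
from typing import List, Sequence, Tuple

# (stage name, input columns, output column).
# Input columns are assumed from the descriptions in the list above.
STAGES: List[Tuple[str, Tuple[str, ...], str]] = [
    ("DocumentAssembler", ("text",), "document"),
    ("Sentence Segmenter", ("document",), "sentence"),
    ("Tokenizer", ("sentence",), "token"),
    ("SpellChecker", ("token",), "spell"),
    ("Stemmer", ("token",), "stems"),
    ("Lemmatizer", ("token",), "lemmas"),
    ("POS Tagger", ("token",), "pos"),
]


def check_order(stages: Sequence[Tuple[str, Tuple[str, ...], str]],
                initial: Sequence[str] = ("text",)) -> List[str]:
    """Return columns in the order they become available.

    Raises ValueError if a stage reads a column no earlier stage produced.
    """
    available = set(initial)
    order = list(initial)
    for name, inputs, output in stages:
        missing = [col for col in inputs if col not in available]
        if missing:
            raise ValueError(f"{name} needs {missing} before it can run")
        available.add(output)
        order.append(output)
    return order


columns = check_order(STAGES)
print(columns)
```

Reordering the stages so that, say, the Tokenizer comes before the Sentence Segmenter makes check_order raise, which is the constraint the pretrained pipeline already satisfies for you.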