Description
The explain_document_dl is a pretrained pipeline that processes text with a chain of basic NLP steps: document assembly, sentence detection, tokenization, spell checking, stemming, lemmatization, and part-of-speech tagging.
How to use
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline('explain_document_dl', lang='en')

# The misspellings ("pioner", "wrate", "aircrast") are deliberate,
# giving the spell-checking stage something to correct.
annotations = pipeline.fullAnnotate("""French author who helped pioner the science-fiction genre. Verne wrate about space, air, and underwater travel before navigable aircrast and practical submarines were invented, and before any means of space travel had been devised.""")[0]
annotations.keys()
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("explain_document_dl", lang = "en")
val result = pipeline.fullAnnotate("French author who helped pioner the science-fiction genre. Verne wrate about space, air, and underwater travel before navigable aircrast and practical submarines were invented, and before any means of space travel had been devised.")(0)
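fullAnnotate returns one dictionary per input text, keyed by output column (document, sentence, token, spell, lemmas, stems, pos), where each value is a list of Annotation objects carrying an annotator type, character offsets, and a result string. The sketch below uses a mocked slice of that structure (a hypothetical Annotation class and two hand-built rows, not the real Spark NLP objects) to show one common post-processing step: pairing raw tokens with their spell-checked counterparts to see what the spell checker changed.

```python
# Minimal stand-in for Spark NLP's Annotation; mocked so this runs
# without a Spark session or a model download.
class Annotation:
    def __init__(self, annotator_type, begin, end, result):
        self.annotatorType = annotator_type
        self.begin = begin
        self.end = end
        self.result = result

# Hypothetical slice of the pipeline output for "... helped pioner ..."
annotations = {
    "token": [Annotation("token", 18, 23, "helped"),
              Annotation("token", 25, 30, "pioner")],
    "spell": [Annotation("token", 18, 23, "helped"),
              Annotation("token", 25, 30, "pioneer")],
}

# Pair each raw token with its spell-checked counterpart and keep
# only the ones the spell checker actually changed.
corrections = [(tok.result, fix.result)
               for tok, fix in zip(annotations["token"], annotations["spell"])
               if tok.result != fix.result]
print(corrections)  # [('pioner', 'pioneer')]
```

The same zip-and-compare pattern works on the real fullAnnotate output, since the token and spell columns are aligned one-to-one.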
Results
+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+
| text| document| sentence| token| spell| lemmas| stems| pos|
+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+
|French author who...|[[document, 0, 23...|[[document, 0, 57...|[[token, 0, 5, Fr...|[[token, 0, 5, Fr...|[[token, 0, 5, Fr...|[[token, 0, 5, fr...|[[pos, 0, 5, JJ, ...|
+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+
Model Information
| Model Name: | explain_document_dl |
|---|---|
| Type: | pipeline |
| Compatibility: | Spark NLP 2.5.5+ |
| License: | Open Source |
| Edition: | Community |
| Language: | [en] |
Included Models
The explain_document_dl pipeline has one Transformer and six annotators:
- DocumentAssembler - A Transformer that creates a column containing the document.
- Sentence Segmenter - An annotator that produces the sentences of the document.
- Tokenizer - An annotator that produces the tokens of the sentences.
- SpellChecker - An annotator that produces the spelling-corrected tokens.
- Stemmer - An annotator that produces the stems of the tokens.
- Lemmatizer - An annotator that produces the lemmas of the tokens.
- POS Tagger - An annotator that produces the parts of speech of the associated tokens.
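The seven stages above chain together by column: each stage consumes a column produced upstream and adds one of its own. The sketch below encodes that wiring in plain Python and checks it is well ordered. The column names come from the Results table above; the exact input/output wiring inside the pretrained pipeline is an assumption consistent with that table.

```python
# (stage name, input columns, output column) for each stage of the
# pipeline. Column names follow the Results table; the wiring of the
# three token-level annotators to the spell-checked column is assumed.
STAGES = [
    ("DocumentAssembler", ["text"],     "document"),
    ("SentenceDetector",  ["document"], "sentence"),
    ("Tokenizer",         ["sentence"], "token"),
    ("SpellChecker",      ["token"],    "spell"),
    ("Stemmer",           ["spell"],    "stems"),
    ("Lemmatizer",        ["spell"],    "lemmas"),
    ("POSTagger",         ["spell"],    "pos"),
]

def validate(stages):
    """Check that every stage's inputs are produced by an earlier stage
    (or are the raw input column), i.e. the chain is well ordered."""
    available = {"text"}
    for name, inputs, output in stages:
        missing = [c for c in inputs if c not in available]
        if missing:
            raise ValueError(f"{name} is missing inputs: {missing}")
        available.add(output)
    return sorted(available - {"text"})

print(validate(STAGES))
# ['document', 'lemmas', 'pos', 'sentence', 'spell', 'stems', 'token']
```

The validated output columns match the columns of the Results table, which is why fullAnnotate's dictionary exposes exactly those keys.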