Explain Document pipeline for Hebrew (explain_document_lg)

Description

explain_document_lg is a pretrained pipeline for processing Hebrew text. It runs the basic processing steps (sentence detection, tokenization, and word embeddings) and recognizes named entities, covering the most common text processing tasks on your dataframe.

How to use

Python:

from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline('explain_document_lg', lang = 'he')
annotations = pipeline.fullAnnotate("היי, מעבדות ג'ון סנו!")[0]
annotations.keys()

Scala:

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("explain_document_lg", lang = "he")
val result = pipeline.fullAnnotate("היי, מעבדות ג'ון סנו!")(0)
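
To inspect what the Python fullAnnotate call returns, a small helper can flatten each output column to plain strings. This is a minimal sketch, assuming each annotation exposes a `result` attribute, as Spark NLP's Python Annotation objects do. `summarize` is a hypothetical helper name, and the `SimpleNamespace` stand-ins only mimic that shape so the snippet runs without a Spark session.

```python
from types import SimpleNamespace


def summarize(annotations):
    """Map each output column to the list of its annotations' result strings."""
    return {
        column: [annotation.result for annotation in column_annotations]
        for column, column_annotations in annotations.items()
    }


# Stand-in for the dict returned by pipeline.fullAnnotate(...)[0].
# With a real pipeline, pass the `annotations` variable instead (assumption:
# its values are lists of objects carrying a `result` attribute).
fake_annotations = {
    "token": [SimpleNamespace(result=t) for t in ["היי", "ג'ון", "מעבדות", "שלג", "!"]],
    "ner": [SimpleNamespace(result=t) for t in ["O", "B-PERS", "O", "O", "O"]],
}

summary = summarize(fake_annotations)
for column, values in summary.items():
    print(column, values)
```

With a live pipeline, `summarize(annotations)` should give one list of strings per output column, which is easier to read than the raw annotation objects.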

Results

+----------------------+------------------------+----------------------+---------------------------+--------------------+---------+
|                  text|                document|              sentence|                      token|                 ner|ner_chunk|
+----------------------+------------------------+----------------------+---------------------------+--------------------+---------+
| היי ג'ון מעבדות שלג! |[ היי ג'ון מעבדות שלג! ]|[היי ג'ון מעבדות שלג!]|[היי, ג'ון, מעבדות, שלג, !]|[O, B-PERS, O, O, O]|   [ג'ון]|
+----------------------+------------------------+----------------------+---------------------------+--------------------+---------+
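
The ner and ner_chunk columns are related through IOB tags: a `B-` tag opens an entity, following `I-` tags of the same type extend it, and `O` marks tokens outside any entity. The sketch below illustrates that grouping on the tokens and tags from the table. It is a simplified plain-Python illustration, not the actual NerConverter implementation, and `iob_to_chunks` is a hypothetical name.

```python
def iob_to_chunks(tokens, tags):
    """Group IOB-tagged tokens into (entity_type, text) chunks."""
    chunks = []
    current_type, current_tokens = None, []

    def flush():
        # Close the chunk being built, if any.
        nonlocal current_type, current_tokens
        if current_tokens:
            chunks.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; close any open one first.
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continue the open entity of the same type.
            current_tokens.append(token)
        else:
            # "O", or an "I-" tag that does not continue the open chunk.
            flush()
    flush()
    return chunks


# Tokens and tags copied from the Results table.
tokens = ["היי", "ג'ון", "מעבדות", "שלג", "!"]
tags = ["O", "B-PERS", "O", "O", "O"]
chunks = iob_to_chunks(tokens, tags)
print(chunks)
```

Here the single `B-PERS` tag yields one chunk, matching the lone entry in the ner_chunk column.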

Model Information

Model Name: explain_document_lg
Type: pipeline
Compatibility: Spark NLP 3.0.2+
License: Open Source
Edition: Official
Language: he

Included Models

  • DocumentAssembler
  • SentenceDetector
  • TokenizerModel
  • WordEmbeddingsModel
  • NerDLModel
  • NerConverter