Explain Document pipeline for Korean (explain_document_lg)

Description

explain_document_lg is a pretrained pipeline for Korean text. It runs the basic processing steps (document assembly, sentence detection, word segmentation, and word embeddings) and recognizes named entities, covering most of the common text processing tasks on your DataFrame.

How to use

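# Python: fullAnnotate on a single string returns a list with one dict per input
from sparknlp.pretrained import PretrainedPipeline
pipeline = PretrainedPipeline('explain_document_lg', lang='ko')
annotations = pipeline.fullAnnotate("안녕하세요, 환영합니다!")[0]
# Output column names, such as 'token' and 'ner'
annotations.keys()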

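// Scala: fullAnnotate on a single String returns a Map from output column to annotations
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
val pipeline = new PretrainedPipeline("explain_document_lg", lang = "ko")
val result = pipeline.fullAnnotate("안녕하세요, 환영합니다!")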
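
The value returned by fullAnnotate is keyed by output column, and each value is a list of annotation objects. The sketch below shows one way to flatten that structure into plain strings. It assumes each annotation exposes a `result` attribute, as the Python `Annotation` class in Spark NLP does. `StubAnnotation` and `results_by_column` are hypothetical names introduced here, and the stub stands in for real annotations so the snippet runs without a Spark session.

```python
from dataclasses import dataclass


@dataclass
class StubAnnotation:
    """Hypothetical stand-in for a Spark NLP annotation; only `result` is modelled."""
    result: str


def results_by_column(annotations):
    """Map each output column to the list of its annotations' `result` strings."""
    return {column: [a.result for a in items] for column, items in annotations.items()}


# Stubbed data shaped like fullAnnotate(...)[0]; values copied from the Results table
annotations = {
    "token": [StubAnnotation(t) for t in ["안녕", ",", "존", "스노우", "!"]],
    "ner": [StubAnnotation(t) for t in ["B-DATE", "O", "O", "O", "O"]],
}

flat = results_by_column(annotations)
# Pair every token with its NER tag
print(list(zip(flat["token"], flat["ner"])))
```

With a real pipeline, the same helper could be called on the `annotations` dict from the Python example above in place of the stubbed data.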

Results

+----------+------------+------------+------------------+--------------------+---------+
|text      |document    |sentence    |token             |ner                 |ner_chunk|
+----------+------------+------------+------------------+--------------------+---------+
|안녕, 존 스노우!|[안녕, 존 스노우!]|[안녕, 존 스노우!]|[안녕, ,, 존, 스노우, !]|[B-DATE, O, O, O, O]|[안녕]     |
+----------+------------+------------+------------------+--------------------+---------+
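
The ner column holds one tag per token, and ner_chunk holds the entity spans derived from those tags. The following is a simplified illustration of that grouping, not the NerConverter implementation: `bio_to_chunks` is a hypothetical helper, and joining multi-token chunks with a space is an assumption made for this sketch.

```python
def bio_to_chunks(tokens, tags):
    """Group tokens into (text, label) chunks from B-/I-/O tags (simplified)."""
    chunks, current, label = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag closes any open chunk and starts a new one
            if current:
                chunks.append((" ".join(current), label))
            current, label = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == label:
            # An I- tag with the same label extends the open chunk
            current.append(token)
        else:
            # O (or a stray I- tag) closes any open chunk
            if current:
                chunks.append((" ".join(current), label))
            current, label = [], None
    if current:
        chunks.append((" ".join(current), label))
    return chunks


# Tokens and tags copied from the Results table
tokens = ["안녕", ",", "존", "스노우", "!"]
tags = ["B-DATE", "O", "O", "O", "O"]
print(bio_to_chunks(tokens, tags))  # [('안녕', 'DATE')]
```

For the row in the table, the single B-DATE tag on the first token yields the single chunk 안녕.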

Model Information

Model Name: explain_document_lg
Type: pipeline
Compatibility: Spark NLP 3.0.2+
License: Open Source
Edition: Official
Language: ko

Included Models

  • DocumentAssembler
  • SentenceDetector
  • WordSegmenterModel
  • WordEmbeddingsModel
  • NerDLModel
  • NerConverter
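
The order of the stages matters, because each one reads columns produced by earlier ones. The sketch below models that dependency chain as plain data and checks it. The input and output column names are assumptions based on common Spark NLP conventions and the columns in the Results table; they are not read from the pipeline itself, and `STAGES` and `check_wiring` are hypothetical names.

```python
# (stage, input columns, output column); column names are assumptions
STAGES = [
    ("DocumentAssembler", ["text"], "document"),
    ("SentenceDetector", ["document"], "sentence"),
    ("WordSegmenterModel", ["sentence"], "token"),
    ("WordEmbeddingsModel", ["sentence", "token"], "embeddings"),
    ("NerDLModel", ["sentence", "token", "embeddings"], "ner"),
    ("NerConverter", ["sentence", "token", "ner"], "ner_chunk"),
]


def check_wiring(stages, initial=("text",)):
    """Return the columns available after all stages run.

    Raises ValueError if a stage reads a column that no earlier stage produced.
    """
    available = list(initial)
    for name, inputs, output in stages:
        missing = [c for c in inputs if c not in available]
        if missing:
            raise ValueError(f"{name} needs missing columns: {missing}")
        available.append(output)
    return available


print(check_wiring(STAGES))
```

Reordering the list so that, for example, NerDLModel comes before WordEmbeddingsModel makes `check_wiring` raise, which mirrors why the stages are listed in this order.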