Typed Dependency Parsing for English

Description

Typed dependency parser trained on the CoNLL dataset.

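Dependency parsing is the task of producing a dependency parse of a sentence, which represents its grammatical structure by defining the relationships between “head” words and the words that modify those heads. A typed (labeled) parser additionally assigns each head-dependent relation a grammatical label such as nsubj or dobj.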

Example:

Sentence: I prefer the morning flight through Denver

  root   ROOT -> prefer
  nsubj  prefer -> I
  dobj   prefer -> flight
  det    flight -> the
  nmod   flight -> morning
  nmod   flight -> Denver
  case   Denver -> through

Relations among the words are given as directed, labeled arcs from heads to dependents (head -> dependent).
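One way to hold a typed dependency parse in memory, sketched here in plain Python rather than with any Spark NLP API, is a list of head indices plus a parallel list of relation labels, in the style of CoNLL files (index 0 stands for the artificial ROOT). The arcs encoded below are those of the example sentence above; the helper names `arcs` and `dependents` are illustrative only.

```python
# Illustrative encoding of the example parse (not a Spark NLP data structure).
# Tokens are 1-indexed; heads[i] is the index of the head of token i + 1,
# with 0 standing for the artificial ROOT; labels[i] is that arc's relation.
tokens = ["I", "prefer", "the", "morning", "flight", "through", "Denver"]
heads = [2, 0, 5, 5, 2, 7, 5]
labels = ["nsubj", "root", "det", "nmod", "dobj", "case", "nmod"]


def arcs(tokens, heads, labels):
    """Return (head_word, label, dependent_word) triples, one per token."""
    triples = []
    for dep_idx, (head_idx, label) in enumerate(zip(heads, labels), start=1):
        head_word = "ROOT" if head_idx == 0 else tokens[head_idx - 1]
        triples.append((head_word, label, tokens[dep_idx - 1]))
    return triples


def dependents(head_idx, heads):
    """Return the 1-based indices of all tokens whose head is head_idx."""
    return [i for i, h in enumerate(heads, start=1) if h == head_idx]


# Print every arc as label(head, dependent)
for head_word, label, dep_word in arcs(tokens, heads, labels):
    print(f"{label}({head_word}, {dep_word})")

# Words that modify "flight" (token 5)
print([tokens[i - 1] for i in dependents(5, heads)])  # ['the', 'morning', 'Denver']
```

Storing head indices rather than head words keeps the structure unambiguous when the same word occurs more than once in a sentence.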


How to use

import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Start a Spark session with Spark NLP loaded
spark = sparknlp.start()

documentAssembler     = DocumentAssembler().setInputCol("text").setOutputCol("document")
sentenceDetector      = SentenceDetector().setInputCols(["document"]).setOutputCol("sentence")
tokenizer             = Tokenizer().setInputCols(["sentence"]).setOutputCol("token")
posTagger             = PerceptronModel.pretrained().setInputCols(["token", "sentence"]).setOutputCol("pos")
# Unlabeled parser: predicts the head of each token
dependencyParser      = DependencyParserModel.pretrained().setInputCols(["sentence", "pos", "token"]).setOutputCol("dependency")
# Typed parser: assigns a relation label to each head-dependent arc
typedDependencyParser = TypedDependencyParserModel.pretrained().setInputCols(["token", "pos", "dependency"]).setOutputCol("labdep")
pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, posTagger, dependencyParser, typedDependencyParser])

# Create a one-column DataFrame holding the input text
df = spark.createDataFrame([["Dependencies represents relationships betweens words in a Sentence"]]).toDF("text")
result = pipeline.fit(df).transform(df)
result.select("dependency.result", "labdep.result").show(truncate=False)
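The stage order in the pipeline matters because each annotator reads columns written by earlier stages: the typed parser needs the `dependency` column, which in turn needs `pos`, `token`, and `sentence`. The sketch below is a hypothetical helper, not part of Spark NLP, that checks this wiring using the column names from the code above.

```python
# Hypothetical helper, not part of Spark NLP: checks that every stage's
# input columns come from the input DataFrame or from an earlier stage.
stages = [
    ("DocumentAssembler", ["text"], "document"),
    ("SentenceDetector", ["document"], "sentence"),
    ("Tokenizer", ["sentence"], "token"),
    ("PerceptronModel", ["token", "sentence"], "pos"),
    ("DependencyParserModel", ["sentence", "pos", "token"], "dependency"),
    ("TypedDependencyParserModel", ["token", "pos", "dependency"], "labdep"),
]


def missing_inputs(stages, initial_columns):
    """Return (stage_name, column) pairs whose input is not yet available."""
    available = set(initial_columns)
    problems = []
    for name, inputs, output in stages:
        for col in inputs:
            if col not in available:
                problems.append((name, col))
        available.add(output)  # later stages may read this column
    return problems


print(missing_inputs(stages, ["text"]))  # []

# Swapping the two parsers breaks the chain
reordered = stages[:4] + [stages[5], stages[4]]
print(missing_inputs(reordered, ["text"]))  # [('TypedDependencyParserModel', 'dependency')]
```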



import com.johnsnowlabs.nlp.SparkNLP
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// Start a Spark session with Spark NLP loaded (in spark-shell, the existing `spark` can be used instead)
val spark = SparkNLP.start()
import spark.implicits._

val documentAssembler     = new DocumentAssembler().setInputCol("text").setOutputCol("document")
val sentenceDetector      = new SentenceDetector().setInputCols(Array("document")).setOutputCol("sentence")
val tokenizer             = new Tokenizer().setInputCols(Array("sentence")).setOutputCol("token")
val posTagger             = PerceptronModel.pretrained().setInputCols(Array("token", "sentence")).setOutputCol("pos")
val dependencyParser      = DependencyParserModel.pretrained().setInputCols(Array("sentence", "pos", "token")).setOutputCol("dependency")
val typedDependencyParser = TypedDependencyParserModel.pretrained().setInputCols(Array("token", "pos", "dependency")).setOutputCol("labdep")
val pipeline              = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, posTagger, dependencyParser, typedDependencyParser))
val df = Seq("Dependencies represents relationships betweens words in a Sentence").toDF("text")
val result = pipeline.fit(df).transform(df)
result.select("dependency.result", "labdep.result").show(false)

import nlu
nlu.load("dep.typed").predict("Dependencies represents relationships betweens words in a Sentence")

Results

+---------------------------------------------------------------------------------+--------------------------------------------------------+
|result                                                                           |result                                                  |
+---------------------------------------------------------------------------------+--------------------------------------------------------+
|[ROOT, Dependencies, represents, words, relationships, Sentence, Sentence, words]|[root, parataxis, nsubj, amod, nsubj, case, nsubj, flat]|
+---------------------------------------------------------------------------------+--------------------------------------------------------+
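The table is easiest to read token by token: the first `result` column can be read as the word the parser chose as each token's head (ROOT for the root), and the second as the relation label of that arc. The plain-Python sketch below copies the two lists from the table and pairs them with the tokens. The token list is an assumption: it is the input sentence split on whitespace, which should match the Tokenizer output here because the sentence has no punctuation. In a live pipeline, selecting `"token.result"` alongside the two columns would give the tokens directly.

```python
# Heads and labels copied from the Results table above.
# ASSUMPTION: tokens = whitespace split of the input sentence.
text = "Dependencies represents relationships betweens words in a Sentence"
tokens = text.split()
heads = ["ROOT", "Dependencies", "represents", "words",
         "relationships", "Sentence", "Sentence", "words"]
labels = ["root", "parataxis", "nsubj", "amod", "nsubj", "case", "nsubj", "flat"]

# One head and one label per token
assert len(tokens) == len(heads) == len(labels)

triples = list(zip(tokens, heads, labels))
for token, head, label in triples:
    print(f"{token:<15} {label:<10} head={head}")
```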

Model Information

Model Name: dependency_typed_conllu
Compatibility: Spark NLP 3.0.0+
License: Open Source
Edition: Official
Input Labels: [token, pos, dep_root]
Output Labels: [dep_mod]
Language: en

Data Source

CoNLL
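The Data Source line names CoNLL; the model name dependency_typed_conllu suggests the CoNLL-U format, although the page does not say which treebank was used. In CoNLL-U each token line has ten tab-separated columns, and columns 7 (HEAD) and 8 (DEPREL) carry exactly the typed dependency this model predicts. A minimal reader in plain Python, run on a made-up three-token sentence rather than on actual training data, could look like this:

```python
# Minimal CoNLL-U reader sketch. Assumes the standard layout: ten
# tab-separated columns per token line, "#" comment lines, and a blank
# line after each sentence. Multiword-token ranges ("1-2") and empty
# nodes ("1.1") are skipped for brevity.
SAMPLE = (
    "# text = Dogs bark.\n"
    "1\tDogs\tdog\tNOUN\tNNS\t_\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\tVBP\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_\n"
    "\n"
)


def read_conllu(text):
    """Yield each sentence as a list of (id, form, head, deprel) tuples."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():          # blank line ends a sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):      # sentence-level comment
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        # Column 1 = ID, 2 = FORM, 7 = HEAD, 8 = DEPREL
        sentence.append((int(cols[0]), cols[1], int(cols[6]), cols[7]))
    if sentence:                      # file may lack a trailing blank line
        yield sentence


for sent in read_conllu(SAMPLE):
    for idx, form, head, deprel in sent:
        print(idx, form, head, deprel)
```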