Part of Speech Tagger Pretrained with Clinical Data

Description

A part-of-speech (POS) classifier predicts a grammatical label for every token in the input text. This model is implemented with an averaged perceptron architecture and was trained on additional medical data.
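
To make the term "averaged perceptron" concrete, here is a minimal, self-contained sketch of the general technique on a toy three-tag problem. It is not the Spark NLP implementation: the class name `AveragedPerceptron`, the `features` helper, the feature templates, and the tiny training set are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class AveragedPerceptron:
    """Toy multi-class averaged perceptron (illustrative, not Spark NLP's code)."""

    def __init__(self, classes):
        self.classes = sorted(classes)
        # feature -> class -> current weight
        self.weights = defaultdict(lambda: defaultdict(float))
        self._totals = defaultdict(float)  # (feature, class) -> accumulated weight
        self._stamps = defaultdict(int)    # (feature, class) -> step of last change
        self.step = 0

    def predict(self, features):
        scores = {c: 0.0 for c in self.classes}
        for f in features:
            for c, w in self.weights.get(f, {}).items():
                scores[c] += w
        # Ties are broken deterministically by class name.
        return max(self.classes, key=lambda c: (scores[c], c))

    def update(self, truth, guess, features):
        self.step += 1
        if truth == guess:
            return
        for f in features:
            self._bump(f, truth, +1.0)  # reward the correct class
            self._bump(f, guess, -1.0)  # penalise the wrong guess

    def _bump(self, f, c, delta):
        key = (f, c)
        # Credit the old weight for every step it stayed unchanged.
        self._totals[key] += (self.step - self._stamps[key]) * self.weights[f][c]
        self._stamps[key] = self.step
        self.weights[f][c] += delta

    def average(self):
        """Replace each weight by its mean over all training steps."""
        for f, per_class in self.weights.items():
            for c in per_class:
                key = (f, c)
                total = self._totals[key] + (self.step - self._stamps[key]) * per_class[c]
                per_class[c] = total / self.step if self.step else 0.0


def features(words, i):
    """Hypothetical feature templates: word, 2-char suffix, previous word, bias."""
    word = words[i].lower()
    prev = words[i - 1].lower() if i > 0 else "<s>"
    return [f"word={word}", f"suffix={word[-2:]}", f"prev={prev}", "bias"]


# Made-up training data, only to exercise the update rule.
train = [
    (["the", "patient", "sleeps"], ["DET", "NOUN", "VERB"]),
    (["a", "doctor", "arrives"], ["DET", "NOUN", "VERB"]),
]

model = AveragedPerceptron({"DET", "NOUN", "VERB"})
for _ in range(5):
    for words, tags in train:
        for i, tag in enumerate(tags):
            feats = features(words, i)
            model.update(tag, model.predict(feats), feats)
model.average()

test_words = ["the", "doctor", "sleeps"]
predicted = [model.predict(features(test_words, i)) for i in range(len(test_words))]
print(list(zip(test_words, predicted)))
```

Averaging the weights over all update steps, rather than keeping only the final values, is what distinguishes this from a plain perceptron; it tends to reduce sensitivity to the order of the training examples.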

Predicted Entities

  • PROPN
  • PUNCT
  • ADJ
  • NOUN
  • VERB
  • DET
  • ADP
  • AUX
  • PRON
  • PART
  • SCONJ
  • NUM
  • ADV
  • CCONJ
  • X
  • INTJ
  • SYM
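
The labels above are listed without glosses. The lookup below is a small sketch that assumes they follow the Universal Dependencies UPOS inventory, which the list matches name for name; the model card itself does not define them. The names `UPOS_DESCRIPTIONS` and `describe` are hypothetical helpers, not part of Spark NLP.

```python
# Glosses assume the Universal Dependencies UPOS tag set (an assumption,
# not something stated by the model card).
UPOS_DESCRIPTIONS = {
    "PROPN": "proper noun",
    "PUNCT": "punctuation",
    "ADJ": "adjective",
    "NOUN": "noun",
    "VERB": "verb",
    "DET": "determiner",
    "ADP": "adposition",
    "AUX": "auxiliary",
    "PRON": "pronoun",
    "PART": "particle",
    "SCONJ": "subordinating conjunction",
    "NUM": "numeral",
    "ADV": "adverb",
    "CCONJ": "coordinating conjunction",
    "X": "other",
    "INTJ": "interjection",
    "SYM": "symbol",
}


def describe(tag):
    """Return a gloss for a tag, or a marker if it is outside the listed set."""
    return UPOS_DESCRIPTIONS.get(tag, "not in the listed tag set")


print(describe("ADP"))
print(describe("SCONJ"))
```

A helper like this can also serve as a sanity check on model output, since any tag that falls through to the fallback string is outside the set listed on this page.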

How to use

# Python (assumes an active `spark` session with access to the licensed clinical models)
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, PerceptronModel

document_assembler = DocumentAssembler().setInputCol("text").setOutputCol("document")
tokenizer = Tokenizer().setInputCols(["document"]).setOutputCol("token")
pos = PerceptronModel.pretrained("pos_clinical", "en", "clinical/models").setInputCols(["token", "document"]).setOutputCol("pos")
pipeline = Pipeline(stages=[document_assembler, tokenizer, pos])
df = spark.createDataFrame([["POS assigns each token in a sentence a grammatical label"]], ["text"])
result = pipeline.fit(df).transform(df)
result.select("pos.result").show(truncate=False)

// Scala (assumes an active `spark` session with access to the licensed clinical models)
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val document_assembler = new DocumentAssembler().setInputCol("text").setOutputCol("document")
val tokenizer = new Tokenizer().setInputCols(Array("document")).setOutputCol("token")
val pos = PerceptronModel.pretrained("pos_clinical", "en", "clinical/models").setInputCols(Array("token", "document")).setOutputCol("pos")
val pipeline = new Pipeline().setStages(Array(document_assembler, tokenizer, pos))
val df = Seq("POS assigns each token in a sentence a grammatical label").toDF("text")
val result = pipeline.fit(df).transform(df)
result.select("pos.result").show(false)

import nlu  # NLU one-liner
nlu.load('pos.clinical').predict("POS assigns each token in a sentence a grammatical label")

Results

+------------------------------------------+
|result                                    |
+------------------------------------------+
|[NN, NNS, PND, NN, II, DD, NN, DD, JJ, NN]|
+------------------------------------------+
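
The result column holds one tag per token, in token order. The sketch below pairs the tags shown above with the words of the example sentence so the output is easier to read. It uses a plain whitespace split as a stand-in for the pipeline's Tokenizer, which is an assumption that happens to hold for this punctuation-free sentence; in a real pipeline the tokens would come from the `token` column instead.

```python
sentence = "POS assigns each token in a sentence a grammatical label"

# Tags copied verbatim from the result table above.
tags = ["NN", "NNS", "PND", "NN", "II", "DD", "NN", "DD", "JJ", "NN"]

# Whitespace split stands in for the Spark NLP Tokenizer (assumption).
tokens = sentence.split()
if len(tokens) != len(tags):
    raise ValueError("token and tag counts differ; alignment would be wrong")

pairs = list(zip(tokens, tags))
for token, tag in pairs:
    print(f"{token}\t{tag}")
```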

Model Information

Model Name: pos_clinical
Compatibility: Spark NLP 3.0.0+
License: Licensed
Edition: Official
Input Labels: [document, token]
Output Labels: [pos]
Language: en