Part of Speech Tagger Pretrained on Clinical Data


A part-of-speech classifier predicts a grammatical label for every token in the input text. This model is implemented with an averaged perceptron architecture and was trained on additional medical data.
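To make the averaged perceptron idea concrete, here is a minimal toy sketch in plain Python. It is not Spark NLP's implementation: the class, the feature function, and the two training sentences are all hypothetical and exist only to illustrate how weights are updated on mistakes and then averaged over all training steps.

```python
from collections import defaultdict


class AveragedPerceptron:
    """Toy multiclass averaged perceptron (illustrative only, not Spark NLP's code)."""

    def __init__(self, classes):
        self.classes = sorted(classes)
        self.weights = defaultdict(float)  # (feature, label) -> current weight
        self._totals = defaultdict(float)  # (feature, label) -> weight summed over steps
        self._stamps = defaultdict(int)    # (feature, label) -> step of the last change
        self.step = 0

    def predict(self, features):
        scores = {
            c: sum(self.weights.get((f, c), 0.0) for f in features)
            for c in self.classes
        }
        # Ties are broken by label name so the result is deterministic.
        return max(self.classes, key=lambda c: (scores[c], c))

    def update(self, truth, guess, features):
        self.step += 1
        if truth == guess:
            return
        for f in features:
            for label, delta in ((truth, 1.0), (guess, -1.0)):
                key = (f, label)
                # Credit the old weight for every step it stayed unchanged.
                self._totals[key] += (self.step - self._stamps[key]) * self.weights[key]
                self._stamps[key] = self.step
                self.weights[key] += delta

    def average(self):
        # Replace each weight with its mean value over all training steps.
        if self.step == 0:
            return
        for key, w in self.weights.items():
            total = self._totals[key] + (self.step - self._stamps[key]) * w
            self.weights[key] = total / self.step


def features(tokens, i):
    # Hypothetical feature set: the word itself, its suffix, and the previous word.
    word = tokens[i].lower()
    prev = tokens[i - 1].lower() if i > 0 else "<s>"
    return ["bias", "word=" + word, "suffix=" + word[-2:], "prev=" + prev]


# Made-up training data, for illustration only.
train = [
    (["the", "patient", "denies", "pain"], ["DET", "NOUN", "VERB", "NOUN"]),
    (["a", "nurse", "reports", "fever"], ["DET", "NOUN", "VERB", "NOUN"]),
]

model = AveragedPerceptron({"DET", "NOUN", "VERB"})
for _ in range(5):
    for tokens, tags in train:
        for i, gold in enumerate(tags):
            feats = features(tokens, i)
            model.update(gold, model.predict(feats), feats)
model.average()


def tag_sentence(tokens):
    return [model.predict(features(tokens, i)) for i in range(len(tokens))]


predicted = tag_sentence(["the", "nurse", "denies", "fever"])
print(predicted)
```

Averaging the weights over all update steps, rather than keeping only the final values, tends to make the tagger less sensitive to the order of the training examples.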

Predicted Entities

  • ADJ
  • NOUN
  • VERB
  • DET
  • ADP
  • AUX
  • PRON
  • PART
  • NUM
  • ADV
  • X
  • INTJ
  • SYM
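As a small convenience, the labels listed above can be kept in a lookup table. The sketch below assumes these labels carry their usual Universal Dependencies (UPOS) meanings; the dictionary and the `describe` helper are hypothetical and are not part of Spark NLP.

```python
# Labels copied from the list above; the descriptions assume
# Universal Dependencies (UPOS) meanings.
UPOS_DESCRIPTIONS = {
    "ADJ": "adjective",
    "NOUN": "noun",
    "VERB": "verb",
    "DET": "determiner",
    "ADP": "adposition",
    "AUX": "auxiliary",
    "PRON": "pronoun",
    "PART": "particle",
    "NUM": "numeral",
    "ADV": "adverb",
    "X": "other",
    "INTJ": "interjection",
    "SYM": "symbol",
}


def describe(tag):
    """Return a readable description, or a fallback for labels not in the table."""
    return UPOS_DESCRIPTIONS.get(tag, "unknown tag")


print(describe("ADP"))
```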

How to use

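    # Python (assumes `spark` is an active Spark NLP session)
    import pandas as pd
    from pyspark.ml import Pipeline
    from sparknlp.base import DocumentAssembler
    from sparknlp.annotator import Tokenizer, PerceptronModel

    document_assembler = DocumentAssembler().setInputCol("text").setOutputCol("document")
    tokenizer          = Tokenizer().setInputCols("document").setOutputCol("token")
    pos                = PerceptronModel.pretrained("pos_clinical","en","clinical/models").setInputCols("token","document").setOutputCol("pos")
    pipeline = Pipeline(stages=[document_assembler, tokenizer, pos])
    df = spark.createDataFrame(pd.DataFrame({'text': ["POS assigns each token in a sentence a grammatical label"]}))
    result = pipeline.fit(df).transform(df)
    result.select("pos.result").show(truncate=False)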
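
    // Scala (assumes `spark` is an active Spark NLP session)
    import com.johnsnowlabs.nlp.base._
    import com.johnsnowlabs.nlp.annotator._
    import org.apache.spark.ml.Pipeline
    import spark.implicits._  // required for toDF
    val document_assembler =  new DocumentAssembler().setInputCol("text").setOutputCol("document")
    val tokenizer          =  new Tokenizer().setInputCols(Array("document")).setOutputCol("token")
    val pos                =  PerceptronModel.pretrained("pos_clinical","en","clinical/models").setInputCols("token","document").setOutputCol("pos")
    val pipeline = new Pipeline().setStages(Array(document_assembler, tokenizer, pos))
    val df = Seq("POS assigns each token in a sentence a grammatical label").toDF("text")
    val result = pipeline.fit(df).transform(df)
    result.select("pos.result").show(false)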

    import nlu
    nlu.load('pos.clinical').predict("POS assigns each token in a sentence a grammatical label")


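Results

+------------------------------------------+
|result                                    |
+------------------------------------------+
|[NN, NNS, PND, NN, II, DD, NN, DD, JJ, NN]|
+------------------------------------------+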
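
The output above lists tags only, so it can help to line them up with the tokens they belong to. The sketch below does this in plain Python with the tags copied from that output. It assumes whitespace splitting yields the same tokens as the pipeline's Tokenizer for this sentence, which holds here only because the sentence contains no punctuation.

```python
sentence = "POS assigns each token in a sentence a grammatical label"
# Tags copied verbatim from the output shown above.
tags = ["NN", "NNS", "PND", "NN", "II", "DD", "NN", "DD", "JJ", "NN"]

# Assumption: whitespace splitting stands in for the pipeline's Tokenizer.
tokens = sentence.split()
if len(tokens) != len(tags):
    raise ValueError("token/tag count mismatch")

pairs = list(zip(tokens, tags))
for token, tag in pairs:
    print(f"{token}\t{tag}")
```

On a real pipeline result, the same pairing would use the `token` and `pos` output columns rather than a re-split string.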

Model Information

Model Name: pos_clinical
Compatibility: Spark NLP 3.0.0+
License: Licensed
Edition: Official
Input Labels: [document, token]
Output Labels: [pos]
Language: en