Part of Speech for Bengali

Description

A part-of-speech (POS) tagger assigns a grammatical label to every token in the input text. This Bengali model is implemented as an averaged perceptron (Spark NLP's PerceptronModel annotator).
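The following is a minimal pure-Python sketch of how an averaged-perceptron tagger is trained and applied. It is not Spark NLP's implementation. The class AveragedPerceptron, the helpers features, train and tag, and the feature set (bias, current word, two-character suffix, previous predicted tag) are all illustrative assumptions. The defining step is averaging: each weight's value is accumulated over every training step, and the final model keeps those averages instead of the last weight vector, which usually makes it less sensitive to the most recent updates.

```python
from collections import defaultdict


class AveragedPerceptron:
    """Illustrative multiclass averaged perceptron (not Spark NLP's code)."""

    def __init__(self, classes):
        self.classes = sorted(classes)
        # feature -> class -> current weight
        self.weights = defaultdict(lambda: defaultdict(float))
        # (feature, class) -> running sum of the weight over past steps
        self._totals = defaultdict(float)
        # (feature, class) -> step at which the weight last changed
        self._stamps = defaultdict(int)
        self.step = 0

    def predict(self, feats):
        scores = defaultdict(float)
        for f in feats:
            for c, w in self.weights.get(f, {}).items():
                scores[c] += w
        # Highest score wins; ties go to the alphabetically last class.
        return max(self.classes, key=lambda c: (scores[c], c))

    def update(self, truth, guess, feats):
        self.step += 1
        if truth == guess:
            return
        for f in feats:
            self._nudge(f, truth, 1.0)
            self._nudge(f, guess, -1.0)

    def _nudge(self, f, c, delta):
        key = (f, c)
        w = self.weights[f][c]
        # Credit the old weight for every step it stayed unchanged.
        self._totals[key] += (self.step - self._stamps[key]) * w
        self._stamps[key] = self.step
        self.weights[f][c] = w + delta

    def average(self):
        """Replace each weight by its average over all training steps."""
        if self.step == 0:
            return
        for f, class_weights in self.weights.items():
            for c, w in class_weights.items():
                key = (f, c)
                total = self._totals[key] + (self.step - self._stamps[key]) * w
                class_weights[c] = total / self.step


def features(words, i, prev_tag):
    """Hypothetical feature set: bias, word, 2-char suffix, previous tag."""
    w = words[i]
    return ["bias", "word=" + w, "suffix=" + w[-2:], "prev=" + prev_tag]


def train(model, tagged_sentences, epochs=5):
    for _ in range(epochs):
        for words, tags in tagged_sentences:
            prev = "<S>"
            for i, gold in enumerate(tags):
                feats = features(words, i, prev)
                guess = model.predict(feats)
                model.update(gold, guess, feats)
                prev = guess  # condition on the model's own guess, as at tagging time
    model.average()


def tag(model, words):
    """Tag a token list greedily, left to right."""
    prev, out = "<S>", []
    for i in range(len(words)):
        prev = model.predict(features(words, i, prev))
        out.append(prev)
    return out


if __name__ == "__main__":
    # Toy data: tokens and tags copied from the Results table of this card.
    words = ["জন", "স্নো", "ল্যাবস", "থেকে", "হ্যালো", "!"]
    tags = ["NN", "NN", "NN", "PSP", "JJ", "SYM"]
    model = AveragedPerceptron(set(tags))
    train(model, [(words, tags)])
    print(list(zip(words, tag(model, words))))
```

A six-token training set is only enough to exercise the code; on data this small the averaged model is not guaranteed to reproduce every training tag.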

Predicted Entities

  • NN
  • SYM
  • NNP
  • VM
  • INTF
  • JJ
  • QF
  • CC
  • NST
  • PSP
  • QC
  • DEM
  • RDP
  • PRP
  • NEG
  • WQ
  • RB
  • VAUX
  • UT
  • XC
  • RP
  • QO
  • BM
  • NNC
  • PPR
  • INJ
  • CL
  • UNK
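If downstream code depends on these labels, it can be worth checking that every tag it receives belongs to the documented set. The sketch below is a hypothetical helper; PREDICTED_TAGS and unexpected_tags are illustrative names, not part of Spark NLP, and the label set is copied verbatim from the list above.

```python
# Tag labels copied from the "Predicted Entities" list of this model card.
PREDICTED_TAGS = frozenset({
    "NN", "SYM", "NNP", "VM", "INTF", "JJ", "QF", "CC", "NST", "PSP",
    "QC", "DEM", "RDP", "PRP", "NEG", "WQ", "RB", "VAUX", "UT", "XC",
    "RP", "QO", "BM", "NNC", "PPR", "INJ", "CL", "UNK",
})


def unexpected_tags(tags):
    """Return tags outside the documented label set, in first-seen order."""
    seen = []
    for t in tags:
        if t not in PREDICTED_TAGS and t not in seen:
            seen.append(t)
    return seen
```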


How to use


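import pandas as pd
import sparknlp
from pyspark.ml import Pipeline
from sparknlp.annotator import PerceptronModel, SentenceDetector, Tokenizer
from sparknlp.base import DocumentAssembler

spark = sparknlp.start()

document_assembler = DocumentAssembler() \
  .setInputCol("text") \
  .setOutputCol("document")

sentence_detector = SentenceDetector() \
  .setInputCols(["document"]) \
  .setOutputCol("sentence")

# The POS tagger consumes tokens, so a tokenizer must run before it
tokenizer = Tokenizer() \
  .setInputCols(["sentence"]) \
  .setOutputCol("token")

pos = PerceptronModel.pretrained("pos_msri", "bn") \
  .setInputCols(["sentence", "token"]) \
  .setOutputCol("pos")

pipeline = Pipeline(stages=[
  document_assembler,
  sentence_detector,
  tokenizer,
  pos
])

# "Hello from John Snow Labs!"
example = spark.createDataFrame(pd.DataFrame({'text': ["জন স্নো ল্যাবস থেকে হ্যালো! "]}))

result = pipeline.fit(example).transform(example)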
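For quick inspection of single strings, Spark NLP also offers LightPipeline, whose annotate method returns a Python dict mapping each output column name to a list of result strings. Assuming that shape, the hypothetical helper below pairs each token with its tag. The sample dict is hand-written from the Results table of this card; it was not produced by running the model here.

```python
# Hypothetical helper. Assumes a LightPipeline-style dict, e.g. from
#   LightPipeline(pipeline.fit(example)).annotate("জন স্নো ল্যাবস থেকে হ্যালো! ")
# that maps column names ("token", "pos", ...) to lists of strings.
def token_pos_pairs(annotations, token_col="token", pos_col="pos"):
    tokens = annotations[token_col]
    tags = annotations[pos_col]
    if len(tokens) != len(tags):
        raise ValueError(f"{len(tokens)} tokens but {len(tags)} tags")
    return list(zip(tokens, tags))


# Sample copied from the Results table of this card.
sample = {
    "token": ["জন", "স্নো", "ল্যাবস", "থেকে", "হ্যালো", "!"],
    "pos": ["NN", "NN", "NN", "PSP", "JJ", "SYM"],
}

for token, pos_tag in token_pos_pairs(sample):
    print(token, pos_tag)
```

Raising on a length mismatch is a deliberate choice: silently truncating with zip would hide a misconfigured pipeline, such as tags computed on a different token column.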



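import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val document_assembler = new DocumentAssembler()
        .setInputCol("text")
        .setOutputCol("document")

val sentence_detector = new SentenceDetector()
        .setInputCols(Array("document"))
        .setOutputCol("sentence")

// The POS tagger consumes tokens, so a tokenizer must run before it
val tokenizer = new Tokenizer()
        .setInputCols(Array("sentence"))
        .setOutputCol("token")

val pos = PerceptronModel.pretrained("pos_msri", "bn")
        .setInputCols(Array("sentence", "token"))
        .setOutputCol("pos")

val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, pos))

// "Hello from John Snow Labs!"
val data = Seq("জন স্নো ল্যাবস থেকে হ্যালো! ").toDF("text")
val result = pipeline.fit(data).transform(data)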


import nlu
text = ["জন স্নো ল্যাবস থেকে হ্যালো! "]
token_df = nlu.load('bn.pos').predict(text)
token_df
    

Results

    token  pos
0      জন   NN
1    স্নো   NN
2  ল্যাবস   NN
3    থেকে  PSP
4  হ্যালো   JJ
5       !  SYM
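Once tokens and tags are paired, a common next step is to collect the tokens under each label, for example to pull out all nouns. A minimal sketch, using the rows of the results table as input; tokens_by_tag is an illustrative name, not a Spark NLP or NLU function.

```python
from collections import defaultdict


def tokens_by_tag(pairs):
    """Group tokens by POS tag, keeping the original token order."""
    grouped = defaultdict(list)
    for token, tag in pairs:
        grouped[tag].append(token)
    return dict(grouped)


# Rows copied from the results table of this card.
rows = [("জন", "NN"), ("স্নো", "NN"), ("ল্যাবস", "NN"),
        ("থেকে", "PSP"), ("হ্যালো", "JJ"), ("!", "SYM")]

groups = tokens_by_tag(rows)
print(groups["NN"])
```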

Model Information

Model Name: pos_msri
Compatibility: Spark NLP 3.0.0+
License: Open Source
Edition: Official
Input Labels: [document, token]
Output Labels: [pos]
Language: bn