Stopwords Remover for Tigrinya language (182 entries)


This is a scalable, production-ready Stopwords Remover model trained using the corpus available at stopwords-iso.
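Conceptually, a stop-word remover filters a token sequence against a fixed word list. The sketch below illustrates that idea in plain Python. The function name `remove_stopwords` and the tiny English word list are illustrative assumptions; this is not the Spark NLP implementation, and the list is not the model's 182 Tigrinya entries.

```python
def remove_stopwords(tokens, stopwords, case_sensitive=False):
    """Return the tokens that are not in the stop-word set, preserving order."""
    if case_sensitive:
        blocked = set(stopwords)
        return [t for t in tokens if t not in blocked]
    # Case-insensitive matching: compare lowercased forms on both sides.
    blocked = {w.lower() for w in stopwords}
    return [t for t in tokens if t.lower() not in blocked]


# Toy English list for illustration only (hypothetical, not the model's entries).
toy_stopwords = {"the", "is", "on"}
print(remove_stopwords(["The", "cat", "is", "on", "the", "mat"], toy_stopwords))
# prints ['cat', 'mat']
```

A token is dropped only when it matches a list entry exactly (after optional lowercasing), so a token with punctuation still attached would not match its bare form.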


How to use

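from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, StopWordsCleaner
from pyspark.ml import Pipeline

# Turn the raw text column into document annotations
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Split each document into tokens
tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# Drop tokens found in the pretrained Tigrinya stop-word list
stop_words = StopWordsCleaner.pretrained("stopwords_iso","ti") \
    .setInputCols(["token"]) \
    .setOutputCol("cleanTokens")

pipeline = Pipeline(stages=[documentAssembler, tokenizer, stop_words])

# `spark` is assumed to be an active session, e.g. from sparknlp.start()
example = spark.createDataFrame([["ይቕሬታ፣ ሓደ መደብ ገይረ ኣሎኹ።"]], ["text"])

results = pipeline.fit(example).transform(example)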
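
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.{StopWordsCleaner, Tokenizer}
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")
val tokenizer = new Tokenizer()
    .setInputCols(Array("document"))
    .setOutputCol("token")
val stop_words = StopWordsCleaner.pretrained("stopwords_iso","ti")
    .setInputCols(Array("token"))
    .setOutputCol("cleanTokens")
val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, stop_words))
val data = Seq("ይቕሬታ፣ ሓደ መደብ ገይረ ኣሎኹ።").toDF("text") // needs spark.implicits._ outside spark-shell
val results = pipeline.fit(data).transform(data)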


Results

+---------------------------+
|result                     |
+---------------------------+
|[ይቕሬታ፣, ሓደ, መደብ, ገይረ, ኣሎኹ።]|
+---------------------------+
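
To print a table like the one above, the cleaned tokens can be pulled out of the annotation column. This is a minimal sketch, assuming `results` is the DataFrame returned by the fitted pipeline and that the cleaner's output column is named `cleanTokens` (the Output Labels value listed for this model). The helper name `show_clean_tokens` is hypothetical.

```python
def show_clean_tokens(results, column="cleanTokens"):
    """Print only the token strings kept by the stop-word cleaner."""
    # Each annotation is a struct; its `result` field holds the token text.
    # selectExpr and show are standard PySpark DataFrame methods.
    results.selectExpr(f"{column}.result").show(truncate=False)
```

Calling `show_clean_tokens(results)` selects the nested `result` field, so the printed column is named `result`, and `truncate=False` keeps long token lists from being cut off.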

Model Information

Model Name: stopwords_iso
Compatibility: Spark NLP 3.4.1+
License: Open Source
Edition: Official
Input Labels: [token]
Output Labels: [cleanTokens]
Language: ti
Size: 2.1 KB