Description
Spell Checker is a sequence-to-sequence model that detects and corrects spelling errors in your input text. It uses a Levenshtein automaton to generate candidate corrections and a neural language model to rank them. This build is intended for PySpark 2.4.x users running Spark NLP 3.4.2 or later.
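The two stages described above (candidate generation by edit distance, then ranking by context) can be illustrated with a small, self-contained sketch. This is not the Spark NLP implementation: a plain dynamic-programming edit distance stands in for the Levenshtein automaton, and a toy bigram count table stands in for the neural language model. `VOCAB`, `BIGRAM_COUNTS`, `candidates`, and `correct` are hypothetical names invented for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# Hypothetical toy vocabulary and context statistics (assumptions for illustration).
VOCAB = ["weather", "leather", "feather", "jacket", "summer"]
BIGRAM_COUNTS = {
    ("best", "weather"): 5,
    ("black", "leather"): 7,
    ("black", "feather"): 1,
}


def candidates(word: str, max_dist: int = 1) -> list:
    """Stage 1: vocabulary words within max_dist edits of the input word."""
    return [v for v in VOCAB if edit_distance(word, v) <= max_dist]


def correct(prev_word: str, word: str) -> str:
    """Stage 2: pick the candidate that best fits the preceding word."""
    if word in VOCAB:
        return word
    cands = candidates(word)
    if not cands:
        return word
    return max(cands, key=lambda c: BIGRAM_COUNTS.get((prev_word, c), 0))


print(correct("best", "ueather"))
print(correct("black", "ueather"))
```

The same misspelling resolves to different words depending on the preceding token, which is the behavior that context-based ranking is meant to provide.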
Predicted Entities
How to use
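import sparknlp
from sparknlp.base import DocumentAssembler, LightPipeline
from sparknlp.annotator import RecursiveTokenizer, ContextSpellCheckerModel
from pyspark.ml import Pipeline

spark = sparknlp.start()

# Wrap the raw text column into Spark NLP documents
documentAssembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Split documents into tokens, detaching the listed prefixes and suffixes
tokenizer = RecursiveTokenizer()\
    .setInputCols(["document"])\
    .setOutputCol("token")\
    .setPrefixes(["\"", "“", "(", "[", "\n", "."])\
    .setSuffixes(["\"", "”", ".", ",", "?", ")", "]", "!", ";", ":", "'s", "’s"])

# Load the pretrained spell checker
spellModel = ContextSpellCheckerModel\
    .pretrained("spellcheck_dl", "en")\
    .setInputCols("token")\
    .setOutputCol("checked")

pipeline = Pipeline(stages = [documentAssembler, tokenizer, spellModel])

# Every stage is pretrained or rule-based, so fitting on an empty DataFrame is enough
empty_df = spark.createDataFrame([[""]]).toDF("text")
lp = LightPipeline(pipeline.fit(empty_df))

text = ["During the summer we have the best ueather.", "I have a black ueather jacket, so nice."]
lp.annotate(text)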
import com.johnsnowlabs.nlp.{DocumentAssembler, LightPipeline}
import com.johnsnowlabs.nlp.annotators.RecursiveTokenizer
import com.johnsnowlabs.nlp.annotators.spell.context.ContextSpellCheckerModel
import org.apache.spark.ml.Pipeline
import spark.implicits._

val assembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new RecursiveTokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")
  .setPrefixes(Array("\"", "“", "(", "[", "\n", "."))
  .setSuffixes(Array("\"", "”", ".", ",", "?", ")", "]", "!", ";", ":", "'s", "’s"))

val spellChecker = ContextSpellCheckerModel
  .pretrained("spellcheck_dl", "en")
  .setInputCols("token")
  .setOutputCol("checked")

val pipeline = new Pipeline().setStages(Array(assembler, tokenizer, spellChecker))

// Every stage is pretrained or rule-based, so fitting on an empty DataFrame is enough
val empty_df = Seq("").toDF("text")
val lp = new LightPipeline(pipeline.fit(empty_df))

val text = Array("During the summer we have the best ueather.", "I have a black ueather jacket, so nice.")
lp.annotate(text)
import nlu
nlu.load("spell").predict("""During the summer we have the best ueather.""")
Results
[{'checked': ['During', 'the', 'summer', 'we', 'have', 'the', 'best', 'weather', '.'],
'document': ['During the summer we have the best ueather.'],
'token': ['During', 'the', 'summer', 'we', 'have', 'the', 'best', 'ueather', '.']},
{'checked': ['I', 'have', 'a', 'black', 'leather', 'jacket', ',', 'so', 'nice', '.'],
'document': ['I have a black ueather jacket, so nice.'],
'token': ['I', 'have', 'a', 'black', 'ueather', 'jacket', ',', 'so', 'nice', '.']}]
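To see which tokens the model changed, the `token` and `checked` lists can be compared position by position. The sketch below is plain Python over the output shown above; `changed_tokens` is a hypothetical helper name, and it assumes the two lists are aligned one-to-one, as they are in this output.

```python
# Output copied from the Results above.
results = [
    {'checked': ['During', 'the', 'summer', 'we', 'have', 'the', 'best', 'weather', '.'],
     'document': ['During the summer we have the best ueather.'],
     'token': ['During', 'the', 'summer', 'we', 'have', 'the', 'best', 'ueather', '.']},
    {'checked': ['I', 'have', 'a', 'black', 'leather', 'jacket', ',', 'so', 'nice', '.'],
     'document': ['I have a black ueather jacket, so nice.'],
     'token': ['I', 'have', 'a', 'black', 'ueather', 'jacket', ',', 'so', 'nice', '.']},
]


def changed_tokens(result: dict) -> list:
    """Return (original, corrected) pairs where the spell checker altered a token."""
    return [(orig, fixed)
            for orig, fixed in zip(result['token'], result['checked'])
            if orig != fixed]


for r in results:
    print(r['document'][0], changed_tokens(r))
```

Here the misspelling `ueather` becomes `weather` in the first sentence and `leather` in the second, which reflects the context-dependent ranking.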
Model Information
| Model Name: | spellcheck_dl |
| Compatibility: | Spark NLP 3.4.2+ |
| License: | Open Source |
| Edition: | Official |
| Input Labels: | [token] |
| Output Labels: | [corrected] |
| Language: | en |
| Size: | 99.4 MB |
References
Combination of custom public data sets.