Description
Spell Checker is a sequence-to-sequence model that detects and corrects spelling errors in your input text. It is based on a Levenshtein automaton, which generates candidate corrections, and a neural language model, which ranks those candidates in context.
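The two stages described above (generate candidates by edit distance, then rank them in context) can be illustrated with a small, self-contained sketch. This is not the model's implementation: it uses a brute-force dynamic-programming Levenshtein distance instead of an automaton, and a toy bigram table with made-up counts stands in for the neural language model. The names `edit_distance`, `candidates`, `rank`, `VOCAB`, and `BIGRAM_COUNTS` are hypothetical.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character from a
                            curr[j - 1] + 1,      # insert a character into a
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def candidates(word, vocabulary, max_distance=2):
    """Keep vocabulary words within max_distance edits of the input word."""
    return [v for v in vocabulary if edit_distance(word, v) <= max_distance]


# Toy context scores standing in for a language model (made-up counts).
BIGRAM_COUNTS = Counter({("chest", "pain"): 12, ("chest", "pan"): 1})


def rank(previous_word, cands):
    """Order candidates by how often they follow previous_word in the toy table."""
    return sorted(cands, key=lambda c: BIGRAM_COUNTS[(previous_word, c)], reverse=True)


VOCAB = ["pain", "pan", "pin", "chest", "patient"]  # hypothetical vocabulary
cands = candidates("pian", VOCAB)
best = rank("chest", cands)[0]
print(cands, best)
```

Here "pan" and "pin" are closer to "pian" by edit distance than "pain" is, so distance alone would pick the wrong word; the context score is what selects "pain" after "chest".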
How to use
The model works at the token level, so it must come after tokenization in the pipeline. Because corrections can change the length of a token, take care when placing it before annotators that rely on absolute character offsets into the original document, such as NerConverter.
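To see why a change in token length matters for offset-based annotators, the following sketch recomputes character offsets before and after a correction. It is plain Python with made-up tokens, assumes tokens are joined by single spaces, and uses inclusive end offsets; `token_spans` is a hypothetical helper, not part of Spark NLP.

```python
def token_spans(tokens):
    """Return inclusive (begin, end) character offsets for space-joined tokens."""
    spans, position = [], 0
    for tok in tokens:
        spans.append((position, position + len(tok) - 1))
        position += len(tok) + 1  # skip the single separating space
    return spans


original = ["feever", "since", "yesterday"]   # made-up input with a typo
corrected = ["fever", "since", "yesterday"]   # made-up corrected tokens

orig_spans = token_spans(original)
corr_spans = token_spans(corrected)

for tok, before, after in zip(corrected, orig_spans, corr_spans):
    print(tok, before, after)
```

Shortening the first token by one character shifts every later span by one, so offsets computed on corrected text no longer point at the same characters in the original document.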
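from sparknlp.base import DocumentAssembler
from sparknlp.annotator import RecursiveTokenizer, ContextSpellCheckerModel

# Convert the raw "text" column into document annotations.
documentAssembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Split documents into tokens, detaching the listed prefixes and suffixes.
tokenizer = RecursiveTokenizer()\
    .setInputCols(["document"])\
    .setOutputCol("token")\
    .setPrefixes(["\"", "“", "(", "[", "\n", "."])\
    .setSuffixes(["\"", "”", ".", ",", "?", ")", "]", "!", ";", ":", "'s", "’s"])
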
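# Load the pretrained clinical spell checker; it reads tokens and writes corrected tokens.
spellModel = ContextSpellCheckerModel\
    .pretrained("spellcheck_clinical", "en", "clinical/models")\
    .setInputCols("token")\
    .setOutputCol("checked")
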
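import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.RecursiveTokenizer
import com.johnsnowlabs.nlp.annotators.spell.context.ContextSpellCheckerModel

// Convert the raw "text" column into document annotations.
val assembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

// Split documents into tokens, detaching the listed prefixes and suffixes.
val tokenizer = new RecursiveTokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")
  .setPrefixes(Array("\"", "“", "(", "[", "\n", "."))
  .setSuffixes(Array("\"", "”", ".", ",", "?", ")", "]", "!", ";", ":", "'s", "’s"))
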
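// Load the pretrained clinical spell checker; it reads tokens and writes corrected tokens.
val spellChecker = ContextSpellCheckerModel
  .pretrained("spellcheck_clinical", "en", "clinical/models")
  .setInputCols("token")
  .setOutputCol("checked")
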
import nlu

# The sample text is a placeholder; replace it with your own input.
nlu.load("en.spell.clinical").predict("Put your text here.")
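Once a pipeline has been run, it is often useful to list only the tokens the spell checker changed. The sketch below assumes the result is available as a dictionary mapping output column names to lists of strings (the shape returned by Spark NLP's `LightPipeline.annotate`) and that the "token" and "checked" lists line up one to one. The `sample` dictionary is hand-written for illustration and is not real model output; `changed_tokens` is a hypothetical helper.

```python
def changed_tokens(annotated, token_col="token", checked_col="checked"):
    """Return (original, corrected) pairs for tokens the spell checker modified."""
    pairs = zip(annotated[token_col], annotated[checked_col])
    return [(orig, fixed) for orig, fixed in pairs if orig != fixed]


# Hand-written stand-in for an annotated result (not real model output).
sample = {
    "token": ["feever", "since", "yesterday"],
    "checked": ["fever", "since", "yesterday"],
}

changes = changed_tokens(sample)
print(changes)
```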
Model Information
| Model Name: | spellcheck_clinical |
| Compatibility: | Spark NLP 3.0.0+ |
| License: | Licensed |
| Edition: | Official |
| Input Labels: | [token] |
| Language: | en |
Data Source
The training data is drawn from MT Samples clinical notes, an augmented version of the i2b2 clinical notes, and PubMed abstracts.