Description
Contextual Spell Checker is a sequence-to-sequence model that detects and corrects spelling errors in your input text. It uses a Levenshtein automaton to generate candidate corrections and a neural language model to rank them. The model was trained on a dataset drawn from several sources: MTSamples, i2b2 clinical notes, and PubMed. It is available fully pretrained and ready to use. You can still customize it without re-training a new model from scratch by providing custom definitions for the word classes the model was trained on, namely Dates, Numbers, Ages, Units, and Medications.
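The two stages described above (candidate generation by edit distance, then ranking by context) can be illustrated with a small sketch. This is not the model's implementation: it swaps the Levenshtein automaton for a plain dynamic-programming edit distance and the neural language model for a hand-written score table. The names `VOCAB`, `CONTEXT_SCORE`, `candidates`, and `correct` are hypothetical and exist only for this illustration.

```python
from typing import Dict, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character from a
                            curr[j - 1] + 1,      # insert a character into a
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


# Toy vocabulary and toy context scores (assumptions, not model data).
VOCAB: List[str] = ["acute", "cute", "cut", "distress", "no"]
CONTEXT_SCORE: Dict[Tuple[str, str], float] = {
    ("no", "acute"): 0.90,
    ("no", "cute"): 0.05,
    ("no", "cut"): 0.05,
}


def candidates(word: str, max_dist: int = 2) -> List[str]:
    """Stage 1: every vocabulary word within max_dist edits of the input."""
    return [v for v in VOCAB if edit_distance(word.lower(), v) <= max_dist]


def correct(prev_word: str, word: str) -> str:
    """Stage 2: pick the candidate with the best score given the previous word."""
    cands = candidates(word)
    if not cands:
        return word  # nothing close enough; leave the token unchanged
    return max(cands, key=lambda c: CONTEXT_SCORE.get((prev_word.lower(), c), 0.0))


print(correct("No", "cute"))
```

The point of the sketch is that "cute" is a valid word on its own, so only the context-based ranking step can prefer "acute" after "No".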
How to use
To use this model, set up a pipeline and feed it tokens.
# Assumes an active Spark session named `spark` with access to the licensed clinical models.
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler, Finisher, LightPipeline
from sparknlp.annotator import RecursiveTokenizer, ContextSpellCheckerModel

# Wrap the raw text column in a document annotation
documentAssembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Split documents into tokens, separating leading and trailing punctuation
tokenizer = RecursiveTokenizer()\
    .setInputCols(["document"])\
    .setOutputCol("token")\
    .setPrefixes(["\"", "(", "[", "\n"])\
    .setSuffixes([".", ",", "?", ")", "!", "'s"])

# Load the pretrained clinical spell checker
spellModel = ContextSpellCheckerModel\
    .pretrained('spellcheck_clinical', 'en', 'clinical/models')\
    .setInputCols("token")\
    .setOutputCol("checked")

# Convert annotations into plain strings
finisher = Finisher()\
    .setInputCols("checked")

pipeline = Pipeline(
    stages = [
        documentAssembler,
        tokenizer,
        spellModel,
        finisher
    ])

# Fit on an empty DataFrame; every stage is either pretrained or needs no training
empty_ds = spark.createDataFrame([[""]]).toDF("text")
lp = LightPipeline(pipeline.fit(empty_ds))

example = ["Witth the hell of phisical terapy the patient was imbulated and on posoperative, the impatient tolerating a post curgical soft diet.",
           "With paint wel controlled on orall pain medications, she was discharged too reihabilitation facilitay.",
           "She is to also call the ofice if she has any ever greater than 101, or leeding form the surgical wounds.",
           "Abdomen is sort, nontender, and nonintended.",
           "Patient not showing pain or any wealth problems.",
           "No cute distress"
           ]

lp.annotate(example)
import nlu
# Any input text works; this sentence is taken from the examples above
nlu.load("en.spell.clinical").predict("No cute distress")
Results
[{'checked': ['With',
'the',
'help',
'of',
'physical',
'therapy',
'the',
'patient',
'was',
'ambulated',
'and',
'on',
'postoperative',
',',
'the',
'patient',
'tolerating',
'a',
'post',
'surgical',
'soft',
'diet',
'.']},
{'checked': ['With',
'pain',
'well',
'controlled',
'on',
'oral',
'pain',
'medications',
',',
'she',
'was',
'discharged',
'to',
'rehabilitation',
'facility',
'.']},
{'checked': ['She',
'is',
'to',
'also',
'call',
'the',
'office',
'if',
'she',
'has',
'any',
'fever',
'greater',
'than',
'101',
',',
'or',
'bleeding',
'from',
'the',
'surgical',
'wounds',
'.']},
{'checked': ['Abdomen',
'is',
'soft',
',',
'nontender',
',',
'and',
'nondistended',
'.']},
{'checked': ['Patient',
'not',
'showing',
'pain',
'or',
'any',
'health',
'problems',
'.']},
{'checked': ['No', 'acute', 'distress']}]
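Each result is a dict whose `checked` key holds the corrected tokens. As a minimal sketch of how that output might be post-processed, the helpers below list which tokens changed and rejoin tokens into a sentence. The function names `list_corrections` and `detokenize` are hypothetical, not part of Spark NLP, and the positional comparison assumes the tokenizer and the spell checker produce the same number of tokens, as they do in the examples shown here.

```python
from typing import List, Tuple

# Tokens that should attach to the preceding token when rejoining text.
# This mirrors the suffix list passed to the tokenizer in the pipeline.
NO_SPACE_BEFORE = {".", ",", "?", ")", "!", "'s"}


def list_corrections(original: List[str], checked: List[str]) -> List[Tuple[str, str]]:
    """Return (original, corrected) pairs for every token that changed."""
    if len(original) != len(checked):
        raise ValueError("token counts differ; positional comparison is not possible")
    return [(o, c) for o, c in zip(original, checked) if o != c]


def detokenize(tokens: List[str]) -> str:
    """Rejoin tokens with spaces, attaching trailing punctuation directly."""
    text = ""
    for tok in tokens:
        if text and tok not in NO_SPACE_BEFORE:
            text += " "
        text += tok
    return text


original = ["Abdomen", "is", "sort", ",", "nontender", ",", "and", "nonintended", "."]
checked = ["Abdomen", "is", "soft", ",", "nontender", ",", "and", "nondistended", "."]

print(list_corrections(original, checked))
print(detokenize(checked))
```

Listing the changed tokens this way makes it easier to review corrections such as 'sort' to 'soft', where the misspelling is itself a valid word.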
Model Information
|---|---|
| Model Name: | spellcheck_clinical |
| Compatibility: | Spark NLP 2.7.2+ |
| License: | Licensed |
| Edition: | Official |
| Input Labels: | [token] |
| Language: | en |
Data Source
MTSamples, an augmented version of the i2b2 clinical notes, and PubMed.