Medical Spell Checker

Description

Contextual Spell Checker is a sequence-to-sequence model that detects and corrects spelling errors in medical input text. It uses a Levenshtein automaton to generate candidate corrections and a neural language model to rank them. The model was trained on a dataset drawn from several sources: MTSamples, an augmented version of the i2b2 clinical notes, and several specific medical corpora. The downloadable model is fully pretrained and ready to use, but you can still customize it without retraining a new model from scratch by providing custom definitions for the word classes it was trained on, namely Dates, Numbers, Ages, Units, and Medications.
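The two-stage idea described above (generate candidates within a small edit distance, then rank them by how well they fit the context) can be illustrated with a minimal pure-Python sketch. This is not the model's implementation: it substitutes a brute-force vocabulary scan for the Levenshtein automaton and a hand-written bigram table for the neural language model. The names `edit_distance`, `candidates`, `rank`, `VOCAB`, and `BIGRAMS` are hypothetical and exist only for this illustration.

```python
from typing import Dict, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def candidates(word: str, vocab: List[str], max_dist: int = 1) -> List[str]:
    """Brute-force stand-in for the automaton: keep vocabulary words within max_dist edits."""
    return [v for v in vocab if edit_distance(word.lower(), v) <= max_dist]


def rank(prev_word: str, cands: List[str],
         scores: Dict[Tuple[str, str], float]) -> List[str]:
    """Toy stand-in for the language model: order candidates by a bigram score."""
    return sorted(cands, key=lambda c: scores.get((prev_word, c), 0.0), reverse=True)


# Hypothetical toy vocabulary and context scores (not taken from the model).
VOCAB = ["acute", "cute", "cut", "distress"]
BIGRAMS = {("no", "acute"): 0.9, ("no", "cute"): 0.05}

cands = candidates("cute", VOCAB, max_dist=1)
best = rank("no", cands, BIGRAMS)[0]
print(cands, best)
```

The sketch shows why context matters: "cute" is itself a valid word, so edit distance alone would leave it unchanged, and only the context score after "no" promotes "acute".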


How to use

from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler, LightPipeline
from sparknlp.annotator import Tokenizer, ContextSpellCheckerModel

# Assumes `spark` is an active SparkSession with Healthcare NLP loaded.

documentAssembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Split punctuation and possessive suffixes off words so each token is checked on its own.
tokenizer = Tokenizer()\
    .setInputCols(["document"])\
    .setOutputCol("token")\
    .setContextChars(["*", "-", "“", "(", "[", "\n", ".","\"", "”", ",", "?", ")", "]", "!", ";", ":", "'s", "’s"])

spellModel = ContextSpellCheckerModel\
    .pretrained('spellcheck_clinical', 'en', 'clinical/models')\
    .setInputCols("token")\
    .setOutputCol("checked")

pipeline = Pipeline(stages = [
    documentAssembler,
    tokenizer,
    spellModel])

# All stages are pretrained, so fitting on an empty DataFrame only assembles the pipeline.
light_pipeline = LightPipeline(pipeline.fit(spark.createDataFrame([[""]]).toDF("text")))

example = ["Witth the hell of phisical terapy the patient was imbulated and on postoperative, the impatient tolerating a post curgical soft diet.",
    "With paint wel controlled on orall pain medications, she was discharged too reihabilitation facilitay.",
    "Abdomen is sort, nontender, and nonintended.",
    "Patient not showing pain or any wealth problems.",
    "No cute distress"]

result = light_pipeline.annotate(example)

import com.johnsnowlabs.nlp.{DocumentAssembler, LightPipeline}
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.annotators.spell.context.ContextSpellCheckerModel
import org.apache.spark.ml.Pipeline
// Assumes `spark` is an active SparkSession; needed for `.toDS` below.
import spark.implicits._

val assembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")
  .setContextChars(Array("*", "-", "“", "(", "[", "\n", ".","\"", "”", ",", "?", ")", "]", "!", ";", ":", "'s", "’s"))

val spellChecker = ContextSpellCheckerModel.pretrained("spellcheck_clinical", "en", "clinical/models")
  .setInputCols("token")
  .setOutputCol("checked")

val pipeline = new Pipeline().setStages(Array(
  assembler,
  tokenizer,
  spellChecker))

// All stages are pretrained, so fitting on an empty dataset only assembles the pipeline.
val light_pipeline = new LightPipeline(pipeline.fit(Seq("").toDS.toDF("text")))

val text = Array("Witth the hell of phisical terapy the patient was imbulated and on postoperative, the impatient tolerating a post curgical soft diet.",
  "With paint wel controlled on orall pain medications, she was discharged too reihabilitation facilitay.",
  "Abdomen is sort, nontender, and nonintended.",
  "Patient not showing pain or any wealth problems.",
  "No cute distress")

val result = light_pipeline.annotate(text)

import nlu

nlu.load("en.spell.clinical").predict("With paint wel controlled on orall pain medications, she was discharged too reihabilitation facilitay.")

Results

[{'checked': ['With','the','cell','of','physical','therapy','the','patient','was','ambulated','and','on','postoperative',',','the','patient','tolerating','a','post','surgical','soft','diet','.'],
'document': ['Witth the hell of phisical terapy the patient was imbulated and on postoperative, the impatient tolerating a post curgical soft diet.'],
'token': ['Witth','the','hell','of','phisical','terapy','the','patient','was','imbulated','and','on','postoperative',',','the','impatient','tolerating','a','post','curgical','soft','diet','.']},

{'checked': ['With','pain','well','controlled','on','oral','pain','medications',',','she','was','discharged','to','rehabilitation','facility','.'],
'document': ['With paint wel controlled on orall pain medications, she was discharged too reihabilitation facilitay.'],
'token': ['With','paint','wel','controlled','on','orall','pain','medications',',','she','was','discharged','too','reihabilitation','facilitay','.']},

{'checked': ['Abdomen','is','soft',',','nontender',',','and','nondistended','.'],
'document': ['Abdomen is sort, nontender, and nonintended.'],
'token': ['Abdomen','is','sort',',','nontender',',','and','nonintended','.']},

{'checked': ['Patient','not','showing','pain','or','any','health','problems','.'],
'document': ['Patient not showing pain or any wealth problems.'],
'token': ['Patient','not','showing','pain','or','any','wealth','problems','.']},

{'checked': ['No', 'acute', 'distress'],
'document': ['No cute distress'],
'token': ['No', 'cute', 'distress']}]
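
To see only what the model changed, the `token` and `checked` lists of each result can be compared position by position. The following is a minimal sketch that assumes the two lists are aligned one-to-one, as they are in the output shown above; `list_corrections` is a hypothetical helper name, not part of the library.

```python
from typing import Dict, List, Tuple


def list_corrections(annotation: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (original, corrected) pairs for tokens the spell checker changed.

    Assumes annotation["token"] and annotation["checked"] have the same length
    and are aligned position by position.
    """
    return [(orig, fixed)
            for orig, fixed in zip(annotation["token"], annotation["checked"])
            if orig != fixed]


# One entry copied from the results shown above.
sample = {
    "token": ["No", "cute", "distress"],
    "checked": ["No", "acute", "distress"],
}

corrections = list_corrections(sample)
print(corrections)
```

Applied to the full output, `[list_corrections(r) for r in result]` would give one list of changes per input sentence.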

Model Information

Model Name:	spellcheck_clinical
Compatibility:	Healthcare NLP 3.4.1+
License:	Licensed
Edition:	Official
Input Labels:	[token]
Output Labels:	[corrected]
Language:	en
Size:	141.2 MB

References

MTSamples, an augmented version of the i2b2 clinical notes, and several specific medical corpora.
