State Text Matcher

Description

This model extracts US state names from clinical notes using the rule-based TextMatcherInternal annotator.

Predicted Entities

STATE


How to use

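# Imports for the pipeline below; assumes an active Healthcare NLP session named `spark`.
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer
from sparknlp_jsl.annotator import TextMatcherInternalModel

documentAssembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")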

sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

text_matcher = TextMatcherInternalModel.pretrained("state_matcher","en","clinical/models") \
    .setInputCols(["sentence", "token"])\
    .setOutputCol("state_name")\
    .setMergeOverlapping(True)

matcher_pipeline = Pipeline().setStages([
    documentAssembler,
    sentenceDetector,
    tokenizer,
    text_matcher])

data = spark.createDataFrame([["""Dr. Sarah Mitchell, a renowned oncologist from California, treated a patient diagnosed with lung cancer. The patient, originally from Texas, had traveled across several states seeking specialized care. After consulting with various doctors in Nevada and Arizona, the decision was made to transfer the patient to California for advanced treatment options. During a conference in New York, Dr. Mitchell presented the case to her colleagues from Florida and Illinois, discussing the innovative techniques used in the surgery. The patient’s recovery has been closely monitored, with follow-up appointments scheduled in both  California and Texas to ensure continued care and support."""]]).toDF("text")

# Fit the pipeline and annotate the sample text.
result = matcher_pipeline.fit(data).transform(data)

// Scala version of the same pipeline. Spark NLP and Healthcare NLP annotator imports are assumed to be in scope.
import org.apache.spark.ml.Pipeline
import spark.implicits._ // needed for Seq(...).toDF

val documentAssembler = new DocumentAssembler()
	.setInputCol("text")
	.setOutputCol("document")

val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")
	.setInputCols(Array("document"))
	.setOutputCol("sentence")

val tokenizer = new Tokenizer()
	.setInputCols(Array("sentence"))
	.setOutputCol("token")

val text_matcher = TextMatcherInternalModel.pretrained("state_matcher","en","clinical/models")
	.setInputCols(Array("sentence","token"))
	.setOutputCol("state_name")
	.setMergeOverlapping(true)

val matcher_pipeline = new Pipeline().setStages(Array(
		documentAssembler,
		sentenceDetector,
		tokenizer,
		text_matcher))

val data = Seq("""Dr. Sarah Mitchell, a renowned oncologist from California, treated a patient diagnosed with lung cancer. The patient, originally from Texas, had traveled across several states seeking specialized care. After consulting with various doctors in Nevada and Arizona, the decision was made to transfer the patient to California for advanced treatment options. During a conference in New York, Dr. Mitchell presented the case to her colleagues from Florida and Illinois, discussing the innovative techniques used in the surgery. The patient’s recovery has been closely monitored, with follow-up appointments scheduled in both  California and Texas to ensure continued care and support.""").toDF("text")

// Fit the pipeline and annotate the sample text.
val result = matcher_pipeline.fit(data).transform(data)

Results

+----------+-----+---+-----+
|chunk     |begin|end|label|
+----------+-----+---+-----+
|California|47   |56 |STATE|
|Texas     |134  |138|STATE|
|Nevada    |243  |248|STATE|
|Arizona   |254  |260|STATE|
|California|312  |321|STATE|
|New York  |378  |385|STATE|
|Florida   |443  |449|STATE|
|Illinois  |455  |462|STATE|
|California|621  |630|STATE|
|Texas     |636  |640|STATE|
+----------+-----+---+-----+
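
The begin and end columns appear to be 0-based character offsets into the input text, with an inclusive end. The following is a minimal sketch in plain Python that checks this reading against the first two rows of the table. It uses only the first two sentences of the sample text, and the helper name chunk_at is hypothetical, not part of the library.

```python
# First two sentences of the sample text used in the examples above.
text = ("Dr. Sarah Mitchell, a renowned oncologist from California, treated a patient "
        "diagnosed with lung cancer. The patient, originally from Texas, had traveled "
        "across several states seeking specialized care.")


def chunk_at(text, begin, end):
    """Slice a chunk given 0-based offsets with an inclusive end (hypothetical helper)."""
    return text[begin:end + 1]


# (chunk, begin, end) copied from the first two rows of the results table.
rows = [("California", 47, 56), ("Texas", 134, 138)]

for chunk, begin, end in rows:
    # The slice recovered from the offsets should equal the reported chunk.
    assert chunk_at(text, begin, end) == chunk
    print(chunk, begin, end)
```

Because the end offset is inclusive, a Python slice needs end + 1 as its upper bound.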

Model Information

Model Name: state_matcher
Compatibility: Healthcare NLP 5.4.1+
License: Licensed
Edition: Official
Input Labels: [document, token]
Output Labels: [entity_state]
Language: en
Size: 6.3 KB
Case sensitive: true
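
Since the model is listed as case sensitive, a lowercase mention such as "texas" would presumably not be matched. The toy sketch below illustrates that behavior with a plain Python dictionary matcher. It is not the annotator's implementation: the term list is a hypothetical subset, and match_states is an illustrative name.

```python
import re

# Hypothetical subset of state names; the model's actual dictionary is not shown in this card.
STATES = ["California", "Texas", "New York"]


def match_states(text, terms=STATES, case_sensitive=True):
    """Return (chunk, begin, end) tuples with 0-based, inclusive end offsets."""
    flags = 0 if case_sensitive else re.IGNORECASE
    # Try longer terms first so multi-word names are preferred over shorter alternatives.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b"
    return [(m.group(0), m.start(), m.end() - 1)
            for m in re.finditer(pattern, text, flags)]


sample = "Seen in texas, then moved to Texas."
print(match_states(sample))                        # case-sensitive matching
print(match_states(sample, case_sensitive=False))  # case-insensitive matching
```

With case-sensitive matching only the capitalized mention is returned, so notes with inconsistent capitalization may need normalizing before matching.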