Description
This model extracts email addresses from clinical notes using the rule-based RegexMatcherInternal annotator.
How to use
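# Import paths assume the standard Spark NLP and Healthcare NLP Python packages
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp_jsl.annotator import RegexMatcherInternalModel

# Wrap the raw text column into document annotations
documentAssembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Load the pretrained rule-based email matcher
email_regex_matcher = RegexMatcherInternalModel.pretrained("email_regex_matcher","en","clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("EMAIL")

email_regex_matcher_pipeline = Pipeline(
    stages=[
        documentAssembler,
        email_regex_matcher
    ])
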
data = spark.createDataFrame([["""ID: 1231511863, The driver's license no:A334455B, the SSN:324598674 and info@domain.net, mail: tech@support.org, e-mail: hale@gmail.com .
E-mail: sales@gmail.com."""]]).toDF("text")
email_regex_matcher_model = email_regex_matcher_pipeline.fit(data)
result = email_regex_matcher_model.transform(data)

# Same pipeline written against the johnsnowlabs package (nlp / medical namespaces)
from johnsnowlabs import nlp, medical

documentAssembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

email_regex_matcher = medical.RegexMatcherInternalModel.pretrained("email_regex_matcher","en","clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("EMAIL")

email_regex_matcher_pipeline = nlp.Pipeline(
    stages=[
        documentAssembler,
        email_regex_matcher
    ])

data = spark.createDataFrame([["""ID: 1231511863, The driver's license no:A334455B, the SSN:324598674 and info@domain.net, mail: tech@support.org, e-mail: hale@gmail.com .
E-mail: sales@gmail.com."""]]).toDF("text")
email_regex_matcher_model = email_regex_matcher_pipeline.fit(data)
result = email_regex_matcher_model.transform(data)

// Scala version of the same pipeline
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val email_regex_matcher = RegexMatcherInternalModel.pretrained("email_regex_matcher","en","clinical/models")
    .setInputCols(Array("document"))
    .setOutputCol("EMAIL")

val email_regex_pipeline = new Pipeline().setStages(Array(
    documentAssembler,
    email_regex_matcher
))

val data = Seq("""ID: 1231511863, The driver's license no:A334455B, the SSN:324598674 and info@domain.net, mail: tech@support.org, e-mail: hale@gmail.com .
E-mail: sales@gmail.com.""").toDF("text")
val result = email_regex_pipeline.fit(data).transform(data)
Results
| chunk | begin | end | label |
|:-----------------|--------:|------:|:--------|
| info@domain.net | 72 | 86 | EMAIL |
| tech@support.org | 95 | 110 | EMAIL |
| hale@gmail.com | 121 | 134 | EMAIL |
| sales@gmail.com | 147 | 161 | EMAIL |
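
The begin and end columns are character offsets into the input text, and end is inclusive. As a rough illustration of how such offsets arise, the sketch below runs a plain Python regular expression over the first line of the sample text. The pattern and the `find_emails` helper are assumptions made for this example; the pattern bundled with the pretrained model is not shown in this card and may differ.

```python
import re

# Illustrative pattern only (an assumption); not the pattern shipped with the model.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(text):
    """Return (chunk, begin, end, label) tuples; `end` is inclusive."""
    return [
        (m.group(), m.start(), m.end() - 1, "EMAIL")
        for m in EMAIL_PATTERN.finditer(text)
    ]


# First line of the sample text used in the pipeline examples.
sample = (
    "ID: 1231511863, The driver's license no:A334455B, the SSN:324598674 "
    "and info@domain.net, mail: tech@support.org, e-mail: hale@gmail.com ."
)

for row in find_emails(sample):
    print(row)
```

For this line the sketch yields the same chunks and offsets as the first three rows of the table. It only mimics the matching step and does not replace the Spark pipeline.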
Model Information
| Model Name: | email_regex_matcher |
| Compatibility: | Healthcare NLP 6.2.2+ |
| License: | Licensed |
| Edition: | Official |
| Input Labels: | [document] |
| Output Labels: | [entity_email] |
| Language: | en |
| Size: | 2.3 KB |