Email Regex Matcher

Description

This model extracts emails in clinical notes using rule-based RegexMatcherInternal annotator.

How to use

documentAssembler = DocumentAssembler()\
      .setInputCol("text")\
      .setOutputCol("document")

email_regex_matcher = RegexMatcherInternalModel.pretrained("email_regex_matcher","en","clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("EMAIL")\

email_regex_matcher_pipeline = Pipeline(
    stages=[
        documentAssembler,
        email_regex_matcher
        ])

data = spark.createDataFrame([["""ID: 1231511863, The driver's license no:A334455B, the SSN:324598674 and info@domain.net, mail: tech@support.org, e-mail: hale@gmail.com .
 E-mail: sales@gmail.com."""]]).toDF("text")


email_regex_matcher_model = email_regex_matcher_pipeline.fit(data)
result = email_regex_matcher_model.transform(data)

documentAssembler = nlp.DocumentAssembler()\
      .setInputCol("text")\
      .setOutputCol("document")

email_regex_matcher = medical.RegexMatcherInternalModel.pretrained("email_regex_matcher","en","clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("EMAIL")\

email_regex_matcher_pipeline = nlp.Pipeline(
    stages=[
        documentAssembler,
        email_regex_matcher
        ])

data = spark.createDataFrame([["""ID: 1231511863, The driver's license no:A334455B, the SSN:324598674 and info@domain.net, mail: tech@support.org, e-mail: hale@gmail.com .
 E-mail: sales@gmail.com."""]]).toDF("text")


email_regex_matcher_model = email_regex_matcher_pipeline.fit(data)
result = email_regex_matcher_model.transform(data)

val documentAssembler = new DocumentAssembler()
	.setInputCol("text")
	.setOutputCol("document")

val email_regex_matcher = RegexMatcherInternalModel.pretrained("email_regex_matcher","en","clinical/models")
	.setInputCols(Array("document"))
	.setOutputCol("EMAIL")

val email_regex_pipeline = new Pipeline().setStages(Array(
		documentAssembler,
		email_regex_matcher
  ))

val data = Seq("""D: 1231511863, The driver's license no:A334455B, the SSN:324598674 and info@domain.net, mail: tech@support.org, e-mail: hale@gmail.com .
 E-mail: sales@gmail.com.""").toDF("text")

val result = email_regex_pipeline.fit(data).transform(data)

Results

| chunk            |   begin |   end | label   |
|:-----------------|--------:|------:|:--------|
| info@domain.net  |      72 |    86 | EMAIL   |
| tech@support.org |      95 |   110 | EMAIL   |
| hale@gmail.com   |     121 |   134 | EMAIL   |
| sales@gmail.com  |     147 |   161 | EMAIL   |

Model Information

Model Name:	email_regex_matcher
Compatibility:	Healthcare NLP 6.2.2+
License:	Licensed
Edition:	Official
Input Labels:	[document]
Output Labels:	[entity_email]
Language:	en
Size:	2.3 KB

PREVIOUSCPT Contextual Parser Model

NEXTURL Regex Matcher