Email Regex Matcher

Description

This model extracts email addresses from clinical notes using the rule-based RegexMatcherInternal annotator.

How to use


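from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp_jsl.annotator import RegexMatcherInternalModel

# Assumes an active Spark session named `spark` with Healthcare NLP loaded.
documentAssembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

email_regex_matcher = RegexMatcherInternalModel.pretrained("email_regex_matcher","en","clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("EMAIL")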

email_regex_matcher_pipeline = Pipeline(
    stages=[
        documentAssembler,
        email_regex_matcher
        ])

data = spark.createDataFrame([["""ID: 1231511863, The driver's license no:A334455B, the SSN:324598674 and info@domain.net, mail: tech@support.org, e-mail: hale@gmail.com .
 E-mail: sales@gmail.com."""]]).toDF("text")


email_regex_matcher_model = email_regex_matcher_pipeline.fit(data)
result = email_regex_matcher_model.transform(data)



from johnsnowlabs import nlp, medical

# Assumes an active Spark session named `spark` with Healthcare NLP loaded.
documentAssembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

email_regex_matcher = medical.RegexMatcherInternalModel.pretrained("email_regex_matcher","en","clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("EMAIL")

email_regex_matcher_pipeline = nlp.Pipeline(
    stages=[
        documentAssembler,
        email_regex_matcher
        ])

data = spark.createDataFrame([["""ID: 1231511863, The driver's license no:A334455B, the SSN:324598674 and info@domain.net, mail: tech@support.org, e-mail: hale@gmail.com .
 E-mail: sales@gmail.com."""]]).toDF("text")


email_regex_matcher_model = email_regex_matcher_pipeline.fit(data)
result = email_regex_matcher_model.transform(data)




val documentAssembler = new DocumentAssembler()
	.setInputCol("text")
	.setOutputCol("document")

val email_regex_matcher = RegexMatcherInternalModel.pretrained("email_regex_matcher","en","clinical/models")
	.setInputCols(Array("document"))
	.setOutputCol("EMAIL")

val email_regex_pipeline = new Pipeline().setStages(Array(
	documentAssembler,
	email_regex_matcher
))

val data = Seq("""ID: 1231511863, The driver's license no:A334455B, the SSN:324598674 and info@domain.net, mail: tech@support.org, e-mail: hale@gmail.com .
 E-mail: sales@gmail.com.""").toDF("text")

val result = email_regex_pipeline.fit(data).transform(data)

Results


| chunk            |   begin |   end | label   |
|:-----------------|--------:|------:|:--------|
| info@domain.net  |      72 |    86 | EMAIL   |
| tech@support.org |      95 |   110 | EMAIL   |
| hale@gmail.com   |     121 |   134 | EMAIL   |
| sales@gmail.com  |     147 |   161 | EMAIL   |

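The begin and end columns above are character offsets into the input text, with an inclusive end. As a rough illustration of what a rule-based email matcher does, the sketch below uses only Python's standard `re` module. The pattern and the `find_emails` helper are assumptions made for this example; they are not the rules shipped inside `email_regex_matcher`, which this page does not list.

```python
import re

# Simplified, illustrative pattern (an assumption): the pretrained model's
# actual rules are not listed on this page.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(text):
    """Return (chunk, begin, end, label) tuples; `end` is an inclusive offset."""
    return [
        (m.group(), m.start(), m.end() - 1, "EMAIL")
        for m in EMAIL_PATTERN.finditer(text)
    ]


# Same sample text as in the pipeline examples above.
text = (
    "ID: 1231511863, The driver's license no:A334455B, the SSN:324598674 "
    "and info@domain.net, mail: tech@support.org, e-mail: hale@gmail.com .\n"
    " E-mail: sales@gmail.com."
)

for row in find_emails(text):
    print(row)
```

On this sample text the sketch yields the same four chunks and offsets as the table. It is only a way to sanity-check the offsets by hand, not a substitute for the pretrained model.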

Model Information

Model Name: email_regex_matcher
Compatibility: Healthcare NLP 6.2.2+
License: Licensed
Edition: Official
Input Labels: [document]
Output Labels: [entity_email]
Language: en
Size: 2.3 KB