Description
This model extracts email addresses from clinical notes using the rule-based RegexMatcherInternal annotator.
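To illustrate what rule-based matching means here, the following is a minimal sketch in plain Python. The pattern `EMAIL_PATTERN` and the helper `find_emails` are hypothetical and written only for illustration; the actual pattern shipped inside `email_matcher` is not shown in this card and may differ. The sketch reports an inclusive end offset, which is an assumption inferred from the offsets listed in the Results section.

```python
import re

# Illustrative pattern only (an assumption): the real pattern used by
# email_matcher is not documented in this card.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(text):
    """Return (chunk, begin, end, label) tuples; end is an inclusive offset."""
    return [
        (m.group(0), m.start(), m.end() - 1, "EMAIL")
        for m in EMAIL_PATTERN.finditer(text)
    ]


# First line of the sample text used in this card.
sample = (
    "ID: 1231511863, The driver's license no:A334455B, the SSN:324598674 "
    "and info@domain.net, mail: tech@support.org, e-mail: hale@gmail.com ."
)

for row in find_emails(sample):
    print(row)
```

A plain regular expression like this has no notion of context, so it will also match email-like strings that are not real addresses; the pretrained model may apply stricter rules.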
Predicted Entities
EMAIL
How to use
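from sparknlp.base import DocumentAssembler
from sparknlp_jsl.annotator import RegexMatcherInternalModel
from pyspark.ml import Pipeline

# `spark` is assumed to be an active Spark session started with Healthcare NLP.

# Convert the raw "text" column into document annotations
documentAssembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Load the pretrained rule-based email matcher
email_regex_matcher = RegexMatcherInternalModel.pretrained("email_matcher","en","clinical/models") \
    .setInputCols(["document"])\
    .setOutputCol("EMAIL")

email_regex_matcher_pipeline = Pipeline(
    stages=[
        documentAssembler,
        email_regex_matcher
    ])

data = spark.createDataFrame([["""ID: 1231511863, The driver's license no:A334455B, the SSN:324598674 and info@domain.net, mail: tech@support.org, e-mail: hale@gmail.com .
E-mail: Mira.Gabriel.Terry@gmail.com."""]]).toDF("text")

# Fit the pipeline and annotate the sample text
email_regex_matcher_model = email_regex_matcher_pipeline.fit(data)
result = email_regex_matcher_model.transform(data)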
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val email_regex_matcher = RegexMatcherInternalModel.pretrained("email_matcher","en","clinical/models")
    .setInputCols(Array("document"))
    .setOutputCol("EMAIL")

val email_regex_pipeline = new Pipeline().setStages(Array(
    documentAssembler,
    email_regex_matcher
))

val data = Seq("""ID: 1231511863, The driver's license no:A334455B, the SSN:324598674 and info@domain.net, mail: tech@support.org, e-mail: hale@gmail.com .
E-mail: Mira.Gabriel.Terry@gmail.com.""").toDF("text")

val result = email_regex_pipeline.fit(data).transform(data)
Results
+----------------------------+-----+---+-----+
|chunk                       |begin|end|label|
+----------------------------+-----+---+-----+
|info@domain.net             |72   |86 |EMAIL|
|tech@support.org            |95   |110|EMAIL|
|hale@gmail.com              |121  |134|EMAIL|
|Mira.Gabriel.Terry@gmail.com|147  |174|EMAIL|
+----------------------------+-----+---+-----+
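The begin and end columns appear to be zero-based character offsets into the input text, with the end offset inclusive. The short check below is a sketch under that assumption: it slices the first line of the sample text with rows copied from the table. The helper `span` is hypothetical. The fourth row is left out because its offsets depend on how the line break between the two input lines is encoded.

```python
# First line of the sample text used in this card.
text = (
    "ID: 1231511863, The driver's license no:A334455B, the SSN:324598674 "
    "and info@domain.net, mail: tech@support.org, e-mail: hale@gmail.com ."
)

# (chunk, begin, end) rows copied from the results table.
rows = [
    ("info@domain.net", 72, 86),
    ("tech@support.org", 95, 110),
    ("hale@gmail.com", 121, 134),
]


def span(source, begin, end):
    """Slice source using an inclusive end offset (assumed convention)."""
    return source[begin:end + 1]


for chunk, begin, end in rows:
    # Prints True when the offsets recover exactly the reported chunk.
    print(chunk, span(text, begin, end) == chunk)
```

Python slices exclude the end index, hence the `end + 1` when recovering a chunk from the reported offsets.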
Model Information
| Model Name:    | email_matcher         |
| Compatibility: | Healthcare NLP 5.4.1+ |
| License:       | Licensed              |
| Edition:       | Official              |
| Input Labels:  | [document]            |
| Output Labels: | [EMAIL]               |
| Language:      | en                    |
| Size:          | 2.3 KB                |