Description
This model extracts email addresses from clinical notes using the rule-based RegexMatcherInternal annotator.
Predicted Entities
EMAIL
How to use
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer
from sparknlp_jsl.annotator import RegexMatcherInternalModel

# Assumes `spark` is an active Spark session started with Healthcare NLP.
documentAssembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

# Rule-based matcher; it reads sentences and writes EMAIL chunks.
regex_matcher = RegexMatcherInternalModel.pretrained("email_matcher","en","clinical/models")\
    .setInputCols(["sentence"])\
    .setOutputCol("email_entity")

regex_pipeline = Pipeline().setStages([
    documentAssembler,
    sentenceDetector,
    tokenizer,
    regex_matcher])

data = spark.createDataFrame([["""ID: 1231511863, The driver's license no:A334455B, the SSN:324598674 and jadjada_adald19@msku.edu.tr, mail: afakfl_lakf19@yahoo.com, e-mail: hale@gmail.com .
EMAIL: afakfl_lakf19@yahoo.com, E-mail: hale@gmail.com ."""]]).toDF("text")

result = regex_pipeline.fit(data).transform(data)
// Needed for Seq(...).toDF; assumes `spark` is an active SparkSession.
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val tokenizer = new Tokenizer()
  .setInputCols(Array("sentence"))
  .setOutputCol("token")

// Rule-based matcher; it reads sentences and writes EMAIL chunks.
val regex_matcher = RegexMatcherInternalModel.pretrained("email_matcher","en","clinical/models")
  .setInputCols(Array("sentence"))
  .setOutputCol("email_entity")
  .setMergeOverlapping(true)

val regex_pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  sentenceDetector,
  tokenizer,
  regex_matcher))

val data = Seq("""ID: 1231511863, The driver's license no:A334455B, the SSN:324598674 and jadjada_adald19@msku.edu.tr, mail: afakfl_lakf19@yahoo.com, e-mail: hale@gmail.com .
EMAIL: afakfl_lakf19@yahoo.com, E-mail: hale@gmail.com .""").toDF("text")

val result = regex_pipeline.fit(data).transform(data)
Results
+---------------------------+-----+---+-----+
|                      chunk|begin|end|label|
+---------------------------+-----+---+-----+
|jadjada_adald19@msku.edu.tr|   72| 98|EMAIL|
|    afakfl_lakf19@yahoo.com|  107|129|EMAIL|
|             hale@gmail.com|  140|153|EMAIL|
|    afakfl_lakf19@yahoo.com|  164|186|EMAIL|
|             hale@gmail.com|  197|210|EMAIL|
+---------------------------+-----+---+-----+
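To make the begin and end columns concrete, the following is a minimal sketch of rule-based email matching in plain Python with the standard `re` module. It does not use Healthcare NLP, and the pattern is an illustrative assumption, not the rule set shipped with `email_matcher`. The names `EMAIL_PATTERN`, `SAMPLE_TEXT`, and `find_emails` are hypothetical.

```python
import re

# Illustrative pattern only: an assumption, not the rules inside email_matcher.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

# Same sample text as in the pipeline example, including the line break.
SAMPLE_TEXT = (
    "ID: 1231511863, The driver's license no:A334455B, the SSN:324598674 and "
    "jadjada_adald19@msku.edu.tr, mail: afakfl_lakf19@yahoo.com, e-mail: hale@gmail.com .\n"
    "EMAIL: afakfl_lakf19@yahoo.com, E-mail: hale@gmail.com ."
)


def find_emails(text, label="EMAIL"):
    """Return (chunk, begin, end, label) tuples; `end` is an inclusive offset."""
    return [
        (m.group(0), m.start(), m.end() - 1, label)
        for m in EMAIL_PATTERN.finditer(text)
    ]


# Print one row per match, loosely mirroring the table layout.
for chunk, begin, end, label in find_emails(SAMPLE_TEXT):
    print(f"{chunk:>27}|{begin:>5}|{end:>3}|{label}")
```

Offsets are counted from the start of the whole text, and the end offset points at the last character of the match (`m.end() - 1`), which is the convention the table above follows.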
Model Information
| Model Name:    | email_matcher         |
| Compatibility: | Healthcare NLP 5.3.3+ |
| License:       | Licensed              |
| Edition:       | Official              |
| Input Labels:  | [sentence]            |
| Output Labels: | [email]               |
| Language:      | en                    |
| Size:          | 6.6 KB                |