Description
This pipeline extracts email addresses from clinical notes using the rule-based RegexMatcherInternal annotator.
How to use
# Python (Spark NLP); assumes an active, licensed Spark session named `spark`
from sparknlp.pretrained import PretrainedPipeline
pipeline = PretrainedPipeline("email_regex_matcher_pipeline", "en", "clinical/models")
sample_text = """ ID: 1231511863, The driver's license no:A334455B, the SSN:324598674 and info@domain.net, mail: tech@support.org, e-mail: hale@gmail.com .
E-mail: sales@gmail.com.
"""
result = pipeline.transform(spark.createDataFrame([[sample_text]]).toDF("text"))
# Python (johnsnowlabs library); assumes an active, licensed Spark session named `spark`
from johnsnowlabs import nlp, medical
pipeline = nlp.PretrainedPipeline("email_regex_matcher_pipeline", "en", "clinical/models")
sample_text = """ ID: 1231511863, The driver's license no:A334455B, the SSN:324598674 and info@domain.net, mail: tech@support.org, e-mail: hale@gmail.com .
E-mail: sales@gmail.com.
"""
result = pipeline.transform(spark.createDataFrame([[sample_text]]).toDF("text"))
// Scala; assumes an active, licensed Spark session named `spark`
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
val pipeline = PretrainedPipeline("email_regex_matcher_pipeline", "en", "clinical/models")
val sample_text = """ ID: 1231511863, The driver's license no:A334455B, the SSN:324598674 and info@domain.net, mail: tech@support.org, e-mail: hale@gmail.com .
E-mail: sales@gmail.com.
"""
val result = pipeline.transform(spark.createDataFrame([[sample_text]]).toDF("text"))
Results
| chunk | begin | end | label |
|:-----------------|--------:|------:|:--------|
| info@domain.net | 72 | 86 | EMAIL |
| tech@support.org | 95 | 110 | EMAIL |
| hale@gmail.com | 121 | 134 | EMAIL |
| sales@gmail.com | 147 | 161 | EMAIL |
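The `begin` and `end` columns appear to be character offsets with an inclusive `end`: in every row above, `end - begin + 1` equals the length of the chunk. The following minimal sketch checks that reading against the table rows and shows how such a span could be sliced back out of a text. The helper `slice_chunk` and the short demo string are hypothetical and are not part of the pipeline.

```python
# Rows copied from the Results table above: (chunk, begin, end).
rows = [
    ("info@domain.net", 72, 86),
    ("tech@support.org", 95, 110),
    ("hale@gmail.com", 121, 134),
    ("sales@gmail.com", 147, 161),
]


def slice_chunk(text, begin, end):
    """Hypothetical helper: recover a chunk from text using an inclusive end offset."""
    return text[begin:end + 1]


# With an inclusive end, the span length is end - begin + 1.
lengths_match = all(end - begin + 1 == len(chunk) for chunk, begin, end in rows)
print(lengths_match)

# Demo on a short, made-up string (not the sample text used above).
demo_text = "mail: tech@support.org"
print(slice_chunk(demo_text, 6, 21))
```

Python slices exclude the stop index, which is why the helper adds 1 to `end`.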
Model Information
| Model Name: | email_regex_matcher_pipeline |
| Type: | pipeline |
| Compatibility: | Healthcare NLP 6.3.0+ |
| License: | Licensed |
| Edition: | Official |
| Language: | en |
| Size: | 6.9 KB |
Included Models
- DocumentAssembler
- RegexMatcherInternalModel
- ChunkConverter
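To give a feel for what a rule-based regex matching stage does, here is a minimal pure-Python sketch that finds email-like strings and reports them in the same shape as the Results table. It does not use Spark NLP. The pattern `EMAIL_PATTERN` and the function `find_emails` are assumptions made for illustration; the actual pattern shipped with the pretrained pipeline is not shown in this card and may differ.

```python
import re

# Hypothetical, simplified email pattern for illustration only;
# it is NOT the pattern used by the pretrained pipeline.
EMAIL_PATTERN = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)


def find_emails(text):
    """Return one dict per match, using an inclusive `end` offset."""
    return [
        {
            "chunk": m.group(),
            "begin": m.start(),
            "end": m.end() - 1,  # re gives an exclusive end; convert to inclusive
            "label": "EMAIL",
        }
        for m in EMAIL_PATTERN.finditer(text)
    ]


sample = "mail: tech@support.org, e-mail: hale@gmail.com ."
matches = find_emails(sample)
for row in matches:
    print(row)
```

Because the match is driven by a fixed pattern rather than a trained model, results depend entirely on how permissive the pattern is; this simplified one will miss some valid addresses and accept some invalid ones.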