Description
This pipeline extracts URLs from clinical notes using the rule-based RegexMatcherInternal annotator.
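To illustrate what rule-based URL matching means in practice, here is a minimal plain-Python sketch. It is not the pipeline's implementation: the pattern `URL_PATTERN` and the helper `find_urls` are hypothetical names chosen for this example, and the actual rules used by RegexMatcherInternal are not listed in this card. The sketch only shows the general idea of scanning text with a regular expression and reporting each match with character offsets.

```python
import re

# Hypothetical pattern (an assumption for illustration only); the rules used
# by the pretrained pipeline are not documented in this card.
URL_PATTERN = re.compile(r"(?:(?:https?|ftp)://|www\.)[^\s,;]+")


def find_urls(text):
    """Return URL-like chunks with a begin offset and an inclusive end offset."""
    matches = []
    for m in URL_PATTERN.finditer(text):
        # Drop a sentence-final period that the greedy pattern may capture.
        chunk = m.group().rstrip(".")
        begin = m.start()
        matches.append(
            {
                "chunk": chunk,
                "begin": begin,
                "end": begin + len(chunk) - 1,  # inclusive end offset
                "label": "URL",
            }
        )
    return matches


sample = "Visit www.johnsnowlabs.com or http://example.com for general info."
for match in find_urls(sample):
    print(match)
```

A single hand-written pattern like this will miss or over-match some URL forms, which is why it should be read as a sketch of the approach rather than a substitute for the pretrained pipeline.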
How to use
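# Spark NLP API (assumes an active SparkSession named `spark`)
from sparknlp.pretrained import PretrainedPipeline
# Download and load the pretrained pipeline
pipeline = PretrainedPipeline("url_regex_matcher_pipeline", "en", "clinical/models")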
sample_text = """ Name: ID: 1231511863, Driver's License No: A334455B, SSN: 324-59-8674. E-mail: hale@gmail.com.
For more details, visit our website at www.johnsnowlabs.com or check out http://example.com for general info.
For secure access, go to https://secure.example.com. File transfers can be done via ftp://files.example.com.
"""
result = pipeline.transform(spark.createDataFrame([[sample_text]]).toDF("text"))
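# johnsnowlabs library API (assumes an active SparkSession named `spark`)
from johnsnowlabs import nlp
# Download and load the pretrained pipeline
pipeline = nlp.PretrainedPipeline("url_regex_matcher_pipeline", "en", "clinical/models")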
sample_text = """ Name: ID: 1231511863, Driver's License No: A334455B, SSN: 324-59-8674. E-mail: hale@gmail.com.
For more details, visit our website at www.johnsnowlabs.com or check out http://example.com for general info.
For secure access, go to https://secure.example.com. File transfers can be done via ftp://files.example.com.
"""
result = pipeline.transform(spark.createDataFrame([[sample_text]]).toDF("text"))
// Scala API (assumes an active SparkSession named `spark`)
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
import spark.implicits._ // provides toDF on Scala collections
val pipeline = PretrainedPipeline("url_regex_matcher_pipeline", "en", "clinical/models")
val sample_text = """ Name: ID: 1231511863, Driver's License No: A334455B, SSN: 324-59-8674. E-mail: hale@gmail.com.
For more details, visit our website at www.johnsnowlabs.com or check out http://example.com for general info.
For secure access, go to https://secure.example.com. File transfers can be done via ftp://files.example.com.
"""
// Wrap the text in a one-column DataFrame named "text" and run the pipeline
val result = pipeline.transform(Seq(sample_text).toDF("text"))
Results
| chunk | begin | end | label |
|:---------------------------|--------:|------:|:--------|
| www.johnsnowlabs.com | 142 | 161 | URL |
| http://example.com | 176 | 193 | URL |
| https://secure.example.com | 246 | 271 | URL |
| ftp://files.example.com | 305 | 327 | URL |
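The `begin` and `end` values in the table appear to be inclusive at both ends: for every row, `end - begin + 1` equals the length of the chunk. The short sketch below checks that against the rows above and shows how such offsets could be used to slice a string in Python. The helper names `slice_chunk` and `span_length` are hypothetical and exist only for this example.

```python
# Rows copied from the results table above: (chunk, begin, end).
rows = [
    ("www.johnsnowlabs.com", 142, 161),
    ("http://example.com", 176, 193),
    ("https://secure.example.com", 246, 271),
    ("ftp://files.example.com", 305, 327),
]


def span_length(begin, end):
    """Number of characters covered by an inclusive [begin, end] span."""
    return end - begin + 1


def slice_chunk(text, begin, end):
    """Slice text with an inclusive end offset (Python slices are exclusive)."""
    return text[begin:end + 1]


# True when every row's span length matches its chunk length.
consistent = all(span_length(b, e) == len(chunk) for chunk, b, e in rows)
print(consistent)
```

When mapping offsets back to the input text, keep in mind that they refer to the exact string passed to the pipeline, including any leading whitespace.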
Model Information
| Model Name: | url_regex_matcher_pipeline |
| Type: | pipeline |
| Compatibility: | Healthcare NLP 6.3.0+ |
| License: | Licensed |
| Edition: | Official |
| Language: | en |
| Size: | 6.9 KB |
Included Models
- DocumentAssembler
- RegexMatcherInternalModel
- ChunkConverter