IP Regex Matcher Pipeline

Description

This pipeline extracts IP addresses from clinical notes using the rule-based RegexMatcherInternal annotator.
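The following is a minimal plain-Python sketch that uses the standard `re` module to show what a rule-based IP matcher of this kind does. The IPv4 pattern, `IPV4_PATTERN`, and the helper `match_ips` are hypothetical and exist only for illustration. The rule bundled in the pipeline's RegexMatcherInternalModel is not shown in this card and may differ. Offsets follow the inclusive-end convention used in the Results table.

```python
import re

# Hypothetical IPv4 rule for illustration only; the rule shipped with
# ip_regex_matcher_pipeline is not documented here and may differ.
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"  # one octet, 0-255
IPV4_PATTERN = re.compile(rf"\b(?:{OCTET}\.){{3}}{OCTET}\b")


def match_ips(text, label="IP"):
    """Return (chunk, begin, end, label) tuples; end is inclusive, as in the Results table."""
    return [
        (m.group(), m.start(), m.end() - 1, label)
        for m in IPV4_PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    text = "Access the router at http://192.168.0.1 or connect to 10.0.0.1."
    for chunk, begin, end, label in match_ips(text):
        print(chunk, begin, end, label)
```

The word boundaries keep a trailing sentence period out of the match, as in `203.0.113.0.` in the sample text. The octet alternation rejects out-of-range values such as `999.1.1.1`.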

How to use

# Assumes an active SparkSession `spark` with the licensed Healthcare NLP library available
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("ip_regex_matcher_pipeline", "en", "clinical/models")

sample_text = """ Name: ID: 1231511863, Driver's License No: A334455B, SSN: 324-59-8674. E-mail: hale@gmail.com.
        Access the router at http://192.168.0.1 for configuration. Please connect to 10.0.0.1 to access the database..
        Visit http://198.51.100.42 for more information. File transfers can be done via ftp://files.example.com.
        The backup server is located at 172.16.254.1 and the monitoring system can be reached at 203.0.113.0.
"""

result = pipeline.transform(spark.createDataFrame([[sample_text]]).toDF("text"))

# Same pipeline loaded through the johnsnowlabs library; also assumes an active `spark` session
from johnsnowlabs import nlp, medical

pipeline = nlp.PretrainedPipeline("ip_regex_matcher_pipeline", "en", "clinical/models")

sample_text = """ Name: ID: 1231511863, Driver's License No: A334455B, SSN: 324-59-8674. E-mail: hale@gmail.com.
        Access the router at http://192.168.0.1 for configuration. Please connect to 10.0.0.1 to access the database..
        Visit http://198.51.100.42 for more information. File transfers can be done via ftp://files.example.com.
        The backup server is located at 172.16.254.1 and the monitoring system can be reached at 203.0.113.0.
"""

result = pipeline.transform(spark.createDataFrame([[sample_text]]).toDF("text"))

// Scala; assumes an active SparkSession `spark` with the licensed Healthcare NLP library
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = PretrainedPipeline("ip_regex_matcher_pipeline", "en", "clinical/models")

val sample_text = """ Name: ID: 1231511863, Driver's License No: A334455B, SSN: 324-59-8674. E-mail: hale@gmail.com.
        Access the router at http://192.168.0.1 for configuration. Please connect to 10.0.0.1 to access the database..
        Visit http://198.51.100.42 for more information. File transfers can be done via ftp://files.example.com.
        The backup server is located at 172.16.254.1 and the monitoring system can be reached at 203.0.113.0.
"""

// Spark implicits are needed to build a DataFrame from a Seq in Scala
import spark.implicits._

val result = pipeline.transform(Seq(sample_text).toDF("text"))

Results

| chunk         |   begin |   end | label   |
|:--------------|--------:|------:|:--------|
| 192.168.0.1   |     131 |   141 | IP      |
| 10.0.0.1      |     180 |   187 | IP      |
| 198.51.100.42 |     235 |   247 | IP      |
| 172.16.254.1  |     367 |   378 | IP      |
| 203.0.113.0   |     424 |   434 | IP      |
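The begin and end columns use inclusive end offsets. Each row satisfies end - begin + 1 = chunk length. For example, 10.0.0.1 spans 180 to 187, which is eight characters. When you map these offsets back onto the input string in Python, add 1 to the end index. The minimal sketch below shows this. `chunk_text` is a hypothetical helper, not part of the Spark NLP API.

```python
def chunk_text(text: str, begin: int, end: int) -> str:
    """Recover a chunk from Spark NLP-style offsets, where `end` is inclusive."""
    # Python slices exclude the stop index, so add 1 to the inclusive end.
    return text[begin:end + 1]


if __name__ == "__main__":
    text = "Please connect to 10.0.0.1 to access the database."
    chunk = chunk_text(text, 18, 25)
    print(chunk)  # 10.0.0.1
    # Chunk length is end - begin + 1, as in the 10.0.0.1 row (180-187).
    assert len(chunk) == 25 - 18 + 1
```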

Model Information

Model Name:	ip_regex_matcher_pipeline
Type:	pipeline
Compatibility:	Healthcare NLP 6.3.0+
License:	Licensed
Edition:	Official
Language:	en
Size:	6.9 KB

Included Models

DocumentAssembler
RegexMatcherInternalModel
ChunkConverter
