IP Regex Matcher Pipeline

Description

This pipeline extracts IP addresses from clinical notes using the rule-based RegexMatcherInternal annotator.


How to use


from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("ip_regex_matcher_pipeline", "en", "clinical/models")

sample_text = """ Name: ID: 1231511863, Driver's License No: A334455B, SSN: 324-59-8674. E-mail: hale@gmail.com.
        Access the router at http://192.168.0.1 for configuration. Please connect to 10.0.0.1 to access the database..
        Visit http://198.51.100.42 for more information. File transfers can be done via ftp://files.example.com.
        The backup server is located at 172.16.254.1 and the monitoring system can be reached at 203.0.113.0.
"""

# Assumes an active SparkSession named `spark` with Healthcare NLP loaded
result = pipeline.transform(spark.createDataFrame([[sample_text]]).toDF("text"))


from johnsnowlabs import nlp, medical

pipeline = nlp.PretrainedPipeline("ip_regex_matcher_pipeline", "en", "clinical/models")

sample_text = """ Name: ID: 1231511863, Driver's License No: A334455B, SSN: 324-59-8674. E-mail: hale@gmail.com.
        Access the router at http://192.168.0.1 for configuration. Please connect to 10.0.0.1 to access the database..
        Visit http://198.51.100.42 for more information. File transfers can be done via ftp://files.example.com.
        The backup server is located at 172.16.254.1 and the monitoring system can be reached at 203.0.113.0.
"""

# Assumes `spark` was created beforehand, e.g. spark = nlp.start()
result = pipeline.transform(spark.createDataFrame([[sample_text]]).toDF("text"))


import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = PretrainedPipeline("ip_regex_matcher_pipeline", "en", "clinical/models")

val sample_text = """ Name: ID: 1231511863, Driver's License No: A334455B, SSN: 324-59-8674. E-mail: hale@gmail.com.
        Access the router at http://192.168.0.1 for configuration. Please connect to 10.0.0.1 to access the database..
        Visit http://198.51.100.42 for more information. File transfers can be done via ftp://files.example.com.
        The backup server is located at 172.16.254.1 and the monitoring system can be reached at 203.0.113.0.
"""

// Assumes an active SparkSession named `spark`; the implicits enable Seq(...).toDF
import spark.implicits._

val result = pipeline.transform(Seq(sample_text).toDF("text"))

Results


| chunk         |   begin |   end | label   |
|:--------------|--------:|------:|:--------|
| 192.168.0.1   |     131 |   141 | IP      |
| 10.0.0.1      |     180 |   187 | IP      |
| 198.51.100.42 |     235 |   247 | IP      |
| 172.16.254.1  |     367 |   378 | IP      |
| 203.0.113.0   |     424 |   434 | IP      |
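
To illustrate how the table columns relate, here is a minimal plain-Python sketch that approximates rule-based IP matching with the standard `re` module. It does not use the pipeline, and the pattern `IPV4` and the helper `find_ips` are hypothetical names introduced for this example; the actual rule shipped inside the pipeline is not shown on this page. It assumes, as the table suggests (an 11-character chunk spans 131 to 141), that `end` is an inclusive character offset.

```python
import re

# Hypothetical IPv4 pattern for illustration only; not the pipeline's own rule.
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
IPV4 = re.compile(rf"(?<![\d.]){OCTET}(?:\.{OCTET}){{3}}(?!\d|\.\d)")


def find_ips(text):
    """Return one dict per IPv4-looking span, mirroring the table columns."""
    return [
        {
            "chunk": m.group(),
            "begin": m.start(),
            "end": m.end() - 1,  # inclusive end offset, as in the table above
            "label": "IP",
        }
        for m in IPV4.finditer(text)
    ]


if __name__ == "__main__":
    for hit in find_ips("Connect to 10.0.0.1 now."):
        print(hit)
    # {'chunk': '10.0.0.1', 'begin': 11, 'end': 18, 'label': 'IP'}
```

A trailing sentence period after an address is tolerated by the lookahead, while longer dotted runs such as version strings are rejected. The offsets this sketch produces for the sample text may differ from the table if whitespace was altered when the page was rendered.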

Model Information

Model Name: ip_regex_matcher_pipeline
Type: pipeline
Compatibility: Healthcare NLP 6.3.0+
License: Licensed
Edition: Official
Language: en
Size: 6.9 KB

Included Models

  • DocumentAssembler
  • RegexMatcherInternalModel
  • ChunkConverter