URL Regex Matcher Pipeline

Description

This pipeline extracts URLs from clinical notes using the rule-based RegexMatcherInternal annotator.

How to use


from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline("url_regex_matcher_pipeline", "en", "clinical/models")

sample_text = """ Name: ID: 1231511863, Driver's License No: A334455B, SSN: 324-59-8674. E-mail: hale@gmail.com.
        For more details, visit our website at www.johnsnowlabs.com or check out http://example.com for general info.
        For secure access, go to https://secure.example.com. File transfers can be done via ftp://files.example.com.
"""

# `spark` is assumed to be an active SparkSession with Healthcare NLP loaded.
result = pipeline.transform(spark.createDataFrame([[sample_text]]).toDF("text"))


from johnsnowlabs import nlp, medical

pipeline = nlp.PretrainedPipeline("url_regex_matcher_pipeline", "en", "clinical/models")

sample_text = """ Name: ID: 1231511863, Driver's License No: A334455B, SSN: 324-59-8674. E-mail: hale@gmail.com.
        For more details, visit our website at www.johnsnowlabs.com or check out http://example.com for general info.
        For secure access, go to https://secure.example.com. File transfers can be done via ftp://files.example.com.
"""

# `spark` is assumed to be an active SparkSession, for example the one returned by nlp.start().
result = pipeline.transform(spark.createDataFrame([[sample_text]]).toDF("text"))


import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = PretrainedPipeline("url_regex_matcher_pipeline", "en", "clinical/models")

val sample_text = """ Name: ID: 1231511863, Driver's License No: A334455B, SSN: 324-59-8674. E-mail: hale@gmail.com.
        For more details, visit our website at www.johnsnowlabs.com or check out http://example.com for general info.
        For secure access, go to https://secure.example.com. File transfers can be done via ftp://files.example.com.
"""

// Build a one-column DataFrame; assumes an active SparkSession named `spark`.
import spark.implicits._
val result = pipeline.transform(Seq(sample_text).toDF("text"))

Results


| chunk                      |   begin |   end | label   |
|:---------------------------|--------:|------:|:--------|
| www.johnsnowlabs.com       |     142 |   161 | URL     |
| http://example.com         |     176 |   193 | URL     |
| https://secure.example.com |     246 |   271 | URL     |
| ftp://files.example.com    |     305 |   327 | URL     |
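
The begin and end columns are character offsets into the input text, and end is inclusive: for www.johnsnowlabs.com, 161 - 142 + 1 = 20 characters. The sketch below reproduces that output shape with Python's standard re module. It is only an illustration of rule-based matching: the pattern URL_PATTERN and the helper find_urls are hypothetical and are not the rules shipped inside the pipeline.

```python
import re

# Illustrative pattern only (an assumption); the pipeline's actual rules are not shown in this card.
# It accepts an http, https, or ftp scheme, or a bare "www." prefix, followed by a dotted host name.
URL_PATTERN = re.compile(r"(?:(?:https?|ftp)://|www\.)[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def find_urls(text):
    """Return (chunk, begin, end, label) tuples, with `end` as an inclusive offset."""
    return [
        (m.group(0), m.start(), m.end() - 1, "URL")
        for m in URL_PATTERN.finditer(text)
    ]


demo = "visit www.johnsnowlabs.com or http://example.com."
for chunk, begin, end, label in find_urls(demo):
    # Because `end` is inclusive, end - begin + 1 equals len(chunk).
    print(chunk, begin, end, label)
```

Requiring a scheme or a www. prefix keeps e-mail addresses such as hale@gmail.com out of the matches, and requiring at least one character after each dot keeps a sentence-final period out of the chunk.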

Model Information

Model Name: url_regex_matcher_pipeline
Type: pipeline
Compatibility: Healthcare NLP 6.3.0+
License: Licensed
Edition: Official
Language: en
Size: 6.9 KB

Included Models

  • DocumentAssembler
  • RegexMatcherInternalModel
  • ChunkConverter