URL Regex Matcher

Description

This model extracts URLs from clinical notes using the rule-based RegexMatcherInternal annotator.

Predicted Entities

URL

How to use

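from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp_jsl.annotator import RegexMatcherInternalModel

# Assumes `spark` is an active session started with the licensed Healthcare NLP library.

# Wrap the raw text column in a document annotation.
documentAssembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Load the pretrained rule-based URL matcher.
url_regex_matcher = RegexMatcherInternalModel.pretrained("url_matcher","en","clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("URL")

url_regex_matcher_pipeline = Pipeline(
    stages=[
        documentAssembler,
        url_regex_matcher
    ])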

data = spark.createDataFrame([
    ["""Name: ID: 1231511863, Driver's License No: A334455B, SSN: 324-59-8674. E-mail: hale@gmail.com.
        For more details, visit our website at www.johnsnowlabs.com or check out http://example.com for general info.
        For secure access, go to https://secure.example.com. File transfers can be done via ftp://files.example.com.
"""]]).toDF("text")


result = url_regex_matcher_pipeline.fit(data).transform(data)

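import org.apache.spark.ml.Pipeline
import com.johnsnowlabs.nlp.DocumentAssembler
import spark.implicits._ // needed for .toDF on a Seq

// RegexMatcherInternalModel is provided by the licensed Healthcare NLP library.
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val url_regex_matcher = RegexMatcherInternalModel.pretrained("url_matcher","en","clinical/models")
  .setInputCols(Array("document"))
  .setOutputCol("URL")

val regex_pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  url_regex_matcher
))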

val data = Seq("""Name: ID: 1231511863, Driver's License No: A334455B, SSN: 324-59-8674. E-mail: hale@gmail.com.
        For more details, visit our website at www.johnsnowlabs.com or check out http://example.com for general info.
        For secure access, go to https://secure.example.com. File transfers can be done via ftp://files.example.com.
        """).toDF("text")

val result = regex_pipeline.fit(data).transform(data)

Results

+--------------------------+-----+---+-----+
|chunk                     |begin|end|label|
+--------------------------+-----+---+-----+
|www.johnsnowlabs.com      |142  |161|URL  |
|http://example.com        |176  |193|URL  |
|https://secure.example.com|247  |272|URL  |
|ftp://files.example.com   |306  |328|URL  |
+--------------------------+-----+---+-----+
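
The begin and end columns above are character offsets into the input text, and end is inclusive: the 20-character chunk www.johnsnowlabs.com spans 142 to 161. As a rough illustration of how a rule-based matcher can produce rows of this shape, here is a minimal plain-Python sketch. The pattern, the find_urls helper, and the sample sentence are assumptions made for this example; the patterns shipped inside the pretrained url_matcher model are not listed on this page and may differ.

```python
import re

# Hypothetical pattern for illustration only; it is NOT the pattern used by
# the pretrained url_matcher model. It accepts http(s)://, ftp:// or a bare
# "www." prefix, then consumes non-space characters and backtracks so the
# match ends on a letter, digit or slash (dropping sentence punctuation).
URL_PATTERN = re.compile(r"(?:(?:https?|ftp)://|www\.)\S*[A-Za-z0-9/]")


def find_urls(text, label="URL"):
    """Return one dict per match, using an inclusive end offset."""
    return [
        {
            "chunk": m.group(),
            "begin": m.start(),
            "end": m.end() - 1,  # re reports an exclusive end; make it inclusive
            "label": label,
        }
        for m in URL_PATTERN.finditer(text)
    ]


# Hypothetical sample sentence, shorter than the clinical note used above.
sample = "Visit www.johnsnowlabs.com or http://example.com. Secure: https://secure.example.com."

for row in find_urls(sample):
    print(row)
```

Because the end offset is inclusive, a chunk can be recovered from the text with text[begin:end + 1]. Note that whitespace inside the input string shifts every offset, so the indentation in the triple-quoted example text above is part of what the reported positions count.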

Model Information

Model Name: url_matcher
Compatibility: Healthcare NLP 5.4.0+
License: Licensed
Edition: Official
Input Labels: [document]
Output Labels: [URL]
Language: en
Size: 2.2 KB