IP Regex Matcher

Description

This model extracts IP addresses from clinical notes using the rule-based RegexMatcherInternal annotator.
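To illustrate what rule-based matching of this kind looks like, here is a minimal plain-Python sketch that finds IPv4 addresses with a regular expression and reports inclusive character offsets, in the same shape as the results table on this page. The pattern and the find_ips helper are assumptions made for illustration only; the actual rules bundled with the pretrained ip_regex_matcher model are not shown in this card and may differ.

```python
import re

# One IPv4 octet: 0-255 (illustrative pattern, not the model's own rule).
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"

# Four dot-separated octets, not embedded in a longer dotted number.
IPV4 = re.compile(rf"(?<![\d.]){OCTET}(?:\.{OCTET}){{3}}(?!\d|\.\d)")


def find_ips(text):
    """Return (chunk, begin, end) tuples; `end` is an inclusive offset."""
    return [(m.group(), m.start(), m.end() - 1) for m in IPV4.finditer(text)]


# Hypothetical usage on a short sentence.
print(find_ips("Connect to 10.0.0.1 now."))
# prints [('10.0.0.1', 11, 18)]
```

A hand-written pattern like this is only a rough stand-in: it handles IPv4 alone and does no validation beyond the 0-255 range of each octet.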


How to use


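from sparknlp.base import DocumentAssembler
from sparknlp_jsl.annotator import RegexMatcherInternalModel
from pyspark.ml import Pipeline

# Wrap the raw text column into document annotations
documentAssembler = DocumentAssembler()\
      .setInputCol("text")\
      .setOutputCol("document")

# Load the pretrained rule-based IP matcher
ip_regex_matcher = RegexMatcherInternalModel.pretrained("ip_regex_matcher","en","clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("IP")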

ip_regex_matcher_pipeline = Pipeline(
    stages=[
        documentAssembler,
        ip_regex_matcher
        ])

data = spark.createDataFrame([
    ["""Name: ID: 1231511863, Driver's License No: A334455B, SSN: 324-59-8674. E-mail: hale@gmail.com.
        Access the router at http://192.168.0.1 for configuration. Please connect to 10.0.0.1 to access the database..
        Visit http://198.51.100.42 for more information. File transfers can be done via ftp://files.example.com.
        The backup server is located at 172.16.254.1 and the monitoring system can be reached at 203.0.113.0.
"""]]).toDF("text")


result = ip_regex_matcher_pipeline.fit(data).transform(data)



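from johnsnowlabs import nlp, medical

# Wrap the raw text column into document annotations
documentAssembler = nlp.DocumentAssembler()\
      .setInputCol("text")\
      .setOutputCol("document")

# Load the pretrained rule-based IP matcher
ip_regex_matcher = medical.RegexMatcherInternalModel.pretrained("ip_regex_matcher","en","clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("IP")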

ip_regex_matcher_pipeline = nlp.Pipeline(
    stages=[
        documentAssembler,
        ip_regex_matcher
        ])

data = spark.createDataFrame([
    ["""Name: ID: 1231511863, Driver's License No: A334455B, SSN: 324-59-8674. E-mail: hale@gmail.com.
        Access the router at http://192.168.0.1 for configuration. Please connect to 10.0.0.1 to access the database..
        Visit http://198.51.100.42 for more information. File transfers can be done via ftp://files.example.com.
        The backup server is located at 172.16.254.1 and the monitoring system can be reached at 203.0.113.0.
"""]]).toDF("text")


result = ip_regex_matcher_pipeline.fit(data).transform(data)




import spark.implicits._  // needed for .toDF on a Seq

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val ip_regex_matcher = RegexMatcherInternalModel.pretrained("ip_regex_matcher","en","clinical/models")
    .setInputCols(Array("document"))
    .setOutputCol("IP")

val regex_pipeline = new Pipeline().setStages(Array(
    documentAssembler,
    ip_regex_matcher
))

val data = Seq("""Name: ID: 1231511863, Driver's License No: A334455B, SSN: 324-59-8674. E-mail: hale@gmail.com.
        Access the router at http://192.168.0.1 for configuration. Please connect to 10.0.0.1 to access the database..
        Visit http://198.51.100.42 for more information. File transfers can be done via ftp://files.example.com.
        The backup server is located at 172.16.254.1 and the monitoring system can be reached at 203.0.113.0.
""").toDF("text")

val result = regex_pipeline.fit(data).transform(data)

Results


| chunk         |   begin |   end | label   |
|:--------------|--------:|------:|:--------|
| 192.168.0.1   |     131 |   141 | IP      |
| 10.0.0.1      |     180 |   187 | IP      |
| 198.51.100.42 |     235 |   247 | IP      |
| 172.16.254.1  |     367 |   378 | IP      |
| 203.0.113.0   |     424 |   434 | IP      |
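
The begin and end values above appear to be character offsets into the input text with an inclusive end, since end - begin + 1 equals the length of each chunk. The short check below confirms that arithmetic for every row; the rows list is copied from the table, and the span_length helper is a hypothetical name introduced for this sketch.

```python
# Rows copied from the results table: (chunk, begin, end)
rows = [
    ("192.168.0.1", 131, 141),
    ("10.0.0.1", 180, 187),
    ("198.51.100.42", 235, 247),
    ("172.16.254.1", 367, 378),
    ("203.0.113.0", 424, 434),
]


def span_length(begin, end):
    """Length of a span whose end offset is inclusive."""
    return end - begin + 1


# Every chunk's length matches its inclusive span.
for chunk, begin, end in rows:
    assert span_length(begin, end) == len(chunk)
```

Under that convention, recovering a chunk from the original string in Python would use text[begin:end + 1] rather than text[begin:end].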


Model Information

Model Name: ip_regex_matcher
Compatibility: Healthcare NLP 6.2.2+
License: Licensed
Edition: Official
Input Labels: [document]
Output Labels: [entity_ip]
Language: en
Size: 2.2 KB