Description
This model extracts IP addresses from clinical notes using the rule-based RegexMatcherInternal annotator.
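To give a feel for what rule-based matching of this kind involves, here is a minimal sketch in plain Python. It does not use the pretrained model, and the pattern below is a hypothetical IPv4 rule written for illustration; the rules actually shipped with `ip_regex_matcher` may differ. The helper name `find_ipv4` is likewise an assumption.

```python
import re

# Hypothetical IPv4 rule for illustration only; not the pretrained model's pattern.
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"  # one octet in the range 0-255
IPV4_PATTERN = re.compile(
    rf"(?<!\d)(?<!\d\.){OCTET}(?:\.{OCTET}){{3}}(?!\d)(?!\.\d)"
)


def find_ipv4(text):
    """Return (chunk, begin, end) tuples, with end as an inclusive offset."""
    return [(m.group(), m.start(), m.end() - 1) for m in IPV4_PATTERN.finditer(text)]


sample = "Access the router at http://192.168.0.1 for configuration. SSN: 324-59-8674."
for chunk, begin, end in find_ipv4(sample):
    print(chunk, begin, end)
```

The lookarounds keep the rule from firing inside longer digit runs or dotted version strings, while still allowing a sentence-ending period directly after an address.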
How to use
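# Python: Spark NLP for Healthcare API (assumes a licensed, active `spark` session)
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp_jsl.annotator import RegexMatcherInternalModel

documentAssembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

ip_regex_matcher = RegexMatcherInternalModel.pretrained("ip_regex_matcher","en","clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("IP")

ip_regex_matcher_pipeline = Pipeline(
    stages=[
        documentAssembler,
        ip_regex_matcher
    ])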
data = spark.createDataFrame([
["""Name: ID: 1231511863, Driver's License No: A334455B, SSN: 324-59-8674. E-mail: hale@gmail.com.
Access the router at http://192.168.0.1 for configuration. Please connect to 10.0.0.1 to access the database..
Visit http://198.51.100.42 for more information. File transfers can be done via ftp://files.example.com.
The backup server is located at 172.16.254.1 and the monitoring system can be reached at 203.0.113.0.
"""]]).toDF("text")
result = ip_regex_matcher_pipeline.fit(data).transform(data)
# Python: johnsnowlabs library API (assumes a licensed, active `spark` session)
from johnsnowlabs import nlp, medical

documentAssembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

ip_regex_matcher = medical.RegexMatcherInternalModel.pretrained("ip_regex_matcher","en","clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("IP")

ip_regex_matcher_pipeline = nlp.Pipeline(
    stages=[
        documentAssembler,
        ip_regex_matcher
    ])
data = spark.createDataFrame([
["""Name: ID: 1231511863, Driver's License No: A334455B, SSN: 324-59-8674. E-mail: hale@gmail.com.
Access the router at http://192.168.0.1 for configuration. Please connect to 10.0.0.1 to access the database..
Visit http://198.51.100.42 for more information. File transfers can be done via ftp://files.example.com.
The backup server is located at 172.16.254.1 and the monitoring system can be reached at 203.0.113.0.
"""]]).toDF("text")
result = ip_regex_matcher_pipeline.fit(data).transform(data)
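// Scala (assumes a licensed, active `spark` session; RegexMatcherInternalModel comes from the Healthcare NLP library)
import org.apache.spark.ml.Pipeline
import com.johnsnowlabs.nlp.DocumentAssembler
import spark.implicits._ // needed for .toDF on a Seq

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val ip_regex_matcher = RegexMatcherInternalModel.pretrained("ip_regex_matcher","en","clinical/models")
    .setInputCols(Array("document"))
    .setOutputCol("IP")

val regex_pipeline = new Pipeline().setStages(Array(
    documentAssembler,
    ip_regex_matcher
))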
val data = Seq("""Name: ID: 1231511863, Driver's License No: A334455B, SSN: 324-59-8674. E-mail: hale@gmail.com.
Access the router at http://192.168.0.1 for configuration. Please connect to 10.0.0.1 to access the database..
Visit http://198.51.100.42 for more information. File transfers can be done via ftp://files.example.com.
The backup server is located at 172.16.254.1 and the monitoring system can be reached at 203.0.113.0.
""").toDF("text")
val result = regex_pipeline.fit(data).transform(data)
Results
| chunk | begin | end | label |
|:--------------|--------:|------:|:--------|
| 192.168.0.1 | 131 | 141 | IP |
| 10.0.0.1 | 180 | 187 | IP |
| 198.51.100.42 | 235 | 247 | IP |
| 172.16.254.1 | 367 | 378 | IP |
| 203.0.113.0 | 424 | 434 | IP |
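The begin and end values in the table are consistent with inclusive character offsets: for every row, `end - begin + 1` equals the length of the chunk. The short sketch below checks that on the rows above and shows how such a span could be sliced back out of a source string. The helper names `is_inclusive_span` and `slice_chunk` are hypothetical and not part of the library.

```python
# Rows copied from the results table above: (chunk, begin, end).
rows = [
    ("192.168.0.1", 131, 141),
    ("10.0.0.1", 180, 187),
    ("198.51.100.42", 235, 247),
    ("172.16.254.1", 367, 378),
    ("203.0.113.0", 424, 434),
]


def is_inclusive_span(chunk, begin, end):
    """True when end is the index of the chunk's last character."""
    return end - begin + 1 == len(chunk)


def slice_chunk(text, begin, end):
    """Recover a chunk from its source text using inclusive offsets."""
    return text[begin:end + 1]


# Every row in the table satisfies the inclusive-offset convention.
assert all(is_inclusive_span(*row) for row in rows)
```

When slicing with Python's half-open ranges, the `+ 1` on the end offset is the detail that is easy to miss.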
Model Information
| Model Name: | ip_regex_matcher |
| Compatibility: | Healthcare NLP 6.2.2+ |
| License: | Licensed |
| Edition: | Official |
| Input Labels: | [document] |
| Output Labels: | [entity_ip] |
| Language: | en |
| Size: | 2.2 KB |