Description
This model extracts URLs from clinical notes using the rule-based RegexMatcherInternal annotator.
How to use
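# Python (sparknlp_jsl); assumes an active licensed Spark session named `spark`
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp_jsl.annotator import RegexMatcherInternalModel

# Wrap the raw text column in a document annotation
documentAssembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Load the pretrained rule-based URL matcher
url_regex_matcher = RegexMatcherInternalModel.pretrained("url_regex_matcher","en","clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("URL")

url_regex_matcher_pipeline = Pipeline(
    stages=[
        documentAssembler,
        url_regex_matcher
    ])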
data = spark.createDataFrame([
["""Name: ID: 1231511863, Driver's License No: A334455B, SSN: 324-59-8674. E-mail: hale@gmail.com.
For more details, visit our website at www.johnsnowlabs.com or check out http://example.com for general info.
For secure access, go to https://secure.example.com. File transfers can be done via ftp://files.example.com.
"""]]).toDF("text")
result = url_regex_matcher_pipeline.fit(data).transform(data)
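# Python (johnsnowlabs library); assumes an active licensed Spark session named `spark`
from johnsnowlabs import nlp, medical

# Wrap the raw text column in a document annotation
documentAssembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Load the pretrained rule-based URL matcher
url_regex_matcher = medical.RegexMatcherInternalModel.pretrained("url_regex_matcher","en","clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("URL")

url_regex_matcher_pipeline = nlp.Pipeline(
    stages=[
        documentAssembler,
        url_regex_matcher
    ])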
data = spark.createDataFrame([
["""Name: ID: 1231511863, Driver's License No: A334455B, SSN: 324-59-8674. E-mail: hale@gmail.com.
For more details, visit our website at www.johnsnowlabs.com or check out http://example.com for general info.
For secure access, go to https://secure.example.com. File transfers can be done via ftp://files.example.com.
"""]]).toDF("text")
result = url_regex_matcher_pipeline.fit(data).transform(data)
// Scala; assumes an active licensed Spark session named `spark`
import spark.implicits._  // needed for Seq(...).toDF

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

// Load the pretrained rule-based URL matcher
val url_regex_matcher = RegexMatcherInternalModel.pretrained("url_regex_matcher","en","clinical/models")
    .setInputCols(Array("document"))
    .setOutputCol("URL")

val regex_pipeline = new Pipeline().setStages(Array(
    documentAssembler,
    url_regex_matcher
))
val data = Seq("""Name: ID: 1231511863, Driver's License No: A334455B, SSN: 324-59-8674. E-mail: hale@gmail.com.
For more details, visit our website at www.johnsnowlabs.com or check out http://example.com for general info.
For secure access, go to https://secure.example.com. File transfers can be done via ftp://files.example.com.
""").toDF("text")
val result = regex_pipeline.fit(data).transform(data)
Results
| chunk | begin | end | label |
|:---------------------------|--------:|------:|:--------|
| www.johnsnowlabs.com | 142 | 161 | URL |
| http://example.com | 176 | 193 | URL |
| https://secure.example.com | 246 | 271 | URL |
| ftp://files.example.com | 305 | 327 | URL |
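To illustrate how a rule-based matcher can produce rows shaped like the table above, here is a minimal plain-Python sketch. It does not use the pretrained model: the pattern `URL_PATTERN`, the helper `find_urls`, and the trailing-punctuation rule are all hypothetical stand-ins, and the model's actual rules may differ. The `end` offset is treated as inclusive, which is what the table suggests (for example, `www.johnsnowlabs.com` is 20 characters long and spans 142 to 161).

```python
import re

# Hypothetical pattern for illustration only; NOT the rule set shipped with
# the pretrained url_regex_matcher model.
URL_PATTERN = re.compile(r'(?:(?:https?|ftp)://|www\.)[^\s<>"]+', re.IGNORECASE)

# Sentence punctuation that should not be counted as part of a URL.
TRAILING_PUNCT = ".,;:!?)"


def find_urls(text):
    """Return one dict per URL with chunk, begin, inclusive end, and label."""
    rows = []
    for match in URL_PATTERN.finditer(text):
        # Drop punctuation that merely ends the sentence, e.g. "example.com."
        chunk = match.group(0).rstrip(TRAILING_PUNCT)
        begin = match.start()
        end = begin + len(chunk) - 1  # inclusive end offset
        rows.append({"chunk": chunk, "begin": begin, "end": end, "label": "URL"})
    return rows


sample = "Visit www.johnsnowlabs.com or https://secure.example.com. E-mail: hale@gmail.com."
for row in find_urls(sample):
    print(row)
```

Because the sketch only accepts text that starts with a scheme or `www.`, the e-mail address in the sample is not reported as a URL. With inclusive offsets, a chunk can be recovered from the input as `text[begin:end + 1]`.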
Model Information
| Model Name: | url_regex_matcher |
| Compatibility: | Healthcare NLP 6.2.2+ |
| License: | Licensed |
| Edition: | Official |
| Input Labels: | [document] |
| Output Labels: | [entity_url] |
| Language: | en |
| Size: | 2.2 KB |