Description
This model extracts URLs from clinical notes using the rule-based RegexMatcherInternal annotator.
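To give a feel for what rule-based URL matching means, here is a minimal sketch in plain Python. The pattern, the `find_urls` helper, and the sample text are hypothetical and chosen for illustration only; they are not the rules shipped inside `url_matcher`, which are not shown on this page.

```python
import re

# Illustrative pattern only; NOT the pattern used by the pretrained model.
# It accepts http, https and ftp schemes, plus bare "www." hosts.
URL_PATTERN = re.compile(r"(?:(?:https?|ftp)://|www\.)[^\s,;]+")


def find_urls(text):
    """Return (chunk, begin, end) tuples; `end` is an inclusive offset."""
    matches = []
    for m in URL_PATTERN.finditer(text):
        # Drop sentence-final periods that the greedy match picked up.
        chunk = m.group().rstrip(".")
        matches.append((chunk, m.start(), m.start() + len(chunk) - 1))
    return matches


sample = (
    "Visit www.johnsnowlabs.com or http://example.com. "
    "Secure access: https://secure.example.com. "
    "Transfers: ftp://files.example.com."
)

for chunk, begin, end in find_urls(sample):
    print(chunk, begin, end)
```

An e-mail address such as hale@gmail.com is not matched by this sketch, because it carries neither a scheme nor a `www.` prefix.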
Predicted Entities
URL
How to use
# Python
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp_jsl.annotator import RegexMatcherInternalModel

# Assumes an active Spark session named `spark` with Healthcare NLP loaded.
documentAssembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Load the pretrained rule-based URL matcher.
url_regex_matcher = RegexMatcherInternalModel.pretrained("url_matcher","en","clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("URL")

url_regex_matcher_pipeline = Pipeline(
    stages=[
        documentAssembler,
        url_regex_matcher
    ])

data = spark.createDataFrame([
    ["""Name: ID: 1231511863, Driver's License No: A334455B, SSN: 324-59-8674. E-mail: hale@gmail.com.
For more details, visit our website at www.johnsnowlabs.com or check out http://example.com for general info.
For secure access, go to https://secure.example.com. File transfers can be done via ftp://files.example.com.
"""]]).toDF("text")

result = url_regex_matcher_pipeline.fit(data).transform(data)

// Scala
// RegexMatcherInternalModel comes from the licensed Healthcare NLP library
// and must be imported from that library as well.
import com.johnsnowlabs.nlp.DocumentAssembler
import org.apache.spark.ml.Pipeline
import spark.implicits._ // needed for .toDF on a Seq

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val url_regex_matcher = RegexMatcherInternalModel.pretrained("url_matcher","en","clinical/models")
  .setInputCols(Array("document"))
  .setOutputCol("URL")

val regex_pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  url_regex_matcher
))

val data = Seq("""Name: ID: 1231511863, Driver's License No: A334455B, SSN: 324-59-8674. E-mail: hale@gmail.com.
For more details, visit our website at www.johnsnowlabs.com or check out http://example.com for general info.
For secure access, go to https://secure.example.com. File transfers can be done via ftp://files.example.com.
""").toDF("text")

val result = regex_pipeline.fit(data).transform(data)
Results
+--------------------------+-----+---+-----+
|chunk                     |begin|end|label|
+--------------------------+-----+---+-----+
|www.johnsnowlabs.com      |142  |161|URL  |
|http://example.com        |176  |193|URL  |
|https://secure.example.com|247  |272|URL  |
|ftp://files.example.com   |306  |328|URL  |
+--------------------------+-----+---+-----+
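In the table above, `end - begin + 1` equals the length of each chunk, which suggests that `begin` and `end` are character offsets with an inclusive `end`. The sketch below checks that arithmetic against the reported rows and shows how such offsets would map back to a substring. The `chunk_at` helper and the toy sentence are hypothetical and not part of the library.

```python
def chunk_at(text, begin, end):
    """Slice `text` using a begin offset and an INCLUSIVE end offset."""
    return text[begin:end + 1]


# Rows copied from the results table: (chunk, begin, end).
rows = [
    ("www.johnsnowlabs.com", 142, 161),
    ("http://example.com", 176, 193),
    ("https://secure.example.com", 247, 272),
    ("ftp://files.example.com", 306, 328),
]

# Each reported span is exactly as long as its chunk.
for chunk, begin, end in rows:
    assert end - begin + 1 == len(chunk)

# Toy example of recovering a chunk from inclusive offsets.
toy = "see www.a.com now"
print(chunk_at(toy, 4, 12))
```

Because Python slices exclude the upper bound, the `+ 1` is needed whenever these offsets are used to cut the chunk out of the original text.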
Model Information
| Model Name:    | url_matcher           |
| Compatibility: | Healthcare NLP 5.4.0+ |
| License:       | Licensed              |
| Edition:       | Official              |
| Input Labels:  | [document]            |
| Output Labels: | [URL]                 |
| Language:      | en                    |
| Size:          | 2.2 KB                |