Description
SentenceDetectorDL (SDDL) is based on a general-purpose neural network model for sentence boundary detection. The task of sentence boundary detection is to identify sentences within a text. Many natural language processing tasks take a sentence as an input unit, such as part-of-speech tagging, dependency parsing, named entity recognition, or machine translation.
Predicted Entities
How to use
document_assembler = DocumentAssembler()\
.setInputCol("text")\
.setOutputCol("document")
sentence_detector = SentenceDetectorDLModel\
.pretrained("sentence_detector_dl_healthcare_v2_wip", "en", "clinical/models") \
.setInputCols(["document"]) \
.setOutputCol("sentences")
pipeline = Pipeline(
stages=[
document_assembler,
sentence_detector
])
text = """He was given boluses of MS04 with some effect, he has since been placed on a PCA -
he take 80mg of oxycontin at home, his PCA dose is ~ 2 the morphine dose of the oxycontin,
he has also received ativan for anxiety. Repleted with 20 meq kcl po, 30 mmol K-phos iv and 2 gms
mag so4 iv. Size: Prostate gland measures 10x1.1x4.9 cm (LS x AP x TS). Estimated volume is 51.9 ml
and is mildly enlarged in size. Normal delineation pattern of the prostate gland is preserved.
"""
data = spark.createDataFrame([[text]]).toDF("text")
result = pipeline.fit(data).transform(data)
from johnsnowlabs import nlp, medical
document_assembler = nlp.DocumentAssembler()\
.setInputCol("text")\
.setOutputCol("document")
sentence_detector = nlp.SentenceDetectorDLModel\
.pretrained("sentence_detector_dl_healthcare_v2_wip", "en", "clinical/models") \
.setInputCols(["document"]) \
.setOutputCol("sentences")
pipeline = nlp.Pipeline(
stages=[
document_assembler,
sentence_detector
])
text = """He was given boluses of MS04 with some effect, he has since been placed on a PCA -
he take 80mg of oxycontin at home, his PCA dose is ~ 2 the morphine dose of the oxycontin,
he has also received ativan for anxiety. Repleted with 20 meq kcl po, 30 mmol K-phos iv and 2 gms
mag so4 iv. Size: Prostate gland measures 10x1.1x4.9 cm (LS x AP x TS). Estimated volume is 51.9 ml
and is mildly enlarged in size. Normal delineation pattern of the prostate gland is preserved.
"""
data = spark.createDataFrame([[text]]).toDF("text")
result = pipeline.fit(data).transform(data)
val document_assembler = new DocumentAssembler()
.setInputCol("text")
.setOutputCol("document")
val sentence_detector = SentenceDetectorDLModel
.pretrained("sentence_detector_dl_healthcare_v2_wip", "en", "clinical/models")
.setInputCols(Array("document"))
.setOutputCol("sentence")
val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector))
val data = Seq("""He was given boluses of MS04 with some effect, he has since been placed on a PCA -
he take 80mg of oxycontin at home, his PCA dose is ~ 2 the morphine dose of the oxycontin,
he has also received ativan for anxiety. Repleted with 20 meq kcl po, 30 mmol K-phos iv and 2 gms
mag so4 iv. Size: Prostate gland measures 10x1.1x4.9 cm (LS x AP x TS). Estimated volume is 51.9 ml
and is mildly enlarged in size. Normal delineation pattern of the prostate gland is preserved.
""").toDF("text")
val result = pipeline.fit(data).transform(data)
Results
|sent_id|sentence |
|------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|0 |He was given boluses of MS04 with some effect, he has since been placed on a PCA - \nhe take 80mg of oxycontin at home, his PCA dose is ~ 2 the morphine dose of the oxycontin, \nhe has also received ativan for anxiety.|
|1 |Repleted with 20 meq kcl po, 30 mmol K-phos iv and 2 gms \nmag so4 iv. |
|2 |Size: Prostate gland measures 10x1.1x4.9 cm (LS x AP x TS). |
|3 |Estimated volume is 51.9 ml \nand is mildly enlarged in size. |
|4 |Normal delineation pattern of the prostate gland is preserved. |
Model Information
Model Name: | sentence_detector_dl_healthcare_v2_wip |
Compatibility: | Healthcare NLP 5.4.1+ |
License: | Licensed |
Edition: | Official |
Input Labels: | [document] |
Output Labels: | [sentences] |
Language: | en |
Size: | 377.4 KB |
References
The healthcare sentence detector DL model is trained on in-house domain-specific data.