Description
This model removes stop words from clinical phenotype descriptions, particularly in the context of the Human Phenotype Ontology (HPO).
How to use
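# Python (Spark NLP for Healthcare)
# Assumes an active Spark session named `spark` with a valid Healthcare NLP license
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, StopWordsCleaner
# Convert raw text into a document annotation
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")
# Split the document into tokens
tokenizer = Tokenizer()\
    .setInputCols(["document"])\
    .setOutputCol("token")
# Load the pretrained HPO stop-word list and drop matching tokens (case-insensitive)
stopwords_cleaner = StopWordsCleaner.pretrained("stopwords_removal_hpo", "en", "clinical/models")\
    .setInputCols(["token"])\
    .setOutputCol("cleanTokens")\
    .setCaseSensitive(False)
pipeline = Pipeline().setStages([
    document_assembler,
    tokenizer,
    stopwords_cleaner
])
text_df = spark.createDataFrame([["The patient shows no signs of muscle weakness or developmental delay"]]).toDF("text")
result_df = pipeline.fit(text_df).transform(text_df)
# Compare the original tokens with the cleaned tokens
result_df.selectExpr("token.result AS token", "cleanTokens.result AS cleanTokens").show(truncate=False)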
# Python (johnsnowlabs library)
# Assumes an active Spark session named `spark` with a valid Healthcare NLP license
from johnsnowlabs import nlp

document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

tokenizer = nlp.Tokenizer()\
    .setInputCols(["document"])\
    .setOutputCol("token")

stopwords_cleaner = nlp.StopWordsCleaner.pretrained("stopwords_removal_hpo", "en", "clinical/models")\
    .setInputCols(["token"])\
    .setOutputCol("cleanTokens")\
    .setCaseSensitive(False)

pipeline = nlp.Pipeline().setStages([
    document_assembler,
    tokenizer,
    stopwords_cleaner
])

text_df = spark.createDataFrame([["The patient shows no signs of muscle weakness or developmental delay"]]).toDF("text")
result_df = pipeline.fit(text_df).transform(text_df)

# Compare the original tokens with the cleaned tokens
result_df.selectExpr("token.result AS token", "cleanTokens.result AS cleanTokens").show(truncate=False)
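// Scala (Spark NLP for Healthcare)
// Assumes an active SparkSession named `spark` with a valid Healthcare NLP license
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

// Convert raw text into a document annotation
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

// Split the document into tokens
val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

// Load the pretrained HPO stop-word list and drop matching tokens (case-insensitive)
val stopWordsCleaner = StopWordsCleaner.pretrained("stopwords_removal_hpo", "en", "clinical/models")
  .setInputCols("token")
  .setOutputCol("cleanTokens")
  .setCaseSensitive(false)

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  tokenizer,
  stopWordsCleaner
))

val textData = Seq(
  "The patient shows no signs of muscle weakness or developmental delay"
).toDF("text")

val model = pipeline.fit(textData)
val resultDF = model.transform(textData)

// Compare the original tokens with the cleaned tokens
resultDF.selectExpr("token.result AS token", "cleanTokens.result AS cleanTokens").show(false)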
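All three examples set case sensitivity to false. The following is a minimal conceptual sketch in plain Python of what that flag changes; it is not the Spark NLP implementation. It assumes a hypothetical three-word stop list ({"the", "of", "or"}) inferred only from the Results table, not the actual contents of stopwords_removal_hpo, and it approximates the Tokenizer with str.split(), which yields the same tokens for this punctuation-free sentence.

```python
# Conceptual sketch of case-(in)sensitive stop-word filtering.
# HYPOTHETICAL_STOP_WORDS is an illustrative subset inferred from the Results
# table; it is NOT the real stopwords_removal_hpo list.
HYPOTHETICAL_STOP_WORDS = {"the", "of", "or"}


def remove_stop_words(tokens, stop_words, case_sensitive=False):
    """Return the tokens not found in stop_words, preserving their original order."""
    if case_sensitive:
        # Exact string match: "The" does not match "the"
        return [t for t in tokens if t not in stop_words]
    # Case-insensitive: compare lowercased tokens against lowercased stop words
    lowered = {w.lower() for w in stop_words}
    return [t for t in tokens if t.lower() not in lowered]


# Whitespace split stands in for the Spark NLP Tokenizer (assumption)
tokens = "The patient shows no signs of muscle weakness or developmental delay".split()

print(remove_stop_words(tokens, HYPOTHETICAL_STOP_WORDS))
print(remove_stop_words(tokens, HYPOTHETICAL_STOP_WORDS, case_sensitive=True))
```

With case sensitivity off, the capitalized "The" is dropped along with "of" and "or"; with it on, "The" survives because only the lowercase form is in this hypothetical list.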
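Results
In the table below, "--" marks a token that the cleaner removed; the cleanTokens output itself contains only the surviving tokens.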
| | token | cleanTokens |
|---|---------------|----------------|
| 0 | The | -- |
| 1 | patient | patient |
| 2 | shows | shows |
| 3 | no | no |
| 4 | signs | signs |
| 5 | of | -- |
| 6 | muscle | muscle |
| 7 | weakness | weakness |
| 8 | or | -- |
| 9 | developmental | developmental |
|10 | delay | delay |
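Because the cleaner only drops tokens and keeps the rest in their original order, cleanTokens is an order-preserving subsequence of token, which is what lets the table line the two columns up. The sketch below rebuilds that layout from two plain Python lists copied from the table; align_tokens and the "--" placeholder are illustrative names, not part of the Spark NLP API.

```python
def align_tokens(tokens, clean_tokens, placeholder="--"):
    """Pair each original token with its surviving copy, or a placeholder if removed.

    Assumes clean_tokens is an order-preserving subsequence of tokens,
    which holds for a filter that only drops tokens.
    """
    rows = []
    j = 0  # position in clean_tokens of the next surviving token to match
    for i, tok in enumerate(tokens):
        if j < len(clean_tokens) and clean_tokens[j] == tok:
            rows.append((i, tok, tok))
            j += 1
        else:
            rows.append((i, tok, placeholder))
    return rows


tokens = ["The", "patient", "shows", "no", "signs", "of", "muscle",
          "weakness", "or", "developmental", "delay"]
clean_tokens = ["patient", "shows", "no", "signs", "muscle",
                "weakness", "developmental", "delay"]

# Print one row per original token, mirroring the Results table layout
for idx, tok, clean in align_tokens(tokens, clean_tokens):
    print(f"{idx:>2} | {tok:<13} | {clean}")
```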
Model Information
| Model Name: | stopwords_removal_hpo |
|---|---|
| Compatibility: | Healthcare NLP 5.5.3+ |
| License: | Licensed |
| Edition: | Official |
| Input Labels: | [token] |
| Output Labels: | [cleanTokens] |
| Language: | en |
| Size: | 1.4 KB |
| Case sensitive: | false |