Clinical Longformer

Description

This embeddings model was imported from Hugging Face. Clinical-Longformer is a clinical-knowledge-enriched version of Longformer that was further pretrained on MIMIC-III clinical notes. It accepts input sequences of up to 4,096 tokens.
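Longformer, the architecture this model builds on, reaches 4,096 tokens by replacing full self-attention with sliding-window local attention plus a small number of global tokens, so attention cost grows roughly linearly with sequence length instead of quadratically. The sketch below is a minimal illustration, not the model's actual implementation: `allowed_pairs` is a hypothetical helper that counts how many query-key pairs such a mask permits. The window size of 512 and the single global token at position 0 are assumptions chosen for illustration, not confirmed settings of this model.

```python
def allowed_pairs(seq_len: int, window: int, global_positions: list[int]) -> int:
    """Count (query, key) pairs permitted by a Longformer-style attention mask.

    Assumptions for illustration: each token attends to keys within
    window // 2 positions on either side of it, and global attention is
    symmetric (a global token attends to all positions, and all positions
    attend to it).
    """
    half = window // 2
    global_set = set(global_positions)
    total = 0
    for q in range(seq_len):
        if q in global_set:
            total += seq_len  # a global token sees the whole sequence
            continue
        lo, hi = max(0, q - half), min(seq_len, q + half + 1)
        # local window, plus every global token that falls outside that window
        total += (hi - lo) + sum(1 for g in global_set if not lo <= g < hi)
    return total


n = 4096
sparse = allowed_pairs(n, window=512, global_positions=[0])  # assumed window size
full = n * n  # full self-attention scores every query against every key
print(f"{sparse} of {full} pairs ({sparse / full:.1%})")
```

With these assumed settings, the sparse mask covers only a small fraction of the pairs that full attention would score at 4,096 tokens, which is what makes the long input length practical.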

Clinical-Longformer consistently outperforms ClinicalBERT by at least 2 percent on 10 baseline datasets. These downstream experiments cover named entity recognition (NER), question answering (QA), natural language inference (NLI), and text classification tasks.



How to use

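Python:
import sparknlp
from sparknlp.annotator import LongformerEmbeddings

spark = sparknlp.start()  # start a Spark session with Spark NLP loaded

# Load the pretrained model; it expects upstream "sentence" and "token" columns
embeddings = LongformerEmbeddings.pretrained("clinical_longformer", "en") \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("embeddings") \
    .setCaseSensitive(True) \
    .setMaxSentenceLength(4096)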
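
Scala:
import com.johnsnowlabs.nlp.embeddings.LongformerEmbeddings

// Load the pretrained model; it expects upstream "sentence" and "token" columns
val embeddings = LongformerEmbeddings.pretrained("clinical_longformer", "en")
  .setInputCols(Array("sentence", "token"))
  .setOutputCol("embeddings")
  .setCaseSensitive(true) // Scala Boolean literals are lowercase
  .setMaxSentenceLength(4096)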

NLU:
import nlu
nlu.load("en.embed.longformer.clinical").predict("""Put your text here.""")
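Clinical notes longer than the 4,096-token limit cannot be embedded in a single pass. One workaround, an assumption on our part rather than a feature of this model or of Spark NLP, is to split the token sequence into overlapping windows before embedding. `chunk_tokens` and its `stride` overlap parameter are hypothetical names used only for this sketch. Because the model's limit is counted in its own subword tokens, a budget expressed in word-level tokens should be set somewhat below 4,096 to leave headroom.

```python
def chunk_tokens(tokens: list[str], max_len: int = 4096, stride: int = 512) -> list[list[str]]:
    """Split tokens into windows of at most max_len tokens (hypothetical helper).

    Consecutive windows overlap by `stride` tokens so that text near a
    window boundary keeps some surrounding context in both windows.
    """
    if max_len <= 0 or not 0 <= stride < max_len:
        raise ValueError("need max_len > 0 and 0 <= stride < max_len")
    if len(tokens) <= max_len:
        return [tokens]  # short input fits in one window
    step = max_len - stride  # how far each new window advances
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # this window already reaches the end of the input
        start += step
    return chunks


note = ["token"] * 10000  # stand-in for a long tokenized clinical note
windows = chunk_tokens(note)
print(len(windows), [len(w) for w in windows])
```

The overlap means tokens near a boundary are embedded twice; how to merge or choose between their two vectors is left open here.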

Model Information

Model Name: clinical_longformer
Compatibility: Spark NLP 3.4.0+
License: Open Source
Edition: Official
Input Labels: [sentence, token]
Output Labels: [embeddings]
Language: en
Size: 534.9 MB
Case sensitive: true
Max sentence length: 4096
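The `embeddings` output holds one vector per token. For document-level tasks such as text classification, a common approach (not something this model card prescribes) is to average the token vectors into a single fixed-size vector. Inside a Spark NLP pipeline, the `SentenceEmbeddings` annotator offers average pooling; the pure-Python `mean_pool` helper below is a hypothetical stand-in that only shows the arithmetic on plain lists.

```python
def mean_pool(token_vectors: list[list[float]]) -> list[float]:
    """Average per-token embedding vectors into one vector (hypothetical helper)."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    if any(len(v) != dim for v in token_vectors):
        raise ValueError("all token vectors must have the same dimension")
    n = len(token_vectors)
    # component-wise mean across tokens
    return [sum(v[i] for v in token_vectors) / n for i in range(dim)]


# Toy 2-dimensional vectors; real vectors from this model are much wider.
print(mean_pool([[1.0, 2.0], [3.0, 4.0]]))
```

Averaging ignores token order, which is a reasonable trade-off for a quick document vector but loses information that a fine-tuned classifier could use.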

References

https://arxiv.org/pdf/2201.11838.pdf