Clinical Longformer

Description

This embeddings model was imported from Hugging Face. Clinical-Longformer is a clinical-knowledge-enriched version of Longformer that was further pretrained on MIMIC-III clinical notes. It accepts inputs of up to 4,096 tokens.

Clinical-Longformer consistently outperforms ClinicalBERT by at least 2 percent across 10 baseline datasets. These downstream experiments broadly cover named entity recognition (NER), question answering (QA), natural language inference (NLI), and text classification tasks.

How to use

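from sparknlp.annotator import LongformerEmbeddings

# Load the pretrained model and attach it to the sentence and token columns
embeddings = LongformerEmbeddings.pretrained("clinical_longformer", "en") \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("embeddings") \
    .setCaseSensitive(True) \
    .setMaxSentenceLength(4096)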
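
The annotator reads `sentence` and `token` columns, so it needs upstream stages that produce them. The following is a minimal sketch of one way to do that, not an official recipe. It assumes a raw input column named `text`, an intermediate column named `document`, and the default `SentenceDetector` and `Tokenizer` settings. The helper names `build_pipeline` and `embed_texts` are hypothetical and used only for illustration.

```python
MODEL_NAME = "clinical_longformer"   # model name listed on this page
MAX_SENTENCE_LENGTH = 4096           # maximum input length listed on this page


def build_pipeline():
    """Assemble document -> sentence -> token -> embeddings stages."""
    # Imported inside the function so these definitions can be loaded
    # before Spark NLP and PySpark are installed.
    from pyspark.ml import Pipeline
    from sparknlp.base import DocumentAssembler
    from sparknlp.annotator import (
        LongformerEmbeddings,
        SentenceDetector,
        Tokenizer,
    )

    # "text" and "document" are assumed column names, not taken from this page.
    document_assembler = DocumentAssembler() \
        .setInputCol("text") \
        .setOutputCol("document")

    sentence_detector = SentenceDetector() \
        .setInputCols(["document"]) \
        .setOutputCol("sentence")

    tokenizer = Tokenizer() \
        .setInputCols(["sentence"]) \
        .setOutputCol("token")

    embeddings = LongformerEmbeddings.pretrained(MODEL_NAME, "en") \
        .setInputCols(["sentence", "token"]) \
        .setOutputCol("embeddings") \
        .setCaseSensitive(True) \
        .setMaxSentenceLength(MAX_SENTENCE_LENGTH)

    return Pipeline(stages=[
        document_assembler,
        sentence_detector,
        tokenizer,
        embeddings,
    ])


def embed_texts(spark, texts):
    """Run the pipeline over a list of strings; returns the transformed DataFrame."""
    data = spark.createDataFrame([[t] for t in texts]).toDF("text")
    return build_pipeline().fit(data).transform(data)


# Example usage (requires Spark NLP and downloads the model on first call;
# the sample sentence is made up for illustration):
#   import sparknlp
#   spark = sparknlp.start()
#   result = embed_texts(spark, ["Patient was admitted with chest pain."])
#   result.select("embeddings.embeddings").show()
```

Keeping pipeline construction separate from the Spark session makes it easier to reuse the same stages on different DataFrames.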

Model Information

Model Name:	clinical_longformer
Compatibility:	Spark NLP 3.4.0+
License:	Open Source
Edition:	Official
Input Labels:	[sentence, token]
Output Labels:	[embeddings]
Language:	en
Size:	534.9 MB
Case sensitive:	true
Max sentence length:	4096

References

https://arxiv.org/pdf/2201.11838.pdf
