Embeddings Clinical (Large)

Description

This model is trained with the word2vec algorithm on a collection of clinical and biomedical datasets curated in-house. The dataset curation cut-off date is March 2023, so the model is expected to generalize better on recent content. The model is about 2 GB in size and produces 200-dimensional vectors. Our benchmark tests indicate that it can replace our legacy clinical embeddings (embeddings_clinical) when training a new model. Existing models must continue to use the legacy embeddings they were trained with.

How to use

# Python
from sparknlp.annotator import WordEmbeddingsModel

embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical_large","en","clinical/models")\
    .setInputCols(["document","token"])\
    .setOutputCol("word_embeddings")

// Scala
import com.johnsnowlabs.nlp.embeddings.WordEmbeddingsModel

val embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical_large","en","clinical/models")
    .setInputCols(Array("document","token"))
    .setOutputCol("word_embeddings")
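
The snippets above load only the embeddings stage, which expects "document" and "token" columns to exist already. A minimal Python sketch of a full pipeline that produces those columns might look like the following. The helper name embed_texts and the "text" input column are illustrative assumptions, not part of the model card, and the function assumes a SparkSession that was already started with a valid Healthcare NLP license.

```python
MODEL_NAME = "embeddings_clinical_large"  # model name from this card


def embed_texts(spark, texts):
    """Return a DataFrame with one row per token and its embedding vector.

    `spark` is assumed to be a SparkSession already started with a valid
    Healthcare NLP license; creating that session is out of scope here.
    `embed_texts` itself is a hypothetical helper written for illustration.
    """
    # Imported inside the function so this file can be loaded without Spark.
    from pyspark.ml import Pipeline
    from sparknlp.base import DocumentAssembler
    from sparknlp.annotator import Tokenizer, WordEmbeddingsModel

    # Raw text -> "document" annotations.
    document = DocumentAssembler().setInputCol("text").setOutputCol("document")
    # "document" -> "token" annotations.
    tokenizer = Tokenizer().setInputCols(["document"]).setOutputCol("token")
    # Same configuration as the snippet above.
    embeddings = (
        WordEmbeddingsModel.pretrained(MODEL_NAME, "en", "clinical/models")
        .setInputCols(["document", "token"])
        .setOutputCol("word_embeddings")
    )

    pipeline = Pipeline(stages=[document, tokenizer, embeddings])
    df = spark.createDataFrame([(t,) for t in texts], ["text"])
    annotated = pipeline.fit(df).transform(df)

    # Each annotation holds the token text in `result` and its vector in `embeddings`.
    return annotated.selectExpr("explode(word_embeddings) AS ann").selectExpr(
        "ann.result AS token", "ann.embeddings AS vector"
    )
```

With a licensed session available, a call such as embed_texts(spark, ["Patient denies chest pain."]).show(truncate=60) should list each token next to its vector. Per the Dimension field in Model Information, each vector is expected to have 200 entries.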

Model Information

Model Name:	embeddings_clinical_large
Type:	embeddings
Compatibility:	Healthcare NLP 4.3.2+
License:	Licensed
Edition:	Official
Input Labels:	[document, token]
Output Labels:	[word_embeddings]
Language:	en
Size:	2.0 GB
Case sensitive:	true
Dimension:	200
