Embeddings Clinical (Large)

Description

This model is trained with the word2vec algorithm on a set of clinical and biomedical datasets curated in-house. The dataset curation cut-off date is March 2023, and the model is expected to generalize better on recent content. The model is around 2 GB in size and produces 200-dimensional vectors. Our benchmark tests indicate that it can replace our legacy clinical embeddings (embeddings_clinical) when training a new model. Existing models must keep using the legacy embeddings they were trained with.


How to use


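# Python
from sparknlp.annotator import WordEmbeddingsModel

# Load the pretrained clinical embeddings; expects "document" and "token" annotations as input
embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical_large","en","clinical/models")\
    .setInputCols(["document","token"])\
    .setOutputCol("word_embeddings")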


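// Scala
import com.johnsnowlabs.nlp.embeddings.WordEmbeddingsModel

// Load the pretrained clinical embeddings; expects "document" and "token" annotations as input
val embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical_large","en","clinical/models")
    .setInputCols(Array("document","token"))
    .setOutputCol("word_embeddings")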
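
The snippets above only configure the embeddings stage, which needs "document" and "token" columns from earlier stages. A minimal Python sketch of one way to supply them is shown here. It assumes a Spark session already started with a valid Healthcare NLP license and a raw input column named "text"; the helper names `build_embedding_pipeline` and `embed` are hypothetical and not part of the library.

```python
# Settings taken from this model card.
MODEL_NAME = "embeddings_clinical_large"
LANGUAGE = "en"
REMOTE_LOC = "clinical/models"
EMBEDDING_DIM = 200  # the "Dimension" value listed under Model Information


def build_embedding_pipeline():
    """Return an unfitted pipeline: text -> document -> token -> word_embeddings.

    Hypothetical helper. Call it only after a licensed Spark session is running,
    because the annotators are created on the JVM side.
    """
    # Imports are deferred so this module can be loaded without Spark installed.
    from pyspark.ml import Pipeline
    from sparknlp.base import DocumentAssembler
    from sparknlp.annotator import Tokenizer, WordEmbeddingsModel

    # "text" is an assumed name for the raw input column.
    document_assembler = (
        DocumentAssembler().setInputCol("text").setOutputCol("document")
    )
    tokenizer = Tokenizer().setInputCols(["document"]).setOutputCol("token")
    embeddings = (
        WordEmbeddingsModel.pretrained(MODEL_NAME, LANGUAGE, REMOTE_LOC)
        .setInputCols(["document", "token"])
        .setOutputCol("word_embeddings")
    )
    return Pipeline(stages=[document_assembler, tokenizer, embeddings])


def embed(spark, texts):
    """Run the pipeline over a list of strings and return the annotated DataFrame."""
    data = spark.createDataFrame([[t] for t in texts]).toDF("text")
    return build_embedding_pipeline().fit(data).transform(data)
```

With a licensed session in hand, something like `embed(spark, ["The patient was prescribed metformin."])` should return a DataFrame whose "word_embeddings" column holds one annotation per token. According to this card, each annotation should carry a 200-dimensional vector.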

Model Information

Model Name: embeddings_clinical_large
Type: embeddings
Compatibility: Healthcare NLP 4.3.2+
License: Licensed
Edition: Official
Input Labels: [document, token]
Output Labels: [word_embeddings]
Language: en
Size: 2.0 GB
Case sensitive: true
Dimension: 200
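
The compatibility entry above means version 4.3.2 or any later release. Comparing version strings as plain text gets this wrong (for example, "4.10.0" sorts before "4.3.2"), so the sketch below compares numeric components instead. The helper names are hypothetical, and how you obtain the installed version string depends on your setup; here it is simply passed in as an argument.

```python
MIN_VERSION = "4.3.2"  # from the "Compatibility" line of this card


def parse_version(version):
    """Turn '4.3.2' into (4, 3, 2); suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    # Pad short versions such as '5.0' to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_compatible(installed, minimum=MIN_VERSION):
    """True when the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)
```

For example, `is_compatible("4.10.0")` is true while `is_compatible("4.3.1")` is false.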