Description
This model was trained with the word2vec algorithm on a collection of clinical and biomedical datasets curated in-house. The dataset curation cut-off date is March 2023, so the model is expected to generalize better to recent content. The model is about 2 GB in size and produces 200-dimensional vectors. Our benchmark tests indicate that the legacy clinical embeddings (embeddings_clinical) can be replaced with this model when training a new model. Existing models must continue to use the legacy embeddings they were trained with.
How to use
Python:
from sparknlp.annotator import WordEmbeddingsModel

embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical_large","en","clinical/models")\
    .setInputCols(["document","token"])\
    .setOutputCol("word_embeddings")

Scala:
import com.johnsnowlabs.nlp.embeddings.WordEmbeddingsModel

val embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical_large","en","clinical/models")
    .setInputCols(Array("document","token"))
    .setOutputCol("word_embeddings")
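
The snippet above only loads the embeddings annotator. As a rough sketch of how it might sit in a complete pipeline, the Python code below chains a document assembler, a tokenizer, and the embeddings. It assumes a Spark session with Healthcare NLP 4.3.2+ installed and licensed; `build_pipeline`, `embed`, and the `text` input column are hypothetical names introduced here for illustration. The library imports are deferred into the function so the file can be loaded on a machine without Spark.

```python
# Names and column labels taken from the model card.
MODEL_NAME = "embeddings_clinical_large"
INPUT_COLS = ["document", "token"]
OUTPUT_COL = "word_embeddings"


def build_pipeline():
    """Assemble text -> document -> token -> word_embeddings (hypothetical helper)."""
    # Imported lazily: these need Spark NLP and a Healthcare NLP license at call time.
    from pyspark.ml import Pipeline
    from sparknlp.base import DocumentAssembler
    from sparknlp.annotator import Tokenizer, WordEmbeddingsModel

    document_assembler = (
        DocumentAssembler().setInputCol("text").setOutputCol("document")
    )
    tokenizer = Tokenizer().setInputCols(["document"]).setOutputCol("token")
    embeddings = (
        WordEmbeddingsModel.pretrained(MODEL_NAME, "en", "clinical/models")
        .setInputCols(INPUT_COLS)
        .setOutputCol(OUTPUT_COL)
    )
    return Pipeline(stages=[document_assembler, tokenizer, embeddings])


def embed(spark, text):
    """Run the pipeline on one string and return (token, vector) pairs."""
    data = spark.createDataFrame([[text]]).toDF("text")
    result = build_pipeline().fit(data).transform(data)
    row = result.select(OUTPUT_COL).first()
    # Each annotation carries the token text and its embedding vector.
    return [(a.result, a.embeddings) for a in row[OUTPUT_COL]]
```

Given an active licensed session `spark`, calling `embed(spark, "Patient was given aspirin.")` should return one vector per token, each expected to have 200 entries according to the model card.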
Model Information
| Field | Value |
|---|---|
| Model Name: | embeddings_clinical_large |
| Type: | embeddings |
| Compatibility: | Healthcare NLP 4.3.2+ |
| License: | Licensed |
| Edition: | Official |
| Input Labels: | [document, token] |
| Output Labels: | [word_embeddings] |
| Language: | en |
| Size: | 2.0 GB |
| Case sensitive: | true |
| Dimension: | 200 |
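
The "Case sensitive" and "Dimension" entries can be pictured with a toy lookup table. This is an illustrative sketch only, not the library's implementation: `toy_table`, `lookup`, and the vector values are made up, and returning an all-zero vector for unknown tokens is an assumption of the sketch rather than documented behaviour of this model.

```python
DIMENSION = 200  # from the Dimension entry of the model card

# Hypothetical two-entry vocabulary; real vectors come from the pretrained model.
toy_table = {
    "aspirin": [0.1] * DIMENSION,
    "Aspirin": [0.2] * DIMENSION,
}


def lookup(token):
    """Case-sensitive lookup: the token is matched exactly, with no lowercasing."""
    # Assumption for this sketch: unknown tokens map to an all-zero vector.
    return toy_table.get(token, [0.0] * DIMENSION)
```

In this sketch, `lookup("aspirin")` and `lookup("Aspirin")` return different vectors, while `lookup("ASPIRIN")` falls through to the zero vector because no exact match exists. The practical consequence is that input text should keep its original casing rather than being lowercased before it reaches the embeddings.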