Healthcare NLP v2.6.2 Release Notes

2.6.2

Overview

We are very happy to announce that version 2.6.2 of Spark NLP Enterprise is ready to be installed and used. We are making available Named Entity Recognition, Sentence Classification and Entity Resolution models to analyze Adverse Drug Events in natural language text from clinical domains.

Models

NERs

We are pleased to announce that we have a brand new named entity recognition (NER) model for Adverse Drug Events (ADE) to extract ADE and DRUG entities from a given text.

ADE NER will have four versions in the library, trained with different size of word embeddings:

ner_ade_bioert (768d Bert embeddings)
ner_ade_clinicalbert (768d Bert embeddings)
ner_ade_clinical (200d clinical embeddings)
ner_ade_healthcare (100d healthcare embeddings)

More information and examples here

We are also releasing our first clinical pretrained classifier for ADE classification tasks. This new ADE classifier is trained on various ADE datasets, including the mentions in tweets to represent the daily life conversations as well. So it works well on the texts coming from academic context, social media and clinical notes. It’s trained with Clinical Biobert embeddings, which is the most powerful contextual language model in the clinical domain out there.

Classifiers

ADE classifier will have two versions in the library, trained with different Bert embeddings:

classifierdl_ade_bioert (768d BioBert embeddings)
classifierdl_adee_clinicalbert (768d ClinicalBert embeddings)

More information and examples here

Pipeline

By combining ADE NER and Classifier, we are releasing a new pretrained clinical pipeline for ADE tasks to save you from building pipelines from scratch. Pretrained pipelines are already fitted using certain annotators and transformers according to various use cases and you can use them as easy as follows:

pipeline = PretrainedPipeline('explain_clinical_doc_ade', 'en', 'clinical/models')

pipeline.annotate('my string')

explain_clinical_doc_ade is bundled with ner_ade_clinicalBert, and classifierdl_ade_clinicalBert. It can extract ADE and DRUG clinical entities, and then assign ADE status to a text (True means ADE, False means not related to ADE).

More information and examples here

Entity Resolver

We are releasing the first Entity Resolver for Athena (Automated Terminology Harmonization, Extraction and Normalization for Analytics, https://athena.ohdsi.org/) to extract concept ids via standardized medical vocabularies. For now, it only supports conditions section and can be used to map the clinical conditions with the corresponding standard terminology and then get the concept ids to store them in various database schemas. It is named as chunkresolve_athena_conditions_healthcare.

We added slim versions of several clinical NER models that are trained with 100d healthcare word embeddings, which is lighter and smaller in size.

ner_healthcare assertion_dl_healthcare ner_posology_healthcare ner_events_healthcare

Graph Builder

Spark NLP Licensed version has several DL based annotators (modules) such as NerDL, AssertionDL, RelationExtraction and GenericClassifier, and they are all based on Tensorflow (tf) with custom graphs. In order to make the creating and customizing the tf graphs for these models easier for our licensed users, we added a graph builder to the Python side of the library. Now you can customize your graphs and use them in the respected models while training a new DL model.

from sparknlp_jsl.training import tf_graph

tf_graph.build("relation_extraction",build_params={"input_dim": 6000, "output_dim": 3, 'batch_norm':1, "hidden_layers": [300, 200], "hidden_act": "relu", 'hidden_act_l2':1}, model_location=".", model_filename="re_with_BN")

More information and examples here

Versions

Version
Version
Version

PREVIOUSVersion Compatibility