Healthcare NLP v2.5.3 Release Notes

2.5.3

Overview

We are pleased to announce the release of Spark NLP for Healthcare 2.5.3. This release adds four new annotators: FeatureAssembler, GenericClassifier, Yake Keyword Extractor, and NerConverterInternal. It also includes helper classes to read datasets from the CodiEsp and Cantemist Spanish NER challenges. This is the first release to support the following models: ner_diag_proc (Spanish), ner_neoplasms (Spanish), and ner_deid_enriched (English). Finally, it includes bugfixes and enhancements for AnnotationToolJsonReader and ChunkMergeModel.

New Features

FeatureAssembler Transformer: Receives a list of column names containing numerical arrays and concatenates them to form one single feature_vector annotation
GenericClassifier Annotator: Receives a feature_vector annotation and outputs a category annotation
Yake Keyword Extraction Annotator: Receives a token annotation and outputs multi-token keyword annotations
NerConverterInternal Annotator: Similar in functionality to its open-source counterpart, but performs smarter extraction for complex tokenizations and adds confidence calculation
Readers for CodiEsp and Cantemist Challenges
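
The concatenation step described for FeatureAssembler can be illustrated with a minimal plain-Python sketch. This is not the library's implementation or API: the function `assemble_features`, the sample row, and the column names `embeddings` and `counts` are all hypothetical, chosen only to show how several numerical arrays become one flat feature vector.

```python
from typing import Dict, List, Sequence


def assemble_features(row: Dict[str, Sequence[float]], columns: List[str]) -> List[float]:
    """Concatenate the numerical arrays stored under `columns` into one flat vector."""
    feature_vector: List[float] = []
    for name in columns:
        # Keep the caller's column order so the vector layout stays stable.
        feature_vector.extend(float(value) for value in row[name])
    return feature_vector


# Hypothetical row with two array-valued columns (names are assumptions).
row = {"embeddings": [0.1, 0.2, 0.3], "counts": [4, 5]}
vector = assemble_features(row, ["embeddings", "counts"])
print(vector)
```

A classifier such as GenericClassifier would then consume a vector of this shape as its single input; the fixed column order matters because each position in the vector must mean the same thing for every row.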

Enhancements

AnnotationToolJsonReader now accepts a parameter for the preprocessing pipeline (from document assembly to tokenization)
AnnotationToolJsonReader now accepts a parameter to discard specific entity types

Bugfixes

ChunkMergeModel now prioritizes the highest number of different entities when coverage is equal
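
One way to read this tie-breaking rule is sketched below in plain Python. It is an interpretation of the note above, not the actual ChunkMergeModel code: the `Chunk` type, the `coverage` and `pick_candidate` helpers, and the entity labels are all hypothetical. The sketch assumes coverage means the number of characters covered by a candidate set of chunks, and that ties are broken by the count of distinct entity types.

```python
from typing import List, NamedTuple


class Chunk(NamedTuple):
    begin: int   # inclusive start offset
    end: int     # inclusive end offset
    entity: str  # entity label


def coverage(chunks: List[Chunk]) -> int:
    """Number of distinct character positions covered by the chunks."""
    covered = set()
    for chunk in chunks:
        covered.update(range(chunk.begin, chunk.end + 1))
    return len(covered)


def pick_candidate(candidates: List[List[Chunk]]) -> List[Chunk]:
    """Prefer higher coverage; on equal coverage, prefer more distinct entities."""
    return max(
        candidates,
        key=lambda chunks: (coverage(chunks), len({c.entity for c in chunks})),
    )


# Two hypothetical candidates covering the same 10 characters.
single = [Chunk(0, 9, "PROBLEM")]
split = [Chunk(0, 3, "DRUG"), Chunk(4, 9, "DOSAGE")]
chosen = pick_candidate([single, split])
print(chosen)
```

Here both candidates cover the same span, so the one with two distinct entity types is selected; a candidate with strictly larger coverage would still win regardless of its entity count.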

Models

Two new Spanish models for Clinical Entity Recognition: ner_diag_proc and ner_neoplasms
A new English Named Entity Recognition model for de-identification: ner_deid_enriched
