Spark NLP for Healthcare Release Notes 2.5.3

2.5.3

Overview

We are pleased to announce the release of Spark NLP for Healthcare 2.5.3. This release includes four new annotators: FeatureAssembler, GenericClassifier, Yake Keyword Extractor and NerConverterInternal. It also adds helper classes to read the datasets from the CodiEsp and Cantemist Spanish NER challenges. This is the first release to support the following models: ner_diag_proc (Spanish), ner_neoplasms (Spanish) and ner_deid_enriched (English). Finally, it includes bug fixes and enhancements for AnnotationToolJsonReader and ChunkMergeModel.

New Features

  • FeatureAssembler Transformer: Receives a list of column names containing numerical arrays and concatenates them into a single feature_vector annotation
  • GenericClassifier Annotator: Receives a feature_vector annotation and outputs a category annotation
  • Yake Keyword Extraction Annotator: Receives a token annotation and outputs multi-token keyword annotations
  • NerConverterInternal Annotator: Similar in functionality to its open-source counterpart, with smarter chunk extraction for complex tokenizations and confidence calculation
  • Readers for CodiEsp and Cantemist Challenges
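The concatenation performed by FeatureAssembler can be illustrated without Spark. The following is a minimal plain-Python sketch of the idea only, not the Spark NLP API: the function `assemble_features` and the column names `sentence_embeddings` and `age_features` are hypothetical, and the real transformer works on DataFrame columns and emits a feature_vector annotation.

```python
from typing import Dict, List, Sequence


def assemble_features(row: Dict[str, Sequence[float]], columns: List[str]) -> List[float]:
    """Concatenate the numeric arrays stored under `columns` into one flat vector.

    Hypothetical illustration of the FeatureAssembler idea; not the library code.
    """
    vector: List[float] = []
    for name in columns:
        # Follow the order of `columns` so every row yields the same feature layout.
        vector.extend(float(x) for x in row[name])
    return vector


# Hypothetical row: an embedding array plus a hand-crafted numeric array.
row = {"sentence_embeddings": [0.12, -0.5, 0.33], "age_features": [64.0, 1.0]}
print(assemble_features(row, ["sentence_embeddings", "age_features"]))
# [0.12, -0.5, 0.33, 64.0, 1.0]
```

Keeping the column order fixed matters because a downstream classifier such as GenericClassifier expects each position in the vector to mean the same thing for every row.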

Enhancements

  • AnnotationToolJsonReader now includes a parameter to set the preprocessing pipeline (from document assembling to tokenization)
  • AnnotationToolJsonReader now includes a parameter to discard specific entity types

Bugfixes

  • ChunkMergeModel now prioritizes the highest number of different entities when coverage is the same
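The tie-breaking rule in this fix can be sketched as a two-level ranking: coverage first, then the number of distinct entity labels. This is a hypothetical plain-Python illustration, not ChunkMergeModel's implementation; the `Chunk` type, the character-count definition of coverage and the function names are assumptions made for the example.

```python
from typing import List, NamedTuple, Set, Tuple


class Chunk(NamedTuple):
    begin: int   # inclusive character offset
    end: int     # inclusive character offset
    entity: str  # entity label, e.g. "SYMPTOM"


def coverage(chunks: List[Chunk]) -> int:
    """Assumed coverage measure: number of distinct characters covered."""
    covered: Set[int] = set()
    for c in chunks:
        covered.update(range(c.begin, c.end + 1))
    return len(covered)


def rank_key(chunks: List[Chunk]) -> Tuple[int, int]:
    # Primary criterion: coverage; tie-breaker: number of different entity labels.
    return (coverage(chunks), len({c.entity for c in chunks}))


def pick_merge(candidates: List[List[Chunk]]) -> List[Chunk]:
    """Return the candidate chunk set that ranks highest under rank_key."""
    return max(candidates, key=rank_key)


# "chest pain": both candidates cover the same 9 characters.
a = [Chunk(0, 4, "SYMPTOM"), Chunk(6, 9, "SYMPTOM")]
b = [Chunk(0, 4, "BODY_PART"), Chunk(6, 9, "SYMPTOM")]
print(pick_merge([a, b]))
# [Chunk(begin=0, end=4, entity='BODY_PART'), Chunk(begin=6, end=9, entity='SYMPTOM')]
```

With equal coverage, the candidate carrying two distinct labels wins regardless of the order in which the candidates are listed.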

Models

  • We have two new Spanish models for clinical entity recognition: ner_diag_proc and ner_neoplasms
  • We have a new English named entity recognition model for deidentification: ner_deid_enriched
