Spark NLP for Healthcare Release Notes 2.5.2

 

2.5.2

Overview

We are happy to bring you Spark NLP for Healthcare 2.5.2, with a couple of new features and several enhancements to our existing annotators. This release was mainly dedicated to driving adoption of our AnnotationToolJsonReader, a connector that provides out-of-the-box support for our Annotation Tool and our practices. The ChunkMerge annotator has also been given extra functionality to remove entire entity types and to modify the entity type of specific chunks. We also spent some time finalizing a refactoring of the DeIdentification annotator, mainly improving type consistency and adding a case-insensitive entity dictionary for obfuscation. Thanks to the community for all the feedback and suggestions; it is a pleasure to work together towards common functional goals that keep us agile and at the state of the art (SotA).

New Features

  • Brand new IOBTagger Annotator
  • NerDL Metrics provides an intuitive DataFrame API to calculate NER metrics at tag (token) and entity (chunk) level
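
To illustrate the difference between tag (token) level and entity (chunk) level evaluation, here is a minimal sketch in plain Python. It does not use the IOBTagger or NerDL Metrics APIs; the function names (`chunks_to_iob`, `iob_to_chunks`, `tag_level`, `entity_level`) and the token-index chunk representation are assumptions made only for this example.

```python
def chunks_to_iob(n_tokens, chunks):
    """Turn (start, end_exclusive, label) token spans into IOB tags."""
    tags = ["O"] * n_tokens
    for start, end, label in chunks:
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return tags


def iob_to_chunks(tags):
    """Recover (start, end_exclusive, label) spans from IOB tags."""
    chunks = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing chunk
        # Close the open chunk unless this tag continues it.
        if start is not None and (not tag.startswith("I-") or tag[2:] != label):
            chunks.append((start, i, label))
            start, label = None, None
        # Open a new chunk on B-, or leniently on an orphan I-.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    return chunks


def prf(tp, n_pred, n_gold):
    """Precision, recall and F1 from raw counts."""
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def tag_level(gold, pred):
    """Score each non-O tag independently."""
    tp = sum(1 for g, p in zip(gold, pred) if g == p and g != "O")
    n_pred = sum(1 for p in pred if p != "O")
    n_gold = sum(1 for g in gold if g != "O")
    return prf(tp, n_pred, n_gold)


def entity_level(gold, pred):
    """Score whole chunks: boundaries and label must both match."""
    g, p = set(iob_to_chunks(gold)), set(iob_to_chunks(pred))
    return prf(len(g & p), len(p), len(g))


# Tokens: Patient denies chest pain and fever
gold = chunks_to_iob(6, [(2, 4, "SYMPTOM"), (5, 6, "SYMPTOM")])
pred = chunks_to_iob(6, [(3, 4, "SYMPTOM"), (5, 6, "SYMPTOM")])
print(tag_level(gold, pred))
print(entity_level(gold, pred))
```

In this example the prediction misses the first token of "chest pain", so the chunk counts as wrong at entity level even though the entity type was recognized, which is why the two levels are usually reported separately.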

Enhancements

  • AnnotationToolJsonReader includes parameters for document cleanup, sentence boundaries and tokenizer split chars
  • AnnotationToolJsonReader uses the task title if present and uses the IOBTagger annotator
  • AnnotationToolJsonReader has improved alignment in assertion train set generation by using an alignTol parameter as tolerance in chunk char alignment
  • DeIdentification refactoring: improved typing and replacement logic, case-insensitive entities for obfuscation
  • ChunkMerge Annotator now handles:
      • Drop all chunks for an entity
      • Replace entity name
      • Change entity type for a specific (chunk, entity) pair
      • Drop specific (chunk, entity) pairs
  • Added caseSensitive param to EnsembleEntityResolver
  • Output logs for AssertionDLApproach loss
  • Disambiguator is back with improved dependency management
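
The four ChunkMerge operations listed above can be pictured with a small plain-Python sketch. This is not the ChunkMerge API: the function `postprocess_chunks`, its parameter names, and the order in which the rules are applied are all assumptions made for illustration.

```python
def postprocess_chunks(chunks, drop_entities=(), rename_entities=None,
                       retag_pairs=None, drop_pairs=()):
    """Apply drop/rename rules to a list of (chunk_text, entity) pairs.

    Hypothetical rule order: drops first, then pair-specific retagging,
    then entity-wide renaming.
    """
    drop_entities = set(drop_entities)
    drop_pairs = set(drop_pairs)
    rename_entities = rename_entities or {}
    retag_pairs = retag_pairs or {}

    result = []
    for text, entity in chunks:
        # Drop all chunks for an entity, or one specific (chunk, entity) pair.
        if entity in drop_entities or (text, entity) in drop_pairs:
            continue
        if (text, entity) in retag_pairs:
            # Change the entity type for this specific pair only.
            entity = retag_pairs[(text, entity)]
        else:
            # Replace the entity name wherever it occurs.
            entity = rename_entities.get(entity, entity)
        result.append((text, entity))
    return result


chunks = [
    ("aspirin", "DRUG"),
    ("Monday", "DATE"),
    ("cold", "SYMPTOM"),
    ("cold", "TEMPERATURE"),
    ("fever", "SYMPTOM"),
]
cleaned = postprocess_chunks(
    chunks,
    drop_entities={"DATE"},
    rename_entities={"SYMPTOM": "PROBLEM"},
    retag_pairs={("cold", "TEMPERATURE"): "SENSATION"},
    drop_pairs={("fever", "SYMPTOM")},
)
print(cleaned)
```

Keeping pair-specific rules separate from entity-wide rules lets a single ambiguous chunk be corrected without affecting every other chunk that shares its entity type.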

Bugfixes

  • Bugfix in Python when annotators shared domain parts across the public and internal libraries
  • Bugfix in Python when the ChunkMerge annotator was loaded from disk
  • ChunkMerge now weights the token coverage correctly when multiple multi-token entities overlap
