Healthcare NLP v2.5.2 Release Notes

2.5.2

Overview

We are happy to bring you Spark NLP for Healthcare 2.5.2, with a couple of new features and several enhancements to our existing annotators. This release was mainly dedicated to driving adoption of our AnnotationToolJsonReader, a connector that provides out-of-the-box support for our Annotation Tool and our practices. The ChunkMerge annotator has also been given extra functionality to remove entire entity types and to modify the entity type of specific chunks. We also spent some time finalizing a refactoring of the DeIdentification annotator, mainly improving type consistency and adding a case-insensitive entity dictionary for obfuscation. Thanks to the community for all the feedback and suggestions; it is a pleasure to work together towards common functional goals that keep us agile in the SotA.

New Features

Brand new IOBTagger Annotator
NerDL Metrics provides an intuitive DataFrame API to calculate NER metrics at tag (token) and entity (chunk) level
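
To illustrate the difference between tag (token) level and entity (chunk) level evaluation, here is a minimal pure-Python sketch. It does not use the NerDL Metrics DataFrame API, whose exact signature is not shown in these notes; the helper names `iob_to_chunks`, `token_accuracy` and `chunk_f1` are hypothetical and exist only for this example.

```python
from typing import List, Set, Tuple

# (start token index, end token index inclusive, entity label)
Chunk = Tuple[int, int, str]


def iob_to_chunks(tags: List[str]) -> Set[Chunk]:
    """Collect (start, end, label) spans from a sequence of IOB tags."""
    chunks: Set[Chunk] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing chunk
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            chunks.add((start, i - 1, label))
            start, label = None, None
        # A "B-" tag always opens a chunk; a stray "I-" tag opens one leniently.
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            start, label = i, tag[2:]
    return chunks


def token_accuracy(gold: List[str], pred: List[str]) -> float:
    """Tag-level score: fraction of tokens whose tag matches exactly."""
    matches = sum(g == p for g, p in zip(gold, pred))
    return matches / len(gold)


def chunk_f1(gold: List[str], pred: List[str]) -> float:
    """Entity-level score: F1 over exactly matching (span, label) chunks."""
    gold_chunks, pred_chunks = iob_to_chunks(gold), iob_to_chunks(pred)
    true_positives = len(gold_chunks & pred_chunks)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_chunks)
    recall = true_positives / len(gold_chunks)
    return 2 * precision * recall / (precision + recall)


gold_tags = ["B-PER", "I-PER", "O", "B-LOC"]
pred_tags = ["B-PER", "O", "O", "B-LOC"]

print(token_accuracy(gold_tags, pred_tags))  # 0.75: three of four tags match
print(chunk_f1(gold_tags, pred_tags))        # 0.5: the PER span is truncated
```

A single wrong tag costs one token at tag level but invalidates the whole entity at chunk level, which is why the two levels are worth reporting separately.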

Enhancements

AnnotationToolJsonReader includes parameters for document cleanup, sentence boundaries and tokenizer split chars
AnnotationToolJsonReader uses the task title if present and uses IOBTagger annotator
AnnotationToolJsonReader has improved alignment in assertion train set generation by using an alignTol parameter as tolerance in chunk char alignment
DeIdentification refactoring: improved typing and replacement logic, case-insensitive entities for obfuscation
ChunkMerge Annotator now handles:
Drop all chunks for an entity
Replace entity name
Change entity type for a specific (chunk, entity) pair
Drop specific (chunk, entity) pairs
Added caseSensitive param to EnsembleEntityResolver
Output logs for AssertionDLApproach loss
Disambiguator is back with improved dependency management
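
The four ChunkMerge behaviours listed above can be pictured with a small pure-Python sketch over (chunk, entity) pairs. This is not the ChunkMerge implementation or its API: the function `apply_chunk_rules`, its parameter names, and the order in which the rules are applied are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional, Tuple

Chunk = Tuple[str, str]  # (chunk text, entity label)


def apply_chunk_rules(
    chunks: Iterable[Chunk],
    drop_entities: FrozenSet[str] = frozenset(),
    rename_entities: Optional[Dict[str, str]] = None,
    retype_pairs: Optional[Dict[Chunk, str]] = None,
    drop_pairs: FrozenSet[Chunk] = frozenset(),
) -> List[Chunk]:
    """Filter and relabel chunks (hypothetical helper, not a library call)."""
    rename_entities = rename_entities or {}
    retype_pairs = retype_pairs or {}
    result: List[Chunk] = []
    for text, entity in chunks:
        # Drop specific (chunk, entity) pairs and whole entity types first.
        if (text, entity) in drop_pairs or entity in drop_entities:
            continue
        # Assumed precedence: a pair-specific change wins over a global rename.
        if (text, entity) in retype_pairs:
            entity = retype_pairs[(text, entity)]
        elif entity in rename_entities:
            entity = rename_entities[entity]
        result.append((text, entity))
    return result


chunks = [
    ("aspirin", "DRUG"),
    ("Monday", "DATE"),
    ("fever", "PROBLEM"),
    ("cold", "PROBLEM"),
    ("cold", "TEMPERATURE"),
]

merged = apply_chunk_rules(
    chunks,
    drop_entities=frozenset({"DATE"}),                # drop all chunks for an entity
    rename_entities={"PROBLEM": "SYMPTOM"},           # replace entity name
    retype_pairs={("aspirin", "DRUG"): "MEDICATION"}, # change type for one pair
    drop_pairs=frozenset({("cold", "TEMPERATURE")}),  # drop a specific pair
)
print(merged)
```

Applying the drop rules before the relabelling rules keeps the lookups keyed on the original entity names, which is the simplest convention to reason about in a sketch like this.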

Bugfixes

Bugfix in Python when annotators shared domain parts across public and internal
Bugfix in Python when ChunkMerge annotator was loaded from disk
ChunkMerge now weights the token coverage correctly when multiple multi-token entities overlap
