Spark NLP for Healthcare Release Notes 2.5.0

 

2.5.0

Overview

We are happy to bring you Spark NLP for Healthcare 2.5.0 with new Annotators, Models and Data Readers. Model composition and iteration are now faster thanks to readers and annotators designed for real-world tasks. We introduce the ChunkMerge annotator to combine all CHUNKs extracted by different Entity Extraction Annotators. We also introduce an Annotation Reader for JSL AI Platform’s Annotation Tool. This release is also the first one to support the following models: ner_large_clinical, ner_events_clinical, assertion_dl_large, chunkresolve_loinc_clinical and deidentify_large. And of course, we have fixed some bugs.

New Features

  • AnnotationToolJsonReader is a new class that imports a JSON export from AI Platform’s Annotation Tool and generates NER and Assertion training datasets
  • ChunkMerge Annotator is a new annotator that merges two columns of CHUNKs, handling overlaps with straightforward logic: maximum coverage and maximum number of entities
  • ChunkMerge Annotator handles inputs from NerDLModel, RegexMatcher, ContextualParser and TextMatcher
  • A DeIdentification pretrained model can now work in ‘mask’ or ‘obfuscate’ mode
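
The notes describe ChunkMerge's overlap rule only as "max coverage, max # entities". The following is a hypothetical pure-Python sketch, not the Spark NLP implementation, of one way that rule could be applied to two chunk columns: pick the non-overlapping subset that covers the most characters and, among equally good subsets, keeps the most chunks. The Chunk tuple, the merge_chunks function and the inclusive-offset convention are assumptions made for illustration.

```python
# Hypothetical illustration of "max coverage, max # entities" overlap
# resolution; this is not the Spark NLP ChunkMerge implementation.
from bisect import bisect_right
from typing import List, NamedTuple, Tuple


class Chunk(NamedTuple):
    begin: int  # inclusive start offset
    end: int    # inclusive end offset
    label: str
    text: str


def merge_chunks(left: List[Chunk], right: List[Chunk]) -> List[Chunk]:
    """Select non-overlapping chunks from both columns, maximizing covered
    characters first and the number of chunks second."""
    # Union of both columns (exact duplicates collapse), sorted by end offset.
    chunks = sorted(set(left) | set(right), key=lambda c: (c.end, c.begin, c.label, c.text))
    ends = [c.end for c in chunks]
    n = len(chunks)
    # best[i]: (coverage, count) of the best selection among the first i chunks.
    best: List[Tuple[int, int]] = [(0, 0)] * (n + 1)
    take = [False] * (n + 1)
    prev = [0] * (n + 1)
    for i, c in enumerate(chunks, start=1):
        # How many earlier chunks end strictly before this one begins.
        prev[i] = bisect_right(ends, c.begin - 1, 0, i - 1)
        with_c = (best[prev[i]][0] + c.end - c.begin + 1, best[prev[i]][1] + 1)
        # Tuples compare lexicographically: coverage first, then count.
        if with_c > best[i - 1]:
            best[i], take[i] = with_c, True
        else:
            best[i] = best[i - 1]
    # Walk the table backwards to recover the chosen chunks.
    result: List[Chunk] = []
    i = n
    while i > 0:
        if take[i]:
            result.append(chunks[i - 1])
            i = prev[i]
        else:
            i -= 1
    return sorted(result, key=lambda c: c.begin)


# Offsets refer to: "patient has type 2 diabetes mellitus and takes metformin"
ner = [Chunk(12, 35, "PROBLEM", "type 2 diabetes mellitus")]
regex = [Chunk(19, 26, "DISEASE", "diabetes"), Chunk(47, 55, "DRUG", "metformin")]
print([c.label for c in merge_chunks(ner, regex)])  # ['PROBLEM', 'DRUG']
```

The sketch uses weighted interval scheduling rather than a greedy "longest chunk first" pass because the greedy pass does not always maximize the number of entities; the tie-breaking ChunkMerge actually applies may differ.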

Enhancements

  • DeIdentification Annotator has a more consistent API:
    • mode param, with values ‘mask’ or ‘obfuscate’, to drive its behavior
    • dateFormats param: a list of string values that selects which date formats to obfuscate (and which to just mask)
  • DeIdentification Annotator no longer obfuscates dates automatically. Obfuscation is now driven by the mode and dateFormats params
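
To illustrate how the mode and dateFormats params could interact, here is a hypothetical pure-Python sketch, not the Spark NLP DeIdentification code: every protected span is masked with its label, except that in ‘obfuscate’ mode non-date entities receive surrogate values and dates are shifted only when they match one of the listed formats. The deidentify and shift_date functions, the SURROGATES table, the 30-day shift and the use of Python strptime patterns (rather than whatever format syntax dateFormats actually expects) are all assumptions.

```python
# Hypothetical illustration of mask vs. obfuscate behavior; this is not the
# Spark NLP DeIdentification implementation.
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Sequence, Tuple

Span = Tuple[int, int, str]  # (begin, inclusive end, entity label)

# Hypothetical surrogate values for non-date entities in 'obfuscate' mode.
SURROGATES: Dict[str, str] = {"NAME": "John Doe", "HOSPITAL": "General Hospital"}


def shift_date(value: str, date_formats: Sequence[str], days: int = 30) -> Optional[str]:
    """Shift a date by `days` if it matches one of `date_formats`, else None."""
    for fmt in date_formats:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue
        return (parsed + timedelta(days=days)).strftime(fmt)
    return None


def deidentify(text: str, spans: List[Span], mode: str = "mask",
               date_formats: Sequence[str] = ()) -> str:
    if mode not in ("mask", "obfuscate"):
        raise ValueError(f"unsupported mode: {mode!r}")
    pieces, cursor = [], 0
    for begin, end, label in sorted(spans):
        pieces.append(text[cursor:begin])
        replacement = f"<{label}>"  # masking is the fallback in both modes
        if mode == "obfuscate":
            original = text[begin:end + 1]
            if label == "DATE":
                # Only dates in a listed format are obfuscated; others stay masked.
                replacement = shift_date(original, date_formats) or replacement
            else:
                replacement = SURROGATES.get(label, replacement)
        pieces.append(replacement)
        cursor = end + 1
    pieces.append(text[cursor:])
    return "".join(pieces)


note = "Seen by Dr. Smith on 03/14/2020 and again on 2020-04-01."
spans = [(12, 16, "NAME"), (21, 30, "DATE"), (45, 54, "DATE")]
print(deidentify(note, spans))
# Seen by Dr. <NAME> on <DATE> and again on <DATE>.
print(deidentify(note, spans, mode="obfuscate", date_formats=["%m/%d/%Y"]))
# Seen by Dr. John Doe on 04/13/2020 and again on <DATE>.
```

In the obfuscate call, the second date is still masked because its format is not in the list, which mirrors the "which to obfuscate (and which to just mask)" behavior described above.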

Bugfixes

  • DeIdentification Annotator now correctly deduplicates protected entities coming from NER / Regex
  • DeIdentification Annotator now indexes chunks correctly after merging them
  • AssertionDLApproach Annotator can now be trained with a graph stored in any folder, specified by setting the graphFolder param
  • AssertionDLApproach now has the setClasses param setter in the Python wrapper
  • JVM memory and Kryo max buffer size increased to 32G and 2000M, respectively, in the sparknlp_jsl.start(secret) function
