Spark NLP for Healthcare Release Notes 2.4.0

 

2.4.0

Overview

We are glad to announce Spark NLP for Healthcare 2.4.0. This is an important release: it includes several refactorings of the core library, plus several state-of-the-art algorithms, new features and enhancements. The architecture and performance improvements aim to make the library more robust in its storage handling for Big Data. On the NLP side, we have introduced a ContextualParser, a DocumentLogRegClassifier and a ChunkEntityResolverSelector. The last two Annotators also target run time and memory consumption: when a pipeline is designed following a hierarchical pattern, they lower the order of computation and the amount of data loaded into memory at each step. We have put a lot of effort into this release, so please enjoy it and share your comments. Thank you very much for your questions, bug reports and feedback; they are always welcome through all our channels and much appreciated.

New Features

  • BigChunkEntityResolver Annotator: New experimental approach that reduces memory consumption at the expense of disk I/O.
  • ContextualParser Annotator: New entity parser that works based on context parameters defined in a JSON file.
  • ChunkEntityResolverSelector Annotator: New AnnotatorModel that takes advantage of the RecursivePipelineModel + LazyAnnotator pattern to annotate with different LazyAnnotators at runtime.
  • DocumentLogRegClassifier Annotator: New Annotator that provides a wrapped TFIDF Vectorizer + LogReg Classifier for TOKEN AnnotatorTypes (either at Document level or Chunk level).
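
As a rough illustration of the selector idea described for ChunkEntityResolverSelector, the following plain-Python sketch routes each chunk to a resolver chosen by its label, and each resolver loads its data only on first use. It does not use the Spark NLP API: the class names `LazyResolver` and `ResolverSelector`, the labels, and the codes are all hypothetical and exist only to show why a lazy, hierarchical design can keep unused data out of memory.

```python
from typing import Callable, Dict, List, Optional, Tuple


class LazyResolver:
    """Hypothetical stand-in for a lazily loaded annotator.

    The lookup table is built only the first time resolve() is called.
    """

    def __init__(self, loader: Callable[[], Dict[str, str]]):
        self._loader = loader
        self._table: Optional[Dict[str, str]] = None  # nothing loaded yet

    @property
    def loaded(self) -> bool:
        return self._table is not None

    def resolve(self, chunk: str) -> str:
        if self._table is None:
            self._table = self._loader()  # load on first use only
        return self._table.get(chunk.lower(), "UNKNOWN")


class ResolverSelector:
    """Routes each (text, label) chunk to the resolver registered for its label."""

    def __init__(self, resolvers: Dict[str, LazyResolver]):
        self._resolvers = resolvers

    def annotate(self, chunks: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
        results = []
        for text, label in chunks:
            resolver = self._resolvers.get(label)
            code = resolver.resolve(text) if resolver else "UNKNOWN"
            results.append((text, label, code))
        return results


# Toy data, invented for illustration only (not real terminology codes).
problem_resolver = LazyResolver(lambda: {"asthma": "P-001"})
drug_resolver = LazyResolver(lambda: {"aspirin": "D-001"})
selector = ResolverSelector({"PROBLEM": problem_resolver, "DRUG": drug_resolver})

out = selector.annotate([("Asthma", "PROBLEM")])
```

In this sketch only the resolver for the `PROBLEM` label is loaded, because no `DRUG` chunk was seen; the other resolver's table is never built.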

Enhancements

  • normalizedColumn Param is no longer required in ChunkEntityResolver Annotator (defaults to the labelCol Param value).
  • ChunkEntityResolverMetadata now has more data to infer whether the match is meaningful or not.

Bugfixes

  • Fixed a bug on ContextSpellChecker Annotator where tokens not in the vocabulary would cause an exception.
  • Fixed a bug on ChunkEntityResolver Annotator where matches with negligible confidence scores produced undetermined results.
  • Fixed a bug on ChunkEntityResolver Annotator where search would fail if the neighbours Param was greater than the number of nodes in the tree. Now it returns up to the number of nodes in the tree.
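
The behaviour described in the last fix can be sketched as follows. This is a minimal brute-force example in plain Python, not the library's tree search; the function name `nearest_neighbours` and the sample points are assumptions made for illustration. The point is only that the requested neighbour count is clamped to the number of available nodes instead of failing.

```python
import heapq
from typing import List, Sequence


def nearest_neighbours(points: Sequence[Sequence[float]],
                       query: Sequence[float],
                       k: int) -> List[int]:
    """Return the indices of the k points closest to query, nearest first.

    If k exceeds the number of points, all points are returned.
    """
    if k <= 0 or not points:
        return []
    k = min(k, len(points))  # clamp instead of raising an error
    # Squared Euclidean distance paired with the point index.
    scored = (
        (sum((a - b) ** 2 for a, b in zip(p, query)), i)
        for i, p in enumerate(points)
    )
    return [i for _, i in heapq.nsmallest(k, scored)]


points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
# Ask for more neighbours than there are points.
result = nearest_neighbours(points, (0.0, 0.0), 10)
```

Here `result` contains all three indices even though ten neighbours were requested.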

Deprecations

  • OCR moves to its own JSL Spark OCR project.

Infrastructure

  • A Spark NLP license is now required to use the library. Please follow the instructions in the email shared with you.
