Spark NLP for Healthcare Release Notes 2.5.5

 

2.5.5

Overview

We are very happy to release Spark NLP for Healthcare 2.5.5 with a new state-of-the-art RelationExtraction annotator that identifies relationships between entities produced by our pretrained NER models. This is also the first release to support Relation Extraction, with two models, re_clinical and re_posology, available in the clinical/models repository. As usual, the release also includes multiple bug fixes.

New Features

  • RelationExtraction annotator that receives WORD_EMBEDDINGS, POS, CHUNK, DEPENDENCY and returns the CATEGORY of the relationship and a confidence score.

Enhancements

  • AssertionDL annotator now logs its metrics during training.
  • DeIdentification now merges entities that are close in Levenshtein distance by default; this behavior is controlled by the setConsistentObfuscation and setSameEntityThreshold params.
  • DeIdentification now has a dedicated parameter, setObfuscateDate, to obfuscate dates (which are otherwise only masked). When the param is true, only the formats listed in the dateFormats param are obfuscated.
  • NerConverterInternal now has a greedyMode param that will merge all contiguous tags of the same type regardless of boundary tags like “B”, “E”, “S”.
  • AnnotationToolJsonReader now includes a mergeOverlapping parameter that controls whether overlapping entities from the Annotator JSONs (i.e. entities not included in the assertion list) are merged.
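
To make the greedyMode behavior described above more concrete, here is a minimal plain-Python sketch of the idea. It is not NerConverterInternal's implementation and does not use the Spark NLP API: the functions split_tag and merge_chunks are hypothetical names introduced for illustration, and the tag scheme ("B-", "I-", "E-", "S-" prefixes plus "O") is an assumption. With greedy=False a chunk ends at boundary tags; with greedy=True contiguous tokens of the same entity type are merged regardless of those tags.

```python
from typing import List, Tuple


def split_tag(tag: str) -> Tuple[str, str]:
    """Split a tag such as 'B-DRUG' into ('B', 'DRUG'); 'O' becomes ('O', '')."""
    if tag == "O":
        return "O", ""
    prefix, _, entity = tag.partition("-")
    return prefix, entity


def merge_chunks(tokens: List[str], tags: List[str], greedy: bool = False) -> List[Tuple[str, str]]:
    """Group tagged tokens into (chunk_text, entity_type) pairs.

    Illustrative only: assumes one tag per token in the scheme described above.
    """
    chunks: List[Tuple[str, str]] = []
    words: List[str] = []   # tokens of the chunk currently being built
    current = ""            # entity type of the open chunk, "" when none is open
    prev_prefix = "O"
    for token, tag in zip(tokens, tags):
        prefix, entity = split_tag(tag)
        same_type = bool(words) and prefix != "O" and entity == current
        if greedy:
            # Greedy: any contiguous token of the same type extends the chunk.
            continues = same_type
        else:
            # Boundary-aware: only I/E tags extend a chunk opened by B or continued by I.
            continues = same_type and prefix in ("I", "E") and prev_prefix in ("B", "I")
        if not continues:
            if words:
                chunks.append((" ".join(words), current))
            words, current = [], ""
        if prefix != "O":
            words.append(token)
            current = entity
        prev_prefix = prefix
    if words:
        chunks.append((" ".join(words), current))
    return chunks


tokens = ["insulin", "glargine", "10", "units"]
tags = ["S-DRUG", "S-DRUG", "B-DOSAGE", "E-DOSAGE"]

strict_result = merge_chunks(tokens, tags, greedy=False)
# [('insulin', 'DRUG'), ('glargine', 'DRUG'), ('10 units', 'DOSAGE')]

greedy_result = merge_chunks(tokens, tags, greedy=True)
# [('insulin glargine', 'DRUG'), ('10 units', 'DOSAGE')]
```

In this toy example the two adjacent single-token DRUG tags stay separate in the boundary-aware pass and become one chunk in the greedy pass, which is the kind of difference the new param is meant to control.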

Bugfixes

  • DeIdentification documentation bug fix (typo)
  • DeIdentification training bug fix in obfuscation dictionary
  • IOBTagger now has the correct output type NAMED_ENTITY

Deprecations

  • EnsembleEntityResolver has been deprecated

Models

  • We have two new English Relation Extraction models for Clinical and Posology NERs:
    • re_clinical: with ner_clinical and embeddings_clinical
    • re_posology: with ner_posology and embeddings_clinical
