Healthcare NLP v2.5.5 Release Notes

2.5.5

Overview

We are very happy to release Spark NLP for Healthcare 2.5.5, featuring a new state-of-the-art RelationExtraction annotator that identifies relationships between entities detected by our pretrained NER models. This is the first release to support relation extraction, and it ships with two models in the clinical/models repository: re_clinical and re_posology. As usual, this release also includes multiple bug fixes.

New Features

RelationExtraction annotator, which takes WORD_EMBEDDINGS, POS, CHUNK, and DEPENDENCY annotations as input and returns the CATEGORY of each relationship together with a confidence score.

Enhancements

AssertionDL annotator now logs its metrics during training.
DeIdentification now merges entities that are close in Levenshtein distance by default; this behavior is controlled with the setConsistentObfuscation and setSameEntityThreshold params.
DeIdentification now has a dedicated setObfuscateDate parameter to obfuscate dates (which would otherwise just be masked). When this param is true, only dates in the formats listed in the dateFormats param are obfuscated.
NerConverterInternal now has a greedyMode param that merges all contiguous tags of the same type, regardless of boundary tags such as “B”, “E”, and “S”.
AnnotationToolJsonReader now includes a mergeOverlapping parameter that controls whether overlapping entities from the Annotator JSONs (i.e., entities not included in the assertion list) are merged.
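To illustrate the idea behind consistent obfuscation in DeIdentification, here is a minimal pure-Python sketch: entities whose spellings are within a small Levenshtein distance of an earlier entity reuse that entity's fake value, so "John Smith" and a misspelled "Jon Smith" are replaced identically. This is not the library's implementation. The function names, the fake-value list, and treating the threshold as a maximum edit distance are all assumptions; the actual semantics of setSameEntityThreshold may differ.

```python
from typing import Dict, List


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if chars match)
            ))
        previous = current
    return previous[-1]


def consistent_obfuscation(entities: List[str], fakes: List[str],
                           max_distance: int = 1) -> Dict[str, str]:
    """Hypothetical helper: map each entity to a fake value, reusing the fake of
    any earlier entity within max_distance edits (assumed threshold semantics)."""
    mapping: Dict[str, str] = {}
    representatives: List[str] = []  # first-seen spelling of each group
    next_fake = 0
    for entity in entities:
        if entity in mapping:
            continue
        match = next((r for r in representatives
                      if levenshtein(entity, r) <= max_distance), None)
        if match is not None:
            # Near-duplicate: obfuscate it exactly like its representative.
            mapping[entity] = mapping[match]
        else:
            # New group: take the next fake value, cycling if we run out.
            mapping[entity] = fakes[next_fake % len(fakes)]
            next_fake += 1
            representatives.append(entity)
    return mapping
```

For example, `consistent_obfuscation(["John Smith", "Jon Smith", "Mary Jones"], ["Alex Doe", "Sam Roe"])` maps both spellings of John Smith to "Alex Doe" and Mary Jones to "Sam Roe".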
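The setObfuscateDate behavior described above (obfuscate dates whose format is listed in dateFormats, mask everything else) can be sketched in plain Python. This is an illustration only, not DeIdentification's code: the function name, the fixed 30-day shift, and the "<DATE>" mask string are assumptions, and dateFormats here takes Python strptime patterns rather than whatever pattern syntax the library expects.

```python
from datetime import datetime, timedelta
from typing import List


def deidentify_date(text: str, date_formats: List[str], obfuscate: bool,
                    shift_days: int = 30) -> str:
    """Hypothetical sketch: shift a date that matches one of date_formats,
    otherwise fall back to masking it."""
    if obfuscate:
        for fmt in date_formats:
            try:
                parsed = datetime.strptime(text, fmt)
            except ValueError:
                continue  # not written in this format; try the next one
            # Re-emit the shifted date in the same format it was written in.
            return (parsed + timedelta(days=shift_days)).strftime(fmt)
    # Obfuscation disabled, or no listed format matched: mask instead.
    return "<DATE>"
```

With `date_formats=["%m/%d/%Y"]`, "01/15/2020" is shifted to "02/14/2020", while "January 15, 2020" does not match any listed format and is masked.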
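The difference that NerConverterInternal's greedyMode makes can be shown with a small pure-Python sketch, assuming BIOES-style tags such as "B-DRUG" or "S-DRUG". In the default, boundary-aware mode, "B" and "S" open a new chunk and "E" and "S" close one; in greedy mode, only a change of entity type (or an "O" tag) ends a chunk. The helpers `split_tag` and `to_chunks` are hypothetical and are not part of the library.

```python
from typing import List, Tuple

Chunk = Tuple[str, str]  # (entity type, chunk text)


def split_tag(tag: str) -> Tuple[str, str]:
    """Split a tag like 'B-DRUG' into ('B', 'DRUG'); 'O' becomes ('O', '')."""
    if tag == "O":
        return "O", ""
    prefix, _, label = tag.partition("-")
    return prefix, label


def to_chunks(tokens: List[str], tags: List[str], greedy: bool = False) -> List[Chunk]:
    chunks: List[Chunk] = []
    current_label = ""
    current_tokens: List[str] = []

    def flush() -> None:
        # Emit the chunk being built (if any) and reset the state.
        nonlocal current_label, current_tokens
        if current_tokens:
            chunks.append((current_label, " ".join(current_tokens)))
        current_label, current_tokens = "", []

    for token, tag in zip(tokens, tags):
        prefix, label = split_tag(tag)
        if prefix == "O":
            flush()
            continue
        if greedy:
            # Greedy: boundary prefixes are ignored; only a type change splits.
            if label != current_label:
                flush()
        elif prefix in ("B", "S") or label != current_label:
            # Boundary-aware: B and S always start a new chunk.
            flush()
        current_label = label
        current_tokens.append(token)
        if not greedy and prefix in ("E", "S"):
            # Boundary-aware: E and S close the chunk they belong to.
            flush()
    flush()
    return chunks
```

For the tokens "aspirin ibuprofen daily" tagged `S-DRUG S-DRUG O`, the boundary-aware mode yields two DRUG chunks, while greedy mode merges them into a single "aspirin ibuprofen" chunk.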

Bugfixes

DeIdentification documentation bug fix (typo)
DeIdentification training bug fix in the obfuscation dictionary
IOBTagger now has the correct output type NAMED_ENTITY

Deprecations

EnsembleEntityResolver has been deprecated

Models

We have two new English Relation Extraction models for the Clinical and Posology NERs:
- re_clinical: with ner_clinical and embeddings_clinical
- re_posology: with ner_posology and embeddings_clinical
