2.4.5
Overview
We are glad to announce Spark NLP for Healthcare 2.4.5. This release introduces the new EnsembleEntityResolver, which allows our Entity Resolution architecture to scale up by multiple orders of magnitude and handle datasets of millions of records with a sub-logarithmic increase in computation. We also enhanced our ChunkEntityResolverModel with 5 new distance calculations, along with weighting-array and aggregation-strategy params, which give more levers to fine-tune its performance against a given dataset.
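To illustrate why a two-layer design scales, here is a minimal plain-Python sketch of the idea: a first layer picks a class, and a second layer searches only the reference terms of that class. It does not use the Spark NLP for Healthcare API. The toy data, the token-overlap stand-in for the classifier, and the names `first_layer`, `second_layer` and `resolve` are all assumptions made for illustration.

```python
# Hypothetical toy reference data grouped by class (not from the release notes).
REFERENCE = {
    "cardio": ["myocardial infarction", "atrial fibrillation", "heart failure"],
    "respiratory": ["chronic obstructive pulmonary disease", "asthma", "pneumonia"],
}


def tokens(text):
    """Lowercase whitespace tokenization."""
    return text.lower().split()


def first_layer(chunk):
    """Stand-in for the first-layer classifier (assumption: token overlap,
    not a real TFIDF-Logreg model). Returns the best-matching class label."""
    chunk_tokens = set(tokens(chunk))
    overlap = {
        label: sum(len(chunk_tokens & set(tokens(term))) for term in terms)
        for label, terms in REFERENCE.items()
    }
    return max(overlap, key=overlap.get)


def jaccard_distance(a, b):
    """Token-level Jaccard distance in [0, 1]."""
    sa, sb = set(tokens(a)), set(tokens(b))
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)


def second_layer(chunk, label):
    """Stand-in for a per-class resolver: nearest term inside one class only."""
    return min(REFERENCE[label], key=lambda term: jaccard_distance(chunk, term))


def resolve(chunk):
    """Route the chunk to a class, then resolve it within that class."""
    label = first_layer(chunk)
    return label, second_layer(chunk, label)
```

Because the second layer only compares the chunk against the terms of one class, the number of distance computations per chunk depends on the size of that class rather than on the size of the whole dataset.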
New Features
- EnsembleEntityResolver consisting of an integrated TFIDF-Logreg classifier in the first layer + multiple ChunkEntityResolvers in the second layer (one per class)
- Five (5) new distance calculations for ChunkEntityResolver, namely:
  - Token Based: TFIDF-Cosine, Jaccard, SorensenDice
  - Character Based: JaroWinkler and Levenshtein
- Weight parameter that works as a multiplier for each distance result, applied during their aggregation
- Three (3) aggregation strategies for the distances enabled in a particular instance, namely: AVERAGE, MAX and MIN
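The interplay between distances, weights and aggregation strategies can be sketched in plain Python. This is an illustration only, not the library's implementation: it covers three of the five distances (Jaccard, SorensenDice, Levenshtein), normalizes each to the range 0 to 1, and assumes that AVERAGE is the plain mean of the weighted results. The function names and the `weights` dictionary are hypothetical.

```python
def tokens(text):
    """Lowercase whitespace tokenization."""
    return text.lower().split()


def jaccard_distance(a, b):
    """Token based: 1 - |A ∩ B| / |A ∪ B|."""
    sa, sb = set(tokens(a)), set(tokens(b))
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)


def sorensen_dice_distance(a, b):
    """Token based: 1 - 2|A ∩ B| / (|A| + |B|)."""
    sa, sb = set(tokens(a)), set(tokens(b))
    if not sa and not sb:
        return 0.0
    return 1.0 - 2.0 * len(sa & sb) / (len(sa) + len(sb))


def levenshtein_distance(a, b):
    """Character based: edit distance divided by the longer string length."""
    if not a and not b:
        return 0.0
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1] / max(len(a), len(b))


DISTANCES = {
    "jaccard": jaccard_distance,
    "sorensen_dice": sorensen_dice_distance,
    "levenshtein": levenshtein_distance,
}


def aggregate(a, b, weights, strategy="AVERAGE"):
    """Multiply each enabled distance by its weight, then aggregate.

    Assumption: AVERAGE is the unnormalized mean of the weighted values.
    """
    weighted = [w * DISTANCES[name](a, b) for name, w in weights.items()]
    if strategy == "AVERAGE":
        return sum(weighted) / len(weighted)
    if strategy == "MAX":
        return max(weighted)
    if strategy == "MIN":
        return min(weighted)
    raise ValueError(f"unknown strategy: {strategy}")
```

Only the distances named in `weights` are enabled for a given call, so leaving a name out of the dictionary excludes that distance from the aggregation.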
Enhancements
- ChunkEntityResolver can now compute distances over all the neighbours found and return the metadata just for the best alternatives that meet the threshold; before, it would calculate them over the neighbours and return them all in the metadata
- ChunkEntityResolver now has an extramassPenalty parameter to account for penalization of token-length difference in compared strings
- Metadata for the ChunkEntityResolver has been updated accordingly to reflect all new features
- StringDistances class has been included in utils to aid in the calculation and organization of different types of distances for Strings
- HasFeaturesJsl trait has been included to support the serialization of Features including [T] <: AnnotatorModel[T] types
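The change in how neighbours, alternatives and the threshold interact can be sketched as follows. This is a plain-Python illustration under assumptions, not the ChunkEntityResolver code: distances are taken to be "lower is closer", the threshold is treated as an inclusive upper bound, and the function name `best_alternatives` and the candidate labels are hypothetical.

```python
def best_alternatives(distances, threshold, alternatives):
    """Rank every neighbour, then keep only the closest ones under the threshold.

    distances:    dict mapping a candidate label to its distance (lower is closer)
    threshold:    maximum distance a candidate may have to be reported
    alternatives: maximum number of candidates to report
    """
    # Distances are available for all neighbours found...
    ranked = sorted(distances.items(), key=lambda item: item[1])
    # ...but only the candidates that meet the threshold are returned.
    kept = [(label, d) for label, d in ranked if d <= threshold]
    return kept[:alternatives]


# Hypothetical neighbours with already-computed distances.
neighbours = {"code_a": 0.12, "code_b": 0.48, "code_c": 0.90, "code_d": 0.30}
print(best_alternatives(neighbours, threshold=0.5, alternatives=2))
```

Filtering after ranking keeps the reported metadata small even when the neighbour search itself is wide.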
Bugfixes
- Frequency calculation for WMD in ChunkEntityResolver has been adjusted to account for real word count representation
- AnnotatorType for DocumentLogRegClassifier has been changed to CATEGORY to align with classifiers in Open Source library
Deprecations
- Legacy EntityResolver{Approach, Model} classes have been deprecated in favor of ChunkEntityResolver classes
- ChunkEntityResolverSelector classes have been deprecated in favor of EnsembleEntityResolver