Spark OCR release notes

 

4.4.2

Release date: 30-05-2023

We are glad to announce that Visual NLP 😎 4.4.2 has been released. This is a small release with mostly bug fixes and minor improvements.

Fixes

  • Fixed an ImageTextDetectorV2 initialization bug that occurred in some cluster environments.
  • PdfToText and PdfToHocr now return document dimensions using the same data type (integer).
  • The two vulnerabilities remaining in the JAR package from release 4.4.1 have been resolved.
  • Fixed the problem that caused HocrToTextTable to throw java.lang.UnsupportedOperationException.

New Features

  • Bounding boxes spanning multiple lines are now supported in PositionFinder!

[Figure: original image and masked image]

In this example, the entity “Lockheed Martin” is split across two lines, so PositionFinder returns two bounding boxes for it. You can still link both bounding boxes back to the original entity by using the ‘chunk index’.
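
As a rough illustration of linking boxes back to an entity through the chunk index, here is a minimal plain-Python sketch. It does not call the library: the record layout and the field names (chunk_index, x, y, width, height), the coordinate values, and the second entity are all assumptions made for the example, not the actual PositionFinder output schema.

```python
from collections import defaultdict

# Hypothetical entities, listed in chunk-index order.
chunks = ["Lockheed Martin", "Example Corp"]

# Hypothetical, simplified bounding-box records. The field names and values
# are assumptions for this sketch, not the real PositionFinder schema.
boxes = [
    {"chunk_index": 0, "x": 410, "y": 120, "width": 95, "height": 18},  # first line of entity 0
    {"chunk_index": 0, "x": 40, "y": 142, "width": 70, "height": 18},   # second line of entity 0
    {"chunk_index": 1, "x": 200, "y": 142, "width": 110, "height": 18}, # single-line entity 1
]


def group_boxes_by_chunk(box_records):
    """Collect all bounding boxes that share the same chunk index."""
    grouped = defaultdict(list)
    for box in box_records:
        grouped[box["chunk_index"]].append(box)
    return dict(grouped)


grouped = group_boxes_by_chunk(boxes)

# Report how many boxes belong to each entity.
for index, entity_boxes in sorted(grouped.items()):
    print(f"{chunks[index]}: {len(entity_boxes)} bounding box(es)")
```

Grouping by the index rather than by the entity text keeps repeated mentions of the same name apart, since each mention would carry its own chunk index.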

