1.8.0
Release date: 20-11-2020
Overview
Optimized performance for processing multipage PDF documents, with support for up to 10k pages per document.
New Features
- Added ImageAdaptiveBinarizer Scala transformer with support for:
  - Gaussian local thresholding
  - Otsu thresholding
  - Sauvola local thresholding
- Added the ability to split a PDF into small documents to optimize processing in PdfToImage.
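As a rough illustration of one of the thresholding methods listed above, the following is a minimal pure-Python sketch of Otsu thresholding on a flat list of 8-bit grayscale values. It is not the ImageAdaptiveBinarizer implementation; the function names `otsu_threshold` and `binarize` are hypothetical and used only for this example.

```python
def otsu_threshold(pixels):
    """Return the threshold (0-255) that maximizes between-class variance.

    Illustrative sketch only; `pixels` is assumed to be a non-empty
    sequence of integers in the range 0..255.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(value * count for value, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0      # number of pixels at or below the candidate threshold
    sum_bg = 0    # sum of intensities at or below the candidate threshold
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue          # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break             # no foreground pixels left
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance, up to a constant factor of 1 / total**2
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, threshold):
    """Map pixels above the threshold to white (255), the rest to black (0)."""
    return [255 if p > threshold else 0 for p in pixels]
```

Otsu computes a single global threshold per image, whereas the Gaussian and Sauvola methods compute a threshold per pixel from its local neighborhood, which tends to cope better with uneven illumination in scanned pages.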
Enhancements
- Added binarization in PdfToImage to optimize memory usage.
- Added pdfCoordinates param to the ImageToText transformer.
- Added ‘total_pages’ field to the PdfToImage transformer.
- Added different splitting strategies to the PdfToImage transformer.
- Simplified paging in PdfToImage when running it with splitting into small PDFs.
- Added params to PdfToText for disabling extra functionality.
- Added master_url param to the Python start function.
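To make the idea of splitting a large PDF into smaller documents concrete, here is a minimal sketch of one possible strategy: cutting a document into fixed-size page ranges. This is an assumption for illustration, not a description of the splitting strategies PdfToImage actually implements; the helper `split_page_ranges` and its parameters are hypothetical.

```python
def split_page_ranges(total_pages, pages_per_chunk):
    """Split a page count into half-open (start, end) ranges of fixed size.

    Hypothetical helper for illustration; the last range may be shorter
    than `pages_per_chunk`.
    """
    if total_pages < 0 or pages_per_chunk <= 0:
        raise ValueError("total_pages must be >= 0 and pages_per_chunk > 0")
    return [
        (start, min(start + pages_per_chunk, total_pages))
        for start in range(0, total_pages, pages_per_chunk)
    ]


# A 10-page document cut into chunks of at most 4 pages
print(split_page_ranges(10, 4))
```

Processing each range as its own small document bounds the amount of page data held in memory at once, which is the kind of benefit the splitting options above are aimed at.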