1.1.0
Release date: 03-03-2020
Overview
This release improves image preprocessing before running OCR and adds the ability to store results as PDF while keeping the original formatting.
New Features
- Added automatic calculation of the maximum size of objects to remove in `ImageRemoveObjects`. This improvement avoids removing `.` and damaging symbols with dots (`i`, `!`, `?`). Added the `minSizeFont` param to the `ImageRemoveObjects` transformer to activate this functionality.
- Added the `ocrParams` parameter to the `ImageToText` transformer for setting arbitrary OCR params.
- Added font size extraction in `ImageToText`.
- Added the `TextToPdf` transformer for rendering text with positions to a PDF file.
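The release notes do not say how `ImageRemoveObjects` derives its size limit from `minSizeFont`. As a rough illustration of the idea only, the pure-Python sketch below erases connected blobs of ink whose area falls below a threshold computed from the smallest expected font height, so that the dots of `i`, `!` and `?` survive. The `auto_max_area` heuristic (a dot assumed to be about one seventh of the font height wide) and both function names are hypothetical; this is not Spark OCR's API or its actual algorithm.

```python
from collections import deque


def auto_max_area(min_font_px: int) -> int:
    """Hypothetical heuristic: assume the dot of 'i', '!' or '?' is roughly
    min_font_px // 7 pixels wide, and only allow removal of blobs strictly
    smaller than such a dot."""
    dot_side = max(1, min_font_px // 7)
    return dot_side * dot_side - 1


def remove_small_objects(grid, max_area):
    """Return a copy of a binary grid (1 = ink, 0 = background) with every
    8-connected ink component of at most max_area pixels erased."""
    h = len(grid)
    w = len(grid[0]) if grid else 0
    out = [row[:] for row in grid]          # never modify the caller's grid
    seen = [[False] * w for _ in range(h)]
    for sy in range(h):
        for sx in range(w):
            if grid[sy][sx] != 1 or seen[sy][sx]:
                continue
            # Breadth-first flood fill collects one connected component.
            component = []
            queue = deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and grid[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Small components are treated as noise and erased.
            if len(component) <= max_area:
                for y, x in component:
                    out[y][x] = 0
    return out
```

With `min_font_px=21` the threshold is 8 pixels, so an isolated one-pixel speck is erased while a 3x3 dot is kept. Tying the threshold to the smallest font size, rather than using a fixed pixel count, is what keeps legitimate punctuation from being treated as noise.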
Enhancements
- Added setting of resolution in `ImageToText`, and added the `ignoreResolution` param with default value `true` to the `ImageToText` transformer for backward compatibility.
- Added parsing of resolution from image metadata in the `BinaryToImage` transformer.
- Added storing of resolution in the `PdfToImage` transformer.
- Added a resolution field to the Image schema.
- Updated the `start` function to set the `PYSPARK_PYTHON` env variable.
- Improved auto-scaling/skew correction:
  - improved access to image values
  - removed unnecessary copies of images
  - added more test cases
  - improved auto-correlation in auto-scaling.
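Resolution matters because the same text height in pixels corresponds to different physical font sizes at different DPI. As a sketch of what parsing resolution from image metadata can involve for one format, the code below reads the PNG `pHYs` chunk (pixels per unit, where unit 1 means metre, per the PNG specification) and converts a pixel height to typographic points (1 pt = 1/72 inch). The function names `png_dpi` and `pixels_to_points` are hypothetical, and this is not necessarily how `BinaryToImage` or `ImageToText` work internally; formats such as TIFF or JPEG store resolution differently.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dpi(data: bytes):
    """Return (x_dpi, y_dpi) from a PNG's pHYs chunk, or None if the file
    does not declare a physical resolution in pixels per metre."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"pHYs" and len(body) == 9:
            ppu_x, ppu_y, unit = struct.unpack(">IIB", body)
            if unit != 1:  # 0 = unit unknown (aspect ratio only), 1 = metre
                return None
            # 1 inch = 0.0254 m, so dots per inch = pixels per metre * 0.0254.
            return round(ppu_x * 0.0254), round(ppu_y * 0.0254)
        if ctype == b"IDAT":
            break  # pHYs must appear before the first IDAT chunk
        pos += 12 + length  # skip length, type, data and CRC
    return None


def pixels_to_points(height_px: float, dpi: float) -> float:
    """Convert a text height in pixels to points, given the image DPI."""
    return height_px * 72.0 / dpi
```

For example, 11811 pixels per metre is reported as 300 DPI, and a 50-pixel-high glyph at 300 DPI corresponds to 12 pt. Without a known resolution, a pixel height alone cannot be turned into a physical font size, which is why storing resolution alongside the image is useful.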