4.3.3
Release date: 14-03-2023
We’re glad to announce that Visual NLP 😎 4.3.3 has been released.
Highlights
- New parameter keepOriginalEncoding in PdfToHocr.
- New Yolo-based table and form detector.
- Memory consumption in VisualQuestionAnswering and ImageTableDetector models has been improved.
- Fixes in AlabReader.
- Fixes in HocrToTextTable.
New parameter keepOriginalEncoding in PdfToHocr
You can now choose whether PdfToHocr returns an ASCII-normalized version of the characters present in the PDF (keepOriginalEncoding=False) or the original Unicode characters (keepOriginalEncoding=True).
[Figures: source PDF; output when keeping the encoding; output when not keeping it]
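As a rough illustration of what the two settings mean for the extracted text, the sketch below mimics the choice in plain Python. It does not call PdfToHocr; the function name extract_text and the use of NFKD decomposition are assumptions made for this example, and the normalization that PdfToHocr actually applies may differ.

```python
import unicodedata


def extract_text(raw: str, *, keep_original_encoding: bool) -> str:
    """Hypothetical stand-in for the keepOriginalEncoding choice (not the library's code)."""
    if keep_original_encoding:
        # Return the characters exactly as they appear in the source.
        return raw
    # Assumption: approximate ASCII normalization with NFKD decomposition,
    # then drop every code point that has no ASCII representation.
    decomposed = unicodedata.normalize("NFKD", raw)
    return decomposed.encode("ascii", "ignore").decode("ascii")


# "Café financé" written with an "fi" ligature (U+FB01) and accented e (U+00E9).
sample = "Caf\u00e9 \ufb01nanc\u00e9"

kept = extract_text(sample, keep_original_encoding=True)
normalized = extract_text(sample, keep_original_encoding=False)

print(kept)
print(normalized)
```

Keeping the original encoding is useful when downstream steps must match the exact characters in the document, while the normalized form can be easier to search or compare.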
New Yolo-based Table and Form detector
This new model distinguishes between forms and tables, so you can apply different downstream processing to each.
See a full usage example in this notebook.
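The idea of applying different downstream processing per region type can be sketched as a simple routing step. Everything here is hypothetical: the Region class, the "table" and "form" label strings, the score field, and the min_score threshold are illustrative assumptions, not the detector's actual output schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Region:
    """Hypothetical detection result; the real detector's output format may differ."""
    label: str                       # assumed to be "table" or "form"
    score: float                     # assumed confidence in [0, 1]
    bbox: Tuple[int, int, int, int]  # assumed (x, y, width, height)


def route_regions(regions: List[Region], min_score: float = 0.5) -> Dict[str, List[Region]]:
    """Group detections by label so each group can go to its own processing step."""
    routed: Dict[str, List[Region]] = {"table": [], "form": []}
    for region in regions:
        if region.score < min_score:
            continue  # skip low-confidence detections
        if region.label in routed:
            routed[region.label].append(region)
    return routed


detections = [
    Region("table", 0.92, (10, 20, 300, 150)),
    Region("form", 0.81, (10, 200, 300, 400)),
    Region("table", 0.30, (400, 20, 100, 50)),  # below the threshold, dropped
]

routed = route_regions(detections)
print(len(routed["table"]), len(routed["form"]))
```

Table regions could then be passed to table extraction and form regions to key-value extraction, each with its own settings.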
Memory consumption in VisualQuestionAnswering and ImageTableDetector models has been improved
Memory usage has been reworked to be more GC-friendly. In practice, large jobs are more stable and less likely to be restarted after exhausting resources.
Fixes in AlabReader
AlabReader has been updated to fix some bugs and improve performance.
Fixes in HocrToTextTable
HocrToTextTable now better handles some corner cases in which the last rows of tables were being missed.
This release of Visual NLP is compatible with version 4.3.1 of Spark NLP and version 4.3.1 of Spark NLP for Healthcare.
Previous versions
- 5.4.1
- 5.4.0
- 5.3.2
- 5.3.1
- 5.3.0
- 5.2.0
- 5.1.2
- 5.1.0
- 5.0.2
- 5.0.1
- 5.0.0
- 4.4.4
- 4.4.3
- 4.4.2
- 4.4.1
- 4.4.0
- 4.3.3
- 4.3.0
- 4.2.4
- 4.2.1
- 4.2.0
- 4.1.0
- 4.0.2
- 4.0.0
- 3.14.0
- 3.13.0
- 3.12.0
- 3.11.0
- 3.10.0
- 3.9.1
- 3.9.0
- 3.8.0
- 3.7.0
- 3.6.0
- 3.5.0
- 3.4.0
- 3.3.0
- 3.2.0
- 3.1.0
- 3.0.0
- 1.11.0
- 1.10.0
- 1.9.0
- 1.8.0
- 1.7.0
- 1.6.0
- 1.5.0
- 1.4.0
- 1.3.0
- 1.2.0
- 1.1.2
- 1.1.1
- 1.1.0
- 1.0.0