Spark NLP release notes 3.3.0

3.3.0

Release date: 14-06-2021

Overview

Table detection and recognition for scanned documents.

For table detection we added ImageTableDetector. It is based on CascadeTabNet, which uses a Cascade Mask Region-based CNN with a High-Resolution Network backbone (Cascade Mask R-CNN HRNet). The model was pre-trained on the COCO dataset and fine-tuned on the ICDAR 2019 competition dataset for table detection. It demonstrates state-of-the-art results on ICDAR 2013 and TableBank, and top results on ICDAR 2019.

For more details, read Table Detection & Extraction in Spark OCR.

New Features

ImageTableDetector is a deep learning model for detecting tables in an image.
ImageTableCellDetector is a transformer for detecting cell regions in a table image.
ImageCellsToTextTable is a transformer for extracting text from the detected cells.
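
ImageCellsToTextTable combines the detected cell regions with recognized text. The snippet below is a minimal, library-independent sketch of that idea in plain Python, not Spark OCR's implementation and not its API. It assumes that cells and OCR words arrive as axis-aligned pixel boxes and that words are listed in reading order; the `Box` class, the `cells_to_text_table` function and its `row_tolerance` parameter are hypothetical names used only for illustration. Each word is assigned to the cell that contains its center, and cells are grouped into rows by their top edge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned box: top-left corner (x, y), width w, height h."""
    x: float
    y: float
    w: float
    h: float

    def contains(self, px: float, py: float) -> bool:
        # Half-open bounds so a point on a shared border falls into exactly one cell.
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h


def cells_to_text_table(cells, words, row_tolerance=5.0):
    """Hypothetical sketch: build a text table from cell boxes and OCR word boxes.

    cells: list of Box, one per detected cell.
    words: list of (text, Box) pairs, assumed to be in reading order.
    Returns a list of rows; each row is a list of cell strings, left to right.
    """
    # Assign each word to the first cell containing the center of its box.
    texts = {i: [] for i in range(len(cells))}
    for text, wb in words:
        cx, cy = wb.x + wb.w / 2, wb.y + wb.h / 2
        for i, cell in enumerate(cells):
            if cell.contains(cx, cy):
                texts[i].append(text)
                break  # words outside every cell are dropped

    # Group cells into rows: a cell joins the current row if its top edge is
    # within row_tolerance of the row's first (topmost) cell.
    rows = []
    for i in sorted(range(len(cells)), key=lambda i: cells[i].y):
        if rows and abs(cells[i].y - cells[rows[-1][0]].y) <= row_tolerance:
            rows[-1].append(i)
        else:
            rows.append([i])

    # Order cells within each row left to right and join their words.
    return [[" ".join(texts[i]) for i in sorted(row, key=lambda i: cells[i].x)]
            for row in rows]


# Usage: a 2 x 2 table with one word per cell.
cells = [Box(0, 0, 50, 20), Box(50, 0, 50, 20), Box(0, 20, 50, 20), Box(50, 20, 50, 20)]
words = [("Name", Box(5, 5, 20, 10)), ("Age", Box(55, 5, 15, 10)),
         ("Ann", Box(5, 25, 15, 10)), ("34", Box(55, 25, 10, 10))]
table = cells_to_text_table(cells, words)
print(table)  # [['Name', 'Age'], ['Ann', '34']]
```

Grouping by a tolerance on the top edge, rather than by exact coordinates, keeps the sketch robust to small misalignments between neighbouring cell boxes; a real table recognizer also has to handle merged and multi-line cells, which this sketch ignores.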

New notebooks

Image Table Detection example
Image Cell Recognition example
Image Table Recognition
