Object detection

ImageHandwrittenDetector

ImageHandwrittenDetector is a deep learning (DL) model for detecting handwritten text in an image. It is based on the Cascade Region-based CNN (Cascade R-CNN) architecture.

The detector supports the following labels:

‘signature’
‘date’
‘name’
‘title’
‘address’
‘others’

Input Columns

Param name	Type	Default	Column Data Description
inputCol	string	image	image struct (Image schema)

Parameters

Param name	Type	Default	Description
scoreThreshold	float	0.5	Score threshold for output regions.
outputLabels	Array[String]		Whitelist of labels to return in the output.
labels	Array[String]		List of labels
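
The effect of scoreThreshold and outputLabels can be pictured as a post-filter over the raw detections. The sketch below is plain Python and does not call Spark OCR: the Detection tuple and the filter_detections function are hypothetical names, and it assumes that a region is kept when its score is at least the threshold and that an empty whitelist keeps every label.

```python
from typing import List, NamedTuple, Optional, Sequence


class Detection(NamedTuple):
    """Hypothetical stand-in for one detected region."""
    label: str
    score: float


def filter_detections(
    detections: Sequence[Detection],
    score_threshold: float = 0.5,
    output_labels: Optional[Sequence[str]] = None,
) -> List[Detection]:
    """Keep detections that pass the score threshold and the label whitelist.

    Assumption: an empty or missing whitelist means "keep all labels".
    """
    allowed = set(output_labels or [])
    return [
        d for d in detections
        if d.score >= score_threshold and (not allowed or d.label in allowed)
    ]


raw = [
    Detection("signature", 0.92),
    Detection("date", 0.71),
    Detection("name", 0.30),
    Detection("signature", 0.45),
]

# Only confident signature regions survive both filters.
kept = filter_detections(raw, score_threshold=0.5, output_labels=["signature"])
print(kept)
```

Raising the threshold trades recall for precision, while the whitelist only narrows which of the supported labels are reported.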

Output Columns

Param name	Type	Default	Column Data Description
outputCol	string	table_regions	array of Coordinates (see ocr_structures#coordinate-schema)

Example:

from pyspark.ml import PipelineModel
from sparkocr.transformers import *
from sparkocr.utils import display_images

imagePath = "path to image"

# Read image file as binary file
df = spark.read \
    .format("binaryFile") \
    .load(imagePath)

binary_to_image = BinaryToImage() \
    .setInputCol("content") \
    .setOutputCol("image")

# Define a transformer to detect signatures
signature_detector = ImageHandwrittenDetector \
  .pretrained("image_signature_detector_gsa0628", "en", "public/ocr/models") \
  .setInputCol("image") \
  .setOutputCol("signature_regions")

draw_regions = ImageDrawRegions() \
  .setInputCol("image") \
  .setInputRegionsCol("signature_regions") \
  .setOutputCol("image_with_regions")


pipeline = PipelineModel(stages=[
    binary_to_image,
    signature_detector,
    draw_regions
])

data = pipeline.transform(df)

display_images(data, "image_with_regions")

import com.johnsnowlabs.ocr.transformers._
import com.johnsnowlabs.ocr.OcrContext.implicits._

val imagePath = "path to image"

// Read image file as binary file
val df = spark.read
  .format("binaryFile")
  .load(imagePath)
  .asImage("image")

// Define a transformer to detect signatures
val signature_detector = ImageHandwrittenDetector
  .pretrained("image_signature_detector_gsa0628", "en", "public/ocr/models")
  .setInputCol("image")
  .setOutputCol("signature_regions")

val draw_regions = new ImageDrawRegions()
  .setInputCol("image")
  .setInputRegionsCol("signature_regions")
  .setOutputCol("image_with_regions")


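import org.apache.spark.ml.Pipeline

// The image column was already created with asImage, so no BinaryToImage stage is needed
val pipeline = new Pipeline().setStages(Array(
  signature_detector,
  draw_regions
))

val data = pipeline.fit(df).transform(df)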

data.storeImage("image_with_regions")


ImageTextDetector

ImageTextDetector is a DL model for detecting text in an image. It is based on the CRAFT network architecture.

Input Columns

Param name	Type	Default	Column Data Description
inputCol	string	image	image struct (Image schema)

Parameters

Param name	Type	Default	Description
scoreThreshold	float	0.9	Score threshold for output regions.
sizeThreshold	int	5	Threshold for the area of the detected regions. Regions with an area below the threshold won’t be returned.
textThreshold	float	0.4	Threshold for the score of a region potentially containing text. The region score represents the probability that a given pixel is the center of a character. With higher values, only regions with a high confidence of containing text are returned.
linkThreshold	float	0.4	Threshold for the link (affinity) score. The link score represents the space allowed between adjacent characters for them to be considered a single word.
width	int	0	Scale the image to this width; if 0, the original width is used.
height	int	0	Scale the image to this height; if 0, the original height is used.
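
How these thresholds interact is easier to see on a toy example. The sketch below is a simplified, pure-Python imitation of CRAFT-style post-processing, not the Spark OCR implementation. The function name find_text_boxes is hypothetical, and the mapping of parameters is an assumption: pixels are marked as foreground using textThreshold and linkThreshold, connected pixels form a region, and a region is kept only if its area reaches sizeThreshold and its peak region score reaches scoreThreshold.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def find_text_boxes(
    region: List[List[float]],
    link: List[List[float]],
    text_threshold: float = 0.4,
    link_threshold: float = 0.4,
    size_threshold: int = 5,
    score_threshold: float = 0.9,
) -> List[Box]:
    """Toy CRAFT-style post-processing on small score maps."""
    rows, cols = len(region), len(region[0])
    # A pixel is foreground if it looks like part of a character or like
    # a link between two adjacent characters.
    mask = [
        [region[r][c] >= text_threshold or link[r][c] >= link_threshold
         for c in range(cols)]
        for r in range(rows)
    ]
    seen = [[False] * cols for _ in range(rows)]
    boxes: List[Box] = []
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            # Breadth-first search over one 4-connected component.
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            pixels = []
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if inside and mask[nr][nc] and not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            peak = max(region[r][c] for r, c in pixels)
            # Assumed filters: minimum area and minimum peak confidence.
            if len(pixels) < size_threshold or peak < score_threshold:
                continue
            xs = [c for _, c in pixels]
            ys = [r for r, _ in pixels]
            boxes.append((min(xs), min(ys),
                          max(xs) - min(xs) + 1, max(ys) - min(ys) + 1))
    return boxes


# Two characters (columns 0-1 and 3-4) joined by a link at column 2,
# plus a faint two-pixel blob at column 8.
region = [[0.95, 0.95, 0.0, 0.95, 0.95, 0.0, 0.0, 0.0, 0.5, 0.0] for _ in range(2)]
link = [[0.8 if c == 2 else 0.0 for c in range(10)] for _ in range(2)]

print(find_text_boxes(region, link))
print(find_text_boxes(region, link, link_threshold=0.9, size_threshold=4))
```

With the default link threshold the two characters are joined into one word box; with a stricter link threshold the link pixels drop out and the characters are reported separately. The faint blob is rejected in both cases because it is too small.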

Output Columns

Param name	Type	Default	Column Data Description
outputCol	string	table_regions	array of Coordinates (see ocr_structures#coordinate-schema)

Example:

from pyspark.ml import PipelineModel
from sparkocr.transformers import *
from sparkocr.utils import display_images

imagePath = "path to image"

# Read image file as binary file
df = spark.read \
    .format("binaryFile") \
    .load(imagePath)

binary_to_image = BinaryToImage() \
    .setInputCol("content") \
    .setOutputCol("image")

# Define a transformer to detect text
text_detector = ImageTextDetector \
  .pretrained("text_detection_v1", "en", "clinical/ocr") \
  .setInputCol("image") \
  .setOutputCol("text_regions") \
  .setSizeThreshold(10) \
  .setScoreThreshold(0.9) \
  .setLinkThreshold(0.4) \
  .setTextThreshold(0.2) \
  .setWidth(1512) \
  .setHeight(2016)

draw_regions = ImageDrawRegions() \
  .setInputCol("image") \
  .setInputRegionsCol("text_regions") \
  .setOutputCol("image_with_regions")


pipeline = PipelineModel(stages=[
    binary_to_image,
    text_detector,
    draw_regions
])

data = pipeline.transform(df)

display_images(data, "image_with_regions")

import com.johnsnowlabs.ocr.transformers._
import com.johnsnowlabs.ocr.OcrContext.implicits._

val imagePath = "path to image"

// Read image file as binary file
val df = spark.read
  .format("binaryFile")
  .load(imagePath)
  .asImage("image")

// Define a transformer to detect text
val text_detector = ImageTextDetector
  .pretrained("text_detection_v1", "en", "clinical/ocr")
  .setInputCol("image")
  .setOutputCol("text_regions")
  .setSizeThreshold(10)
  .setScoreThreshold(0.9)
  .setLinkThreshold(0.4)
  .setTextThreshold(0.2)
  .setWidth(1512)
  .setHeight(2016)

// Draw the detected regions on the input image
val draw_regions = new ImageDrawRegions()
  .setInputCol("image")
  .setInputRegionsCol("text_regions")
  .setOutputCol("image_with_regions")


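import org.apache.spark.ml.Pipeline

// The image column was already created with asImage, so no BinaryToImage stage is needed
val pipeline = new Pipeline().setStages(Array(
  text_detector,
  draw_regions
))

val data = pipeline.fit(df).transform(df)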

data.storeImage("image_with_regions")


ImageTextDetectorV2

ImageTextDetectorV2 is a DL model for detecting text in images. It is based on the CRAFT network architecture with a refiner net. The refiner net runs as a post-processing step and can merge single-word regions into lines. Currently, it is available only in Python.
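
To make the effect of merging word regions into lines concrete, the sketch below groups word boxes that overlap vertically. This is only an illustration of the outcome: the refiner net is a neural network, whereas this example swaps in a simple greedy geometric heuristic. The function names and the min_overlap parameter are hypothetical and are not part of Spark OCR.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def vertical_overlap(a: Box, b: Box) -> float:
    """Overlap of two boxes on the y axis, relative to the shorter box."""
    top = max(a[1], b[1])
    bottom = min(a[1] + a[3], b[1] + b[3])
    return max(0, bottom - top) / min(a[3], b[3])


def merge_words_into_lines(words: List[Box], min_overlap: float = 0.5) -> List[Box]:
    """Greedily merge word boxes that share a text line into line boxes."""
    lines: List[Box] = []
    for word in sorted(words, key=lambda b: (b[1], b[0])):
        for i, line in enumerate(lines):
            if vertical_overlap(line, word) >= min_overlap:
                # Grow the line box so that it also covers this word.
                x = min(line[0], word[0])
                y = min(line[1], word[1])
                right = max(line[0] + line[2], word[0] + word[2])
                bottom = max(line[1] + line[3], word[1] + word[3])
                lines[i] = (x, y, right - x, bottom - y)
                break
        else:
            # No existing line overlaps enough: start a new line.
            lines.append(word)
    return lines


words = [
    (10, 10, 40, 12),  # first word on line 1
    (60, 11, 30, 12),  # second word on line 1
    (10, 40, 50, 12),  # first word on line 2
    (70, 41, 20, 11),  # second word on line 2
]
print(merge_words_into_lines(words))
```

Four word boxes become two line boxes, which is the kind of output to expect from the detector when the refiner is enabled.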

Input Columns

Param name	Type	Default	Column Data Description
inputCol	string	image	image struct (Image schema)

Parameters

Param name	Type	Default	Description
scoreThreshold	float	0.7	Score threshold for output regions.
sizeThreshold	int	10	Threshold for height of the detected regions. Regions with a height below the threshold won’t be returned.
textThreshold	float	0.4	Threshold for the score of a region potentially containing text. The region score represents the probability that a given pixel is the center of a character. With higher values, only regions with a high confidence of containing text are returned.
linkThreshold	float	0.4	Threshold for the link (affinity) score. The link score represents the space allowed between adjacent characters for them to be considered a single word.
width	int	1280	Width of the desired input image. The image will be resized to this width.
withRefiner	boolean	false	Enables the refiner net as a post-processing step.
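
Note that sizeThreshold means different things in the two text detectors: ImageTextDetector documents it as an area threshold, while ImageTextDetectorV2 documents it as a height threshold. The sketch below contrasts the two rules in plain Python. The function names are hypothetical, and it assumes that area is the width times the height of the bounding box and that a region exactly at the threshold is kept.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def filter_by_area(boxes: List[Box], size_threshold: int = 5) -> List[Box]:
    """ImageTextDetector-style rule: drop regions whose area is below the threshold."""
    return [b for b in boxes if b[2] * b[3] >= size_threshold]


def filter_by_height(boxes: List[Box], size_threshold: int = 10) -> List[Box]:
    """ImageTextDetectorV2-style rule: drop regions whose height is below the threshold."""
    return [b for b in boxes if b[3] >= size_threshold]


boxes = [
    (0, 0, 2, 2),      # tiny speck: area 4, height 2
    (5, 5, 120, 8),    # long but low line: area 960, height 8
    (5, 30, 120, 14),  # regular text line: area 1680, height 14
]

print(filter_by_area(boxes))    # the speck is dropped, both lines are kept
print(filter_by_height(boxes))  # only the 14-pixel-high line is kept
```

A value tuned for one detector therefore should not be carried over to the other without checking it against typical region sizes.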

Output Columns

Param name	Type	Default	Column Data Description
outputCol	string	table_regions	array of Coordinates (see ocr_structures#coordinate-schema)

Example:

from pyspark.ml import PipelineModel
from sparkocr.transformers import *
from sparkocr.utils import display_images

imagePath = "path to image"

# Read image file as binary file
df = spark.read \
    .format("binaryFile") \
    .load(imagePath)

binary_to_image = BinaryToImage() \
    .setInputCol("content") \
    .setOutputCol("image")

# Define a transformer to detect text
text_detector = ImageTextDetectorV2 \
    .pretrained("image_text_detector_v2", "en", "clinical/ocr") \
    .setInputCol("image") \
    .setOutputCol("text_regions") \
    .setScoreThreshold(0.5) \
    .setTextThreshold(0.2) \
    .setSizeThreshold(10) \
    .setWithRefiner(True)

draw_regions = ImageDrawRegions() \
  .setInputCol("image") \
  .setInputRegionsCol("text_regions") \
  .setOutputCol("image_with_regions")


pipeline = PipelineModel(stages=[
    binary_to_image,
    text_detector,
    draw_regions
])

data = pipeline.transform(df)

display_images(data, "image_with_regions")

Scala: not implemented (ImageTextDetectorV2 is currently available only in Python).
