kraken API

Kraken provides routines which are usable by third party tools. In general you can expect functions in the kraken package to remain stable. We will try to keep these backward compatible, but as kraken is still in an early development stage and the API is still quite rudimentary, nothing can be guaranteed.

kraken.binarization module

kraken.binarization.is_bitonal(im)

Tests a PIL.Image for bitonality.

Parameters: im (PIL.Image) – Image to test
Returns: True if the image contains only two different color values. False otherwise.

kraken.binarization.nlbin(im, threshold=0.5, zoom=0.5, escale=1.0, border=0.1, perc=80, range=20, low=5, high=90)

Performs binarization using non-linear processing.

Parameters:
  • im (PIL.Image) –
  • threshold (float) –
  • zoom (float) – Zoom for background page estimation
  • escale (float) – Scale for estimating a mask over the text region
  • border (float) – Ignore this much of the border
  • perc (int) – Percentage for filters
  • range (int) – Range for filters
  • low (int) – Percentile for black estimation
  • high (int) – Percentile for white estimation
Returns:

PIL.Image containing the binarized image
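The two functions above can be combined as in the following minimal sketch, which binarizes a scan with the default nlbin parameters unless it is already bitonal. The helper name binarize_page and its path argument are hypothetical and not part of kraken; the imports sit inside the function so the sketch can be loaded without kraken installed.

```python
def binarize_page(path):
    """Open an image file and return a bi-level version of it.

    `binarize_page` and `path` are illustrative names, not part of kraken.
    """
    # Imported lazily so this module loads even without kraken or Pillow.
    from PIL import Image
    from kraken import binarization

    im = Image.open(path)
    # Images that are already bitonal are returned unchanged.
    if binarization.is_bitonal(im):
        return im
    # All other parameters keep the defaults listed in the signature above.
    return binarization.nlbin(im)
```

A call such as binarize_page('page.png') would then yield an image suitable as input for kraken.pageseg.segment, which expects a bi-level page.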

kraken.serialization module

kraken.serialization.serialize(records, image_name=u'', image_size=(0, 0), writing_mode=u'horizontal-tb', template=u'hocr')

Serializes a list of ocr_records into an output document.

Serializes a list of predictions and their corresponding positions by doing some hOCR-specific preprocessing and then renders them through one of several jinja2 templates.

Parameters:
  • records (iterable) – List of kraken.rpred.ocr_record
  • image_name (str) – Name of the source image
  • image_size (tuple) – Dimensions of the source image
  • writing_mode (str) – Sets the principal layout of lines and the direction in which blocks progress. Valid values are horizontal-tb, vertical-rl, and vertical-lr.
  • template (str) – Selector for the serialization format. May be ‘hocr’ or ‘alto’.

kraken.pageseg module

kraken.pageseg.segment(im, text_direction=u'horizontal-tb', scale=None, maxcolseps=2, black_colseps=False)

Segments a page into text lines.

Segments a page into text lines and returns the absolute coordinates of each line in reading order.

Parameters:
  • im (PIL.Image) – A bi-level page of mode ‘1’ or ‘L’
  • text_direction (str) – Principal direction of the text (horizontal-tb/vertical-lr/rl)
  • scale (float) – Scale of the image
  • maxcolseps (int) – Maximum number of whitespace column separators
  • black_colseps (bool) – Whether column separators are assumed to be vertical black lines or not
Returns:

{'text_direction': '$dir', 'boxes': [(x1, y1, x2, y2),…]}: A dictionary containing the text direction and a list of reading order sorted bounding boxes under the key 'boxes'.

Raises:
  • KrakenInputException if the input image is not binarized or the text direction is invalid.
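To illustrate the shape of the return value, here is a small sketch that works on a hand-written dictionary of the documented form. The helper line_heights and the coordinate values are made up for illustration; no kraken call is involved.

```python
def line_heights(segmentation):
    """Return the pixel height of every line box, in reading order."""
    return [y2 - y1 for (x1, y1, x2, y2) in segmentation['boxes']]


# Hand-written stand-in for a segmentation result (values are invented).
seg = {
    'text_direction': 'horizontal-tb',
    'boxes': [(10, 12, 600, 48), (10, 60, 580, 98)],
}

print(seg['text_direction'])
print(line_heights(seg))
```

Because the boxes are already sorted in reading order, iterating over seg['boxes'] visits the lines in the order they should be read.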

kraken.rpred module

kraken.rpred.bidi_record(record)

Reorders a record using the Unicode BiDi algorithm.

Models trained for RTL or mixed scripts still emit classes in LTR order requiring reordering for proper display.

Parameters: record (kraken.rpred.ocr_record) –
Returns: kraken.rpred.ocr_record

kraken.rpred.dewarp(normalizer, im)

Dewarps an image of a line using a kraken.lib.lineest.CenterNormalizer instance.

Parameters:
  • normalizer (kraken.lib.lineest.CenterNormalizer) – A line normalizer instance
  • im (PIL.Image) – Image to dewarp
Returns:

PIL.Image containing the dewarped image.

kraken.rpred.extract_boxes(im, bounds)

Yields the subimages of image im defined in the list of bounding boxes in bounds preserving order.

Parameters:
  • im (PIL.Image) – Input image
  • bounds (list) – A list of tuples (x1, y1, x2, y2)
Yields:

(PIL.Image) the extracted subimage

class kraken.rpred.ocr_record(prediction, cuts, confidences)

Bases: future.types.newobject.newobject

A record object containing the recognition result of a single line

kraken.rpred.rpred(network, im, bounds, pad=16, line_normalization=True, bidi_reordering=True)

Uses an RNN to recognize text.

Parameters:
  • network (kraken.lib.lstm.SegRecognizer) – A SegRecognizer object
  • im (PIL.Image) – Image to extract text from
  • bounds (iterable) – A dictionary containing a ‘boxes’ entry with a list of coordinates (x0, y0, x1, y1) of a text line in the image and an entry ‘text_direction’ containing ‘horizontal-tb/vertical-lr/rl’.
  • bounds – An iterable returning a tuple defining the absolute coordinates (x0, y0, x1, y1) of a text line in the Image.
  • pad (int) – Extra blank padding to the left and right of text line
  • line_normalization (bool) – Dewarp line using the line estimator contained in the network. If no normalizer is available, one using the default parameters is created. Be aware that you may have to scale lines manually to the target line height if disabled.
  • bidi_reordering (bool) – Reorder classes in the ocr_record according to the Unicode bidirectional algorithm for correct display.
Yields:

An ocr_record containing the recognized text, absolute character positions, and confidence values for each character.
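The functions documented so far can be chained as in the following sketch: segment a page, recognize each line, and serialize the result. It assumes that network is an already loaded SegRecognizer, that im is already binarized, and that rpred accepts the dictionary returned by segment (the first of the two bounds descriptions above). The function name recognize_page is hypothetical, and the return value is simply whatever serialize returns.

```python
def recognize_page(network, im, template='hocr', image_name=''):
    """Segment, recognize, and serialize a single binarized page.

    `recognize_page` is an illustrative helper, not part of kraken.
    """
    # Imported lazily so this module loads even without kraken installed.
    from kraken import pageseg, rpred, serialization

    # Dictionary with the keys 'text_direction' and 'boxes'.
    seg = pageseg.segment(im)
    # rpred is a generator; collect one ocr_record per line.
    records = list(rpred.rpred(network, im, seg))
    return serialization.serialize(
        records,
        image_name=image_name,
        image_size=im.size,
        # Assumption: the segmenter's text direction is a valid writing mode.
        writing_mode=seg['text_direction'],
        template=template,
    )
```

Passing template='alto' instead of the default selects the other serialization format listed for serialize.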

kraken.transcrib module

Utility functions for ground truth transcription.

kraken.train module

Utility functions for training CLSTM neural networks.

class kraken.train.GroundTruthContainer(images=None, split=<function <lambda>>, suffix=u'.gt.txt', normalization=None, reorder=True, partition=0.9, pad=16)

Bases: object

Container for ground truth used during training.

training_set

list – List of tuples (image, text) for training

test_set

list – List of tuples (image, text) for testing

alphabet

str – Sorted string of all codepoints found in the ground truth

add(image, split=<function <lambda>>, suffix=u'.gt.txt', normalization=None, reorder=True, pad=16)

Adds a single image to the training set.

repartition(partition=0.9)

Repartitions the training/test sets.

Parameters: partition (float) – Ground truth data partition ratio between training/test sets.

sample()

Samples a line image-text pair from the training set.

Returns: A tuple (line, text) with line being a numpy.array run through kraken.lib.lstm.prepare_line.
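The alphabet attribute and the partition ratio can be pictured with the following plain-Python sketch. It is not kraken's implementation: build_alphabet, split_pairs, and the sample data are invented for illustration, and whether kraken shuffles the data before splitting is not stated here.

```python
def build_alphabet(pairs):
    """Sorted string of every distinct codepoint in the transcriptions."""
    return ''.join(sorted({ch for _, text in pairs for ch in text}))


def split_pairs(pairs, partition=0.9):
    """Split (image, text) pairs into training and test lists by ratio."""
    cut = int(len(pairs) * partition)
    return pairs[:cut], pairs[cut:]


# Invented ground truth: file names stand in for line images.
texts = ['abc', 'cab', 'bad', 'dab', 'cad', 'ace', 'bed', 'fed', 'fab', 'gab']
pairs = [('line%d.png' % i, text) for i, text in enumerate(texts)]

training_set, test_set = split_pairs(pairs, partition=0.9)
print(build_alphabet(pairs))
print(len(training_set), len(test_set))
```

With ten pairs and the default ratio of 0.9, nine pairs end up in the training list and one in the test list.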

kraken.linegen module

linegen

An advanced line generation tool using Pango for proper text shaping. The actual drawing code was adapted from the create_image utility from nototools available at [0].

[0] https://github.com/googlei18n/nototools

class kraken.linegen.LineGenerator(family='Sans', font_size=32, language=None)

Bases: future.types.newobject.newobject

Produces degraded line images using a single collection of font families.

render_line(text)

Draws a line onto a Cairo surface which will be converted to a Pillow Image.

Parameters:

text (unicode) – A string which will be rendered as a single line.

Returns:

PIL.Image of mode ‘L’.

Raises:
  • KrakenCairoSurfaceException if the Cairo surface couldn’t be created (usually caused by invalid dimensions).

kraken.linegen.ocropy_degrade(im, distort=1.0, dsigma=20.0, eps=0.03, delta=0.3, degradations=[(0.5, 0.0, 0.5, 0.0)])

Degrades and distorts a line using the same noise model used by ocropus.

Parameters:
  • im (PIL.Image) – Input image
  • distort (float) –
  • dsigma (float) –
  • eps (float) –
  • delta (float) –
  • degradations (list) – List of 4-tuples corresponding to the degradations argument of ocropus-linegen.
Returns:

PIL.Image in mode ‘L’

kraken.linegen.degrade_line(im, mean=0.0, sigma=0.001, density=0.002)

Degrades a line image by adding several kinds of noise.

Parameters:
  • im (PIL.Image) – Input image
  • mean (float) – Mean of distribution for Gaussian noise
  • sigma (float) – Standard deviation for Gaussian noise
  • density (float) – Noise density for salt-and-pepper noise
Returns:

PIL.Image in mode ‘L’
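As a rough picture of what the three noise parameters control, the following generic sketch applies Gaussian and salt-and-pepper noise to a flat list of gray values in the range 0.0 to 1.0. It is a textbook illustration, not kraken's implementation; add_noise and its seed argument are hypothetical.

```python
import random


def add_noise(pixels, mean=0.0, sigma=0.001, density=0.002, seed=None):
    """Return a noisy copy of a flat list of gray values in [0.0, 1.0]."""
    rng = random.Random(seed)
    out = []
    for value in pixels:
        if rng.random() < density:
            # Salt-and-pepper: replace the pixel with pure black or white.
            value = rng.choice((0.0, 1.0))
        else:
            # Gaussian: add a normally distributed offset, then clamp.
            value = min(1.0, max(0.0, value + rng.gauss(mean, sigma)))
        out.append(value)
    return out


noisy = add_noise([0.5] * 1000, sigma=0.1, density=0.1, seed=1)
```

Here density is the approximate fraction of pixels replaced by black or white, while mean and sigma shape the offset added to the remaining pixels.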

kraken.linegen.distort_line(im, distort=3.0, sigma=10, eps=0.03, delta=0.3)

Distorts a line image.

Run BEFORE degrade_line as a white border of 5 pixels will be added.

Parameters:
  • im (PIL.Image) – Input image
  • distort (float) –
  • sigma (float) –
  • eps (float) –
  • delta (float) –
Returns:

PIL.Image in mode ‘L’
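The ordering note for distort_line can be sketched as follows: render a line, distort it first, then add noise. The helper synthesize_line is hypothetical, and the sketch assumes kraken with its Pango and Cairo dependencies is installed; all degradation parameters keep their documented defaults.

```python
def synthesize_line(text, family='Sans', font_size=32):
    """Render a text line and return a distorted, degraded image of it.

    `synthesize_line` is an illustrative helper, not part of kraken.
    """
    # Imported lazily so this module loads even without kraken installed.
    from kraken import linegen

    generator = linegen.LineGenerator(family=family, font_size=font_size)
    im = generator.render_line(text)
    # Distort first, because distort_line adds a white border.
    im = linegen.distort_line(im)
    # Add noise afterwards, so the border is degraded as well.
    return linegen.degrade_line(im)
```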