Warning: This document is for an old version of kraken. The latest version is 0.10.0.

kraken API

Kraken provides routines that are usable by third-party tools. In general, you can expect functions in the kraken package to remain stable. We will try to keep these backward compatible, but since kraken is still at an early stage of development and the API is still quite rudimentary, nothing can be guaranteed.

kraken.binarization module

kraken.binarization

An adaptive binarization algorithm.

kraken.binarization.is_bitonal(im)

Tests a PIL.Image for bitonality.

Parameters: im (PIL.Image) – Image to test
Returns: True if the image contains only two different color values, False otherwise.
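
To make the criterion concrete, the following minimal sketch applies the same idea to a plain sequence of pixel values instead of a PIL.Image. The helper name looks_bitonal is hypothetical and not part of kraken, and the sketch assumes that "only two different color values" means exactly two distinct values.

```python
def looks_bitonal(pixels):
    """Return True if the pixel sequence holds exactly two distinct values.

    Hypothetical stand-in for illustration only; kraken's own check operates
    on a PIL.Image and may treat edge cases differently.
    """
    distinct = set()
    for value in pixels:
        distinct.add(value)
        if len(distinct) > 2:
            # Early exit: a third value already rules out bitonality.
            return False
    return len(distinct) == 2


black_and_white = [0, 255, 255, 0, 0, 255]
grayscale = [0, 64, 128, 255]
print(looks_bitonal(black_and_white))
print(looks_bitonal(grayscale))
```
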
kraken.binarization.nlbin(im, threshold=0.5, zoom=0.5, escale=1.0, border=0.1, perc=80, range=20, low=5, high=90)

Performs binarization using non-linear processing.

Parameters:
  • im (PIL.Image) –
  • threshold (float) –
  • zoom (float) – Zoom for background page estimation
  • escale (float) – Scale for estimating a mask over the text region
  • border (float) – Ignore this much of the border
  • perc (int) – Percentage for filters
  • range (int) – Range for filters
  • low (int) – Percentile for black estimation
  • high (int) – Percentile for white estimation
Returns:

PIL.Image containing the binarized image

Raises:

KrakenInputException when trying to binarize an empty image.
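
The low, high, and threshold parameters suggest a percentile-based contrast stretch followed by thresholding. The sketch below shows that general idea on a flat list of gray values in the range 0 to 1. It is an assumption about the technique, not kraken's implementation, and it leaves out the background estimation controlled by zoom, perc, and range. The helper names are hypothetical, and a plain ValueError stands in for KrakenInputException.

```python
def percentile(values, q):
    """Nearest-rank style percentile of a non-empty sequence, q in [0, 100]."""
    ordered = sorted(values)
    index = round(q / 100 * (len(ordered) - 1))
    return ordered[index]


def stretch_and_threshold(pixels, threshold=0.5, low=5, high=90):
    """Rescale gray values between two percentiles, then binarize them.

    Returns a list of 0 (black) and 1 (white). Hypothetical helper.
    """
    if not pixels:
        # Stand-in for the exception raised on empty images.
        raise ValueError("cannot binarize an empty image")
    black = percentile(pixels, low)    # estimate of the black level
    white = percentile(pixels, high)   # estimate of the white level
    if white == black:
        raise ValueError("image has no contrast between the percentiles")
    result = []
    for value in pixels:
        scaled = (value - black) / (white - black)
        scaled = min(1.0, max(0.0, scaled))  # clip to [0, 1]
        result.append(1 if scaled > threshold else 0)
    return result


pixels = [0.1, 0.12, 0.15, 0.8, 0.85, 0.9, 0.88, 0.82, 0.86, 0.9]
print(stretch_and_threshold(pixels))
```
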

kraken.html module

kraken.pageseg module

kraken.pageseg

Layout analysis and script detection methods.

kraken.pageseg.segment(im, text_direction='horizontal-lr', scale=None, maxcolseps=2, black_colseps=False, no_hlines=True)

Segments a page into text lines.

Segments a page into text lines and returns the absolute coordinates of each line in reading order.

Parameters:
  • im (PIL.Image) – A bi-level page of mode ‘1’ or ‘L’
  • text_direction (str) – Principal direction of the text (horizontal-lr/rl/vertical-lr/rl)
  • scale (float) – Scale of the image
  • maxcolseps (int) – Maximum number of whitespace column separators
  • black_colseps (bool) – Whether column separators are assumed to be vertical black lines or not
  • no_hlines (bool) – Switch for horizontal line removal
Returns:

A dictionary {‘text_direction’: ‘$dir’, ‘boxes’: [(x1, y1, x2, y2),…]} containing the text direction and a list of reading order sorted bounding boxes under the key ‘boxes’.

Raises:

KrakenInputException if the input image is not binarized or the text direction is invalid.
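
The shape of the returned dictionary can be illustrated without running the segmenter. The sketch below builds a hand-written dictionary in the documented form and derives the width and height of each line. The data is made up, the helper line_sizes is hypothetical, and the four direction strings are an assumed expansion of the shorthand horizontal-lr/rl/vertical-lr/rl.

```python
# Assumed expansion of the documented shorthand for text directions.
VALID_DIRECTIONS = {"horizontal-lr", "horizontal-rl", "vertical-lr", "vertical-rl"}


def line_sizes(segmentation):
    """Return (width, height) for every box in a segmentation dictionary."""
    direction = segmentation["text_direction"]
    if direction not in VALID_DIRECTIONS:
        raise ValueError(f"unexpected text direction: {direction!r}")
    sizes = []
    for x1, y1, x2, y2 in segmentation["boxes"]:
        sizes.append((x2 - x1, y2 - y1))
    return sizes


# Hand-written example data in the documented shape; not real segmenter output.
segmentation = {
    "text_direction": "horizontal-lr",
    "boxes": [(10, 20, 410, 60), (10, 70, 395, 110)],
}

for box, (width, height) in zip(segmentation["boxes"], line_sizes(segmentation)):
    print(box, width, height)
```
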
kraken.pageseg.detect_scripts(im, bounds, model='/home/mittagessen/git/kraken/kraken/script.mlmodel', valid_scripts=None)

Detects scripts in a segmented page.

Classifies lines returned by the page segmenter into runs of scripts/writing systems.

Parameters:
  • im (PIL.Image) – A bi-level page of mode ‘1’ or ‘L’
  • bounds (dict) – A dictionary containing a ‘boxes’ entry with a list of coordinates (x0, y0, x1, y1) of a text line in the image and an entry ‘text_direction’ containing ‘horizontal-lr/rl/vertical-lr/rl’.
  • model (str) – Location of the script classification model or None for default.
  • valid_scripts (list) – List of valid scripts.
Returns:

A dictionary {‘script_detection’: True, ‘text_direction’: ‘$dir’, ‘boxes’: [[(script, (x1, y1, x2, y2)),…]]} containing the text direction and a list of lists of reading order sorted bounding boxes under the key ‘boxes’, with each list containing the script segmentation of a single line. Script is an ISO15924 4-character identifier.

Raises:

KrakenInvalidModelException if no clstm module is available.
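
The nesting of the script-annotated result is easy to get wrong, so the sketch below walks a hand-written dictionary in the documented shape and lists the script codes found on each line. The data is invented for illustration and the helper scripts_per_line is hypothetical.

```python
def scripts_per_line(detection):
    """Return, for each line, the ordered list of script codes of its runs."""
    return [[script for script, _box in line] for line in detection["boxes"]]


# Hand-written example: the outer list holds lines, each inner list holds
# the (script, box) runs of one line. Not real detector output.
detection = {
    "script_detection": True,
    "text_direction": "horizontal-lr",
    "boxes": [
        [("Latn", (10, 20, 200, 60)), ("Arab", (200, 20, 410, 60))],
        [("Latn", (10, 70, 395, 110))],
    ],
}

print(scripts_per_line(detection))
```
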

kraken.rpred module

kraken.rpred

Generators for recognition on line images.

class kraken.rpred.ocr_record(prediction, cuts, confidences)

Bases: object

A record object containing the recognition result of a single line

kraken.rpred.bidi_record(record)

Reorders a record using the Unicode BiDi algorithm.

Models trained for RTL or mixed scripts still emit classes in LTR order requiring reordering for proper display.

Parameters: record (kraken.rpred.ocr_record) –
Returns: kraken.rpred.ocr_record
kraken.rpred.mm_rpred(nets, im, bounds, pad=16, bidi_reordering=True, script_ignore=None)

Multi-model version of kraken.rpred.rpred.

Takes a dictionary of ISO15924 script identifiers->models and a script-annotated segmentation to dynamically select appropriate models for these lines.

Parameters:
  • nets (dict) – A dict mapping ISO15924 identifiers to TorchSeqRecognizer objects. Recommended to be a defaultdict.
  • im (PIL.Image) – Image to extract text from
  • bounds (dict) – A dictionary containing a ‘boxes’ entry with a list of lists of coordinates (script, (x0, y0, x1, y1)) of a text line in the image and an entry ‘text_direction’ containing ‘horizontal-lr/rl/vertical-lr/rl’.
  • pad (int) – Extra blank padding to the left and right of text line
  • bidi_reordering (bool) – Reorder classes in the ocr_record according to the Unicode bidirectional algorithm for correct display.
  • script_ignore (list) – List of scripts to ignore during recognition
Yields:

An ocr_record containing the recognized text, absolute character positions, and confidence values for each character.

Raises:

KrakenInputException if the mapping between segmentation scripts and networks is incomplete.
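
The recommendation to pass a defaultdict can be illustrated with the standard library alone. In the sketch below, plain strings stand in for the recognizer objects, the segmentation is hand-written, and select_models is a hypothetical helper that only mimics the lookup; how kraken itself performs the selection is not shown here.

```python
from collections import defaultdict


def select_models(nets, bounds, script_ignore=None):
    """Pair every script run with the model chosen for it, skipping ignored scripts."""
    ignored = set(script_ignore or [])
    selected = []
    for line in bounds["boxes"]:
        for script, _box in line:
            if script in ignored:
                continue
            # A plain dict raises KeyError for an unmapped script here;
            # a defaultdict falls back to its default model instead.
            selected.append((script, nets[script]))
    return selected


# Hand-written script-annotated segmentation; not real detector output.
bounds = {
    "text_direction": "horizontal-lr",
    "boxes": [
        [("Latn", (10, 20, 200, 60)), ("Arab", (200, 20, 410, 60))],
        [("Zyyy", (10, 70, 40, 110)), ("Latn", (40, 70, 395, 110))],
    ],
}

# Strings are placeholders for recognizer objects.
nets = defaultdict(lambda: "fallback-model")
nets["Arab"] = "arabic-model"

print(select_models(nets, bounds, script_ignore=["Zyyy"]))
```

With a plain dict that only contains an entry for Arab, the same call fails on the first Latn run, which is the incomplete mapping the exception above refers to.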
kraken.rpred.rpred(network, im, bounds, pad=16, bidi_reordering=True)

Uses an RNN to recognize text.

Parameters:
  • network (kraken.lib.models.TorchSeqRecognizer) – A TorchSeqRecognizer object
  • im (PIL.Image) – Image to extract text from
  • bounds (dict) – A dictionary containing a ‘boxes’ entry with a list of coordinates (x0, y0, x1, y1) of a text line in the image and an entry ‘text_direction’ containing ‘horizontal-lr/rl/vertical-lr/rl’.
  • pad (int) – Extra blank padding to the left and right of text line. Auto-disabled when expected network inputs are incompatible with padding.
  • bidi_reordering (bool) – Reorder classes in the ocr_record according to the Unicode bidirectional algorithm for correct display.
Yields:

An ocr_record containing the recognized text, absolute character positions, and confidence values for each character.
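
Since each record carries text, positions, and confidences per character, the three can be walked in parallel. The sketch below uses a hypothetical stand-in class, LineRecord, rather than kraken's ocr_record. It assumes that positions are boxes of the form (x0, y0, x1, y1) and that confidences lie between 0 and 1; neither assumption is confirmed by this page.

```python
from dataclasses import dataclass


@dataclass
class LineRecord:
    """Hypothetical stand-in for a recognition record; not kraken's class."""
    prediction: str
    cuts: list
    confidences: list


def uncertain_characters(record, min_confidence=0.9):
    """Return (character, position) pairs whose confidence is below the cutoff."""
    return [
        (char, cut)
        for char, cut, conf in zip(record.prediction, record.cuts, record.confidences)
        if conf < min_confidence
    ]


# Invented values for illustration only.
record = LineRecord(
    prediction="cat",
    cuts=[(10, 5, 22, 30), (22, 5, 35, 30), (35, 5, 44, 30)],
    confidences=[0.99, 0.62, 0.95],
)

print(uncertain_characters(record))
```
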

kraken.transcrib module

kraken.linegen module

linegen

An advanced line generation tool using Pango for proper text shaping. The actual drawing code was adapted from the create_image utility from nototools available at [0].

Line degradation uses a local model described in [1].

[0] https://github.com/googlei18n/nototools

[1] Kanungo, Tapas, et al. “A statistical, nonparametric methodology for document degradation model validation.” IEEE Transactions on Pattern Analysis and Machine Intelligence 22.11 (2000): 1209-1223.

class kraken.linegen.LineGenerator(family='Sans', font_size=32, font_weight=400, language=None)

Bases: object

Produces degraded line images using a single collection of font families.

render_line(text)

Draws a line onto a Cairo surface which will be converted to a Pillow Image.

Parameters:

text (unicode) – A string which will be rendered as a single line.

Returns:

PIL.Image of mode ‘L’.

Raises:

KrakenCairoSurfaceException if the Cairo surface couldn’t be created (usually caused by invalid dimensions).
kraken.linegen.ocropy_degrade(im, distort=1.0, dsigma=20.0, eps=0.03, delta=0.3, degradations=(0.5, 0.0, 0.5, 0.0))

Degrades and distorts a line using the same noise model used by ocropus.

Parameters:
  • im (PIL.Image) – Input image
  • distort (float) –
  • dsigma (float) –
  • eps (float) –
  • delta (float) –
  • degradations (list) – list returning 4-tuples corresponding to the degradations argument of ocropus-linegen.
Returns:

PIL.Image in mode ‘L’

kraken.linegen.degrade_line(im, eta=0.0, alpha=1.5, beta=1.5, alpha_0=1.0, beta_0=1.0)

Degrades a line image by adding noise.

For parameter meanings consult [1].

Parameters:
  • im (PIL.Image) – Input image
  • eta (float) –
  • alpha (float) –
  • beta (float) –
  • alpha_0 (float) –
  • beta_0 (float) –
Returns:

PIL.Image in mode ‘1’
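
As the local model in [1] is commonly summarized, each pixel is flipped with a probability that decays with the squared distance d to the nearest pixel of the opposite color: alpha_0 * exp(-alpha * d^2) + eta for foreground pixels and beta_0 * exp(-beta * d^2) + eta for background pixels. The sketch below evaluates that expression for the default values in the signature above. The function name is hypothetical, and whether kraken follows exactly this form is an assumption.

```python
import math


def flip_probability(distance, scale, decay, eta=0.0):
    """Probability of flipping a pixel at `distance` from the opposite color.

    `scale` plays the role of alpha_0 or beta_0 and `decay` the role of
    alpha or beta. The result is capped at 1.0 so it stays a probability.
    """
    return min(1.0, scale * math.exp(-decay * distance ** 2) + eta)


# Defaults from the degrade_line signature: the foreground and background
# parameters are identical, so one table covers both cases.
for d in (1, 2, 3):
    print(d, round(flip_probability(d, scale=1.0, decay=1.5, eta=0.0), 6))
```

Pixels directly next to an edge are therefore the most likely to change, which is why the noise concentrates along glyph outlines.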

kraken.linegen.distort_line(im, distort=3.0, sigma=10, eps=0.03, delta=0.3)

Distorts a line image.

Run BEFORE degrade_line as a white border of 5 pixels will be added.

Parameters:
  • im (PIL.Image) – Input image
  • distort (float) –
  • sigma (float) –
  • eps (float) –
  • delta (float) –
Returns:

PIL.Image in mode ‘L’