Warning: This document is for an old version of kraken. The latest version is 0.10.0.

kraken API

Kraken provides routines that are usable by third-party tools. In general, you can expect functions in the kraken package to remain stable. We will try to keep these backward compatible, but as kraken is still at an early stage of development and the API is still quite rudimentary, nothing can be guaranteed.

kraken.binarization module

kraken.binarization

An adaptive binarization algorithm.

kraken.binarization.is_bitonal(im)

Tests a PIL.Image for bitonality.

Parameters: im (PIL.Image) – Image to test
Returns: True if the image contains only two different color values, False otherwise.
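As a rough illustration of the check described above, the following sketch tests a flat sequence of pixel values for bitonality. It is not kraken's implementation: the helper name has_two_values and the plain-list input are assumptions made for this example, whereas the real function operates on a PIL.Image. The sketch also assumes that a single-color image does not count as bitonal.

```python
def has_two_values(pixels):
    """Return True if `pixels` contains exactly two distinct values.

    Hypothetical helper for illustration only; not part of kraken.
    """
    seen = set()
    for value in pixels:
        seen.add(value)
        if len(seen) > 2:
            # A third distinct value rules out bitonality, so stop early.
            return False
    return len(seen) == 2


print(has_two_values([0, 255, 255, 0]))  # black-and-white sample
print(has_two_values([0, 128, 255]))     # grayscale sample
```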
kraken.binarization.nlbin(im, threshold=0.5, zoom=0.5, escale=1.0, border=0.1, perc=80, range=20, low=5, high=90)

Performs binarization using non-linear processing.

Parameters:
  • im (PIL.Image) –
  • threshold (float) –
  • zoom (float) – Zoom for background page estimation
  • escale (float) – Scale for estimating a mask over the text region
  • border (float) – Ignore this much of the border
  • perc (int) – Percentage for filters
  • range (int) – Range for filters
  • low (int) – Percentile for black estimation
  • high (int) – Percentile for white estimation
Returns:

PIL.Image containing the binarized image

Raises:

KrakenInputException when trying to binarize an empty image.
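The low, high, and threshold parameters can be pictured with a simplified sketch: estimate a black level and a white level from percentiles, rescale the pixel values between them, and cut at the threshold. This is only an assumption about how such parameters typically interact, not a description of kraken's actual code, and it leaves out the background estimation and filtering steps controlled by zoom, escale, border, perc, and range. The function names are hypothetical, and pixel values are assumed to be floats in a flat list.

```python
def percentile(values, q):
    """Nearest-rank percentile of a non-empty sequence; q is an int in 0..100."""
    ordered = sorted(values)
    # Integer ceiling of q * n / 100, clamped so that q=0 picks the minimum.
    rank = max(1, -(-q * len(ordered) // 100))
    return ordered[rank - 1]


def normalize_and_threshold(pixels, threshold=0.5, low=5, high=90):
    """Rescale pixels between two percentile levels, then binarize.

    Hypothetical helper for illustration only; not part of kraken.
    """
    black = percentile(pixels, low)    # estimated black level
    white = percentile(pixels, high)   # estimated white level
    if white == black:
        raise ValueError("image has no contrast")
    result = []
    for value in pixels:
        scaled = (value - black) / (white - black)
        scaled = min(1.0, max(0.0, scaled))  # clip to [0, 1]
        result.append(1 if scaled > threshold else 0)
    return result


sample = [0.1] * 10 + [0.9] * 10
print(normalize_and_threshold(sample))
```

Dark pixels map to 0 and light pixels to 1 in this sketch; whether kraken uses the same polarity in its output image is not stated in this section.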

kraken.html module

kraken.pageseg module

kraken.pageseg

Layout analysis and script detection methods.

kraken.pageseg.segment(im, text_direction='horizontal-lr', scale=None, maxcolseps=2, black_colseps=False, no_hlines=True)

Segments a page into text lines.

Segments a page into text lines and returns the absolute coordinates of each line in reading order.

Parameters:
  • im (PIL.Image) – A bi-level page of mode ‘1’ or ‘L’
  • text_direction (str) – Principal direction of the text (horizontal-lr/rl/vertical-lr/rl)
  • scale (float) – Scale of the image
  • maxcolseps (int) – Maximum number of whitespace column separators
  • black_colseps (bool) – Whether column separators are assumed to be vertical black lines or not
  • no_hlines (bool) – Switch for horizontal line removal
Returns:

{‘text_direction’: ‘$dir’, ‘boxes’: [(x1, y1, x2, y2),…]}: A dictionary containing the text direction and a list of bounding boxes, sorted in reading order, under the key ‘boxes’.

Return type:

dict

Raises:

KrakenInputException if the input image is not binarized or the text direction is invalid.

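To show how the returned dictionary might be consumed, the sketch below walks over a hand-written sample shaped like the return value described above and computes the size of each line. The sample coordinates are made up, and line_sizes is a hypothetical helper rather than part of kraken.

```python
# Made-up sample shaped like the documented return value of segment().
segmentation = {
    'text_direction': 'horizontal-lr',
    'boxes': [(10, 20, 410, 60), (10, 70, 380, 112)],
}


def line_sizes(seg):
    """Return (width, height) for each bounding box, in reading order."""
    return [(x2 - x1, y2 - y1) for x1, y1, x2, y2 in seg['boxes']]


for index, size in enumerate(line_sizes(segmentation)):
    print(index, size)
```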
kraken.pageseg.detect_scripts(im, bounds, model='/home/mittagessen/git/kraken/kraken/script.mlmodel', valid_scripts=None)

Detects scripts in a segmented page.

Classifies lines returned by the page segmenter into runs of scripts/writing systems.

Parameters:
  • im (PIL.Image) – A bi-level page of mode ‘1’ or ‘L’
  • bounds (dict) – A dictionary containing a ‘boxes’ entry with a list of coordinates (x0, y0, x1, y1) of a text line in the image and an entry ‘text_direction’ containing ‘horizontal-lr/rl/vertical-lr/rl’.
  • model (str) – Location of the script classification model or None for default.
  • valid_scripts (list) – List of valid scripts.
Returns:

{‘script_detection’: True, ‘text_direction’: ‘$dir’, ‘boxes’: [[(script, (x1, y1, x2, y2)),…]]}: A dictionary containing the text direction and a list of lists of bounding boxes, sorted in reading order, under the key ‘boxes’, with each list containing the script segmentation of a single line. Script is an ISO 15924 four-character identifier.

Return type:

dict

Raises:

KrakenInvalidModelException if no clstm module is available.
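The nested structure of the script-annotated result is easier to see with an example. The sketch below uses a made-up sample shaped like the return value described above and lists the scripts found on each line, along with the set of scripts used on the page. Both helper names are hypothetical and not part of kraken.

```python
# Made-up sample shaped like the documented return value of detect_scripts().
sample = {
    'script_detection': True,
    'text_direction': 'horizontal-lr',
    'boxes': [
        [('Latn', (10, 20, 200, 60)), ('Arab', (200, 20, 410, 60))],
        [('Latn', (10, 70, 380, 112))],
    ],
}


def scripts_per_line(seg):
    """Return the script identifiers of each line, in run order."""
    return [[script for script, _box in line] for line in seg['boxes']]


def scripts_used(seg):
    """Return the sorted set of script identifiers found on the page."""
    return sorted({script for line in seg['boxes'] for script, _box in line})


print(scripts_per_line(sample))
print(scripts_used(sample))
```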

kraken.rpred module

kraken.rpred

Generators for recognition on line images.

class kraken.rpred.ocr_record(prediction, cuts, confidences)

Bases: object

A record object containing the recognition result of a single line
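A record can be post-processed character by character, for instance to flag uncertain output. The sketch below assumes that the record exposes attributes named after the three constructor arguments and that there is one cut and one confidence per character. It uses a namedtuple stand-in instead of a real ocr_record, and the shape of each cut as well as the helper name are assumptions.

```python
from collections import namedtuple

# Stand-in with the same three fields as the constructor above (assumption).
Record = namedtuple('Record', ['prediction', 'cuts', 'confidences'])


def uncertain_characters(record, cutoff=0.8):
    """Return (character, cut, confidence) triples with confidence below `cutoff`."""
    return [
        (char, cut, conf)
        for char, cut, conf in zip(record.prediction, record.cuts, record.confidences)
        if conf < cutoff
    ]


record = Record(
    prediction='cat',
    cuts=[(0, 0, 10, 20), (10, 0, 20, 20), (20, 0, 30, 20)],  # made-up boxes
    confidences=[0.99, 0.42, 0.95],
)
print(uncertain_characters(record))
```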

kraken.rpred.bidi_record(record)

Reorders a record using the Unicode BiDi algorithm.

Models trained for RTL or mixed scripts still emit classes in LTR order requiring reordering for proper display.

Parameters:record (kraken.rpred.ocr_record) –
Returns:kraken.rpred.ocr_record
kraken.rpred.mm_rpred(nets, im, bounds, pad=16, bidi_reordering=True, script_ignore=None)

Multi-model version of kraken.rpred.rpred.

Takes a dictionary mapping ISO 15924 script identifiers to models and a script-annotated segmentation to dynamically select appropriate models for these lines.

Parameters:
  • nets (dict) – A dict mapping ISO 15924 identifiers to TorchSeqRecognizer objects. Recommended to be a defaultdict.
  • im (PIL.Image) – Image to extract text from
  • bounds (dict) – A dictionary containing a ‘boxes’ entry with a list of lists of coordinates (script, (x0, y0, x1, y1)) of a text line in the image and an entry ‘text_direction’ containing ‘horizontal-lr/rl/vertical-lr/rl’.
  • pad (int) – Extra blank padding to the left and right of text line
  • bidi_reordering (bool) – Reorder classes in the ocr_record according to the Unicode bidirectional algorithm for correct display.
  • script_ignore (list) – List of scripts to ignore during recognition
Yields:

An ocr_record containing the recognized text, absolute character positions, and confidence values for each character.

Raises:

KrakenInputException if the mapping between segmentation scripts and networks is incomplete.

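One plausible reason for recommending a defaultdict is that it can supply a fallback model for scripts without a dedicated one. The sketch below builds such a mapping and lists the scripts in a segmentation that would rely on the fallback. Plain strings stand in for recognizer objects, and both helper names are hypothetical; how mm_rpred itself treats the mapping is not specified in this section.

```python
from collections import defaultdict


def build_model_map(models, fallback):
    """Return a defaultdict that yields `fallback` for unknown scripts."""
    nets = defaultdict(lambda: fallback)
    nets.update(models)
    return nets


def missing_scripts(nets, seg, script_ignore=()):
    """Return scripts in `seg` that have no explicitly registered model."""
    needed = {script for line in seg['boxes'] for script, _box in line}
    # set(nets) only contains explicitly stored keys, not fallback lookups.
    return sorted(needed - set(nets) - set(script_ignore))


# Strings stand in for loaded recognizer objects in this example.
nets = build_model_map({'Latn': 'latin-model'}, fallback='generic-model')
seg = {
    'text_direction': 'horizontal-lr',
    'boxes': [[('Latn', (0, 0, 50, 20)), ('Grek', (50, 0, 90, 20))]],
}
print(missing_scripts(nets, seg))
print(nets['Grek'])
```

Note that looking up a missing key in a defaultdict stores the fallback under that key, so missing_scripts should be called before any such lookup.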
kraken.rpred.rpred(network, im, bounds, pad=16, bidi_reordering=True)

Uses an RNN to recognize text.

Parameters:
  • network (kraken.lib.models.TorchSeqRecognizer) – A TorchSeqRecognizer object
  • im (PIL.Image) – Image to extract text from
  • bounds (dict) – A dictionary containing a ‘boxes’ entry with a list of coordinates (x0, y0, x1, y1) of a text line in the image and an entry ‘text_direction’ containing ‘horizontal-lr/rl/vertical-lr/rl’.
  • pad (int) – Extra blank padding to the left and right of text line. Auto-disabled when expected network inputs are incompatible with padding.
  • bidi_reordering (bool) – Reorder classes in the ocr_record according to the Unicode bidirectional algorithm for correct display.
Yields:

An ocr_record containing the recognized text, absolute character positions, and confidence values for each character.
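The functions documented here can be chained into a small pipeline. The following is a minimal sketch under several assumptions: kraken is installed, the caller already holds a loaded network object (model loading is outside this section and not shown), and each yielded record exposes its text as a prediction attribute, as the constructor signature suggests. The function name recognize_page is hypothetical.

```python
def recognize_page(im, network):
    """Binarize if needed, segment into lines, and return the recognized text.

    Hypothetical helper; `im` is a PIL.Image and `network` an already
    loaded recognizer object.
    """
    # Imported lazily so that this module can be loaded without kraken.
    from kraken import binarization, pageseg, rpred

    if not binarization.is_bitonal(im):
        im = binarization.nlbin(im)

    # Returns a dict with 'text_direction' and reading-order 'boxes'.
    bounds = pageseg.segment(im, text_direction='horizontal-lr')

    # rpred is a generator yielding one record per text line.
    return [record.prediction for record in rpred.rpred(network, im, bounds)]
```

Because rpred is a generator, lines are recognized lazily; collecting them into a list, as done here, forces recognition of the whole page.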