kraken API

Kraken provides routines which are usable by third-party tools. In general, you can expect functions in the kraken package to remain stable. We will try to keep these backward compatible, but as kraken is still in an early development stage and the API is still quite rudimentary, nothing can be guaranteed.

kraken.binarization module

kraken.binarization.is_bitonal(im)

Tests a PIL.Image for bitonality.

Parameters:im (PIL.Image) – Image to test
Returns:True if the image contains only two different color values. False otherwise.
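
As a rough illustration of the check described above, the sketch below tests a flat sequence of pixel values instead of a PIL.Image. The helper name has_two_values is hypothetical and not part of kraken, and reading "only two different color values" as exactly two distinct values is an assumption.

```python
def has_two_values(pixels):
    """Return True if the pixel sequence holds exactly two distinct values."""
    distinct = set()
    for value in pixels:
        distinct.add(value)
        if len(distinct) > 2:
            # Stop early: a third value already rules out bitonality.
            return False
    return len(distinct) == 2


print(has_two_values([0, 255, 255, 0]))  # black and white only
print(has_two_values([0, 128, 255]))     # contains a gray value
```
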
kraken.binarization.nlbin(im, threshold=0.5, zoom=0.5, escale=1.0, border=0.1, perc=80, range=20, low=5, high=90)

Performs binarization using non-linear processing.

Parameters:
  • im (PIL.Image) –
  • threshold (float) –
  • zoom (float) – Zoom for background page estimation
  • escale (float) – Scale for estimating a mask over the text region
  • border (float) – Ignore this much of the border
  • perc (int) – Percentage for filters
  • range (int) – Range for filters
  • low (int) – Percentile for black estimation
  • high (int) – Percentile for white estimation
Returns:

PIL.Image containing the binarized image

kraken.serialization module

kraken.serialization.serialize(records, image_name=u'', image_size=(0, 0), writing_mode=u'horizontal-tb', scripts=None, template=u'hocr')

Serializes a list of ocr_records into an output document.

Serializes a list of predictions and their corresponding positions by doing some hOCR-specific preprocessing and then rendering them through one of several jinja2 templates.

Parameters:
  • records (iterable) – List of kraken.rpred.ocr_record
  • image_name (str) – Name of the source image
  • image_size (tuple) – Dimensions of the source image
  • writing_mode (str) – Sets the principal layout of lines and the direction in which blocks progress. Valid values are horizontal-tb, vertical-rl, and vertical-lr.
  • scripts (list) – List of scripts contained in the OCR records
  • template (str) – Selector for the serialization format. May be ‘hocr’ or ‘alto’.
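
The documented option values can be checked before calling the serializer. The wrapper below is a minimal sketch: serialize_page and the two constant tuples are hypothetical names, and whether kraken validates these values itself is not stated here. The return value is passed through unchanged because its type is not specified in this section.

```python
VALID_WRITING_MODES = ('horizontal-tb', 'vertical-rl', 'vertical-lr')
VALID_TEMPLATES = ('hocr', 'alto')


def serialize_page(records, image_name, image_size,
                   writing_mode='horizontal-tb', template='hocr'):
    """Validate the documented options, then delegate to kraken."""
    if writing_mode not in VALID_WRITING_MODES:
        raise ValueError('unsupported writing_mode: %r' % (writing_mode,))
    if template not in VALID_TEMPLATES:
        raise ValueError('unsupported template: %r' % (template,))
    # Imported lazily so the argument checks work without kraken installed.
    from kraken import serialization
    return serialization.serialize(records,
                                   image_name=image_name,
                                   image_size=image_size,
                                   writing_mode=writing_mode,
                                   template=template)
```

Here records would be the ocr_record objects produced by the recognizer, and image_size the dimensions of the source image.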

kraken.pageseg module

kraken.pageseg.segment(im, text_direction=u'horizontal-lr', scale=None, maxcolseps=2, black_colseps=False)

Segments a page into text lines.

Segments a page into text lines and returns the absolute coordinates of each line in reading order.

Parameters:
  • im (PIL.Image) – A bi-level page of mode ‘1’ or ‘L’
  • text_direction (str) – Principal direction of the text (horizontal-lr/rl/vertical-lr/rl)
  • scale (float) – Scale of the image
  • maxcolseps (int) – Maximum number of whitespace column separators
  • black_colseps (bool) – Whether column separators are assumed to be vertical black lines or not
Returns:

A dictionary of the form {‘text_direction’: ‘$dir’, ‘boxes’: [(x1, y1, x2, y2),…]} containing the text direction and a list of reading order sorted bounding boxes under the key ‘boxes’.

Return type:

dict

Raises:
  • KrakenInputException if the input image is not binarized or the text direction is invalid.
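
The shape of the returned dictionary can be made concrete with a small check. This is a sketch only: check_bounds is a hypothetical helper, and the example values are hand-written rather than actual segmenter output.

```python
VALID_DIRECTIONS = ('horizontal-lr', 'horizontal-rl',
                    'vertical-lr', 'vertical-rl')


def check_bounds(bounds):
    """Raise ValueError unless bounds looks like a segmentation dictionary."""
    direction = bounds.get('text_direction')
    if direction not in VALID_DIRECTIONS:
        raise ValueError('invalid text_direction: %r' % (direction,))
    for box in bounds.get('boxes', []):
        if len(box) != 4:
            raise ValueError('box is not (x1, y1, x2, y2): %r' % (box,))
    return bounds


# Hand-written example data, not actual segmenter output.
example = {'text_direction': 'horizontal-lr',
           'boxes': [(10, 20, 400, 60), (10, 70, 400, 110)]}
check_bounds(example)
```
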
kraken.pageseg.detect_scripts(im, bounds, model=None)

Detects scripts in a segmented page.

Classifies lines returned by the page segmenter into runs of scripts/writing systems.

Parameters:
  • im (PIL.Image) – A bi-level page of mode ‘1’ or ‘L’
  • bounds (dict) – A dictionary containing a ‘boxes’ entry with a list of coordinates (x0, y0, x1, y1) of a text line in the image and an entry ‘text_direction’ containing ‘horizontal-lr/rl/vertical-lr/rl’.
  • model (str) – Location of the script classification model or None for default.
Returns:

A dictionary of the form {‘script_detection’: True, ‘text_direction’: ‘$dir’, ‘boxes’: [[(script, (x1, y1, x2, y2)),…]]} containing the text direction and a list of lists of reading order sorted bounding boxes under the key ‘boxes’, with each list containing the script segmentation of a single line. Script is an ISO 15924 four-character identifier.

Return type:

dict

Raises:
  • KrakenInputException if the input image is not binarized or the text direction is invalid.
  • KrakenInvalidModelException if no clstm module is available.
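
To show how the nested ‘boxes’ structure is traversed, the sketch below collects the distinct script identifiers of a script-annotated segmentation. scripts_in is a hypothetical helper and the example dictionary is hand-written, not actual detect_scripts output.

```python
def scripts_in(bounds):
    """Return the sorted script identifiers in a script-annotated segmentation."""
    found = set()
    for line in bounds['boxes']:
        # Each line is a list of (script, (x1, y1, x2, y2)) runs.
        for script, _box in line:
            found.add(script)
    return sorted(found)


# Hand-written example data, not actual detect_scripts output.
example = {'script_detection': True,
           'text_direction': 'horizontal-lr',
           'boxes': [[('Latn', (10, 20, 200, 60)), ('Arab', (200, 20, 400, 60))],
                     [('Latn', (10, 70, 400, 110))]]}
print(scripts_in(example))  # ['Arab', 'Latn']
```

A list like this might serve as the scripts argument of kraken.serialization.serialize, which expects the scripts contained in the OCR records.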

kraken.rpred module

kraken.rpred.bidi_record(record)

Reorders a record using the Unicode BiDi algorithm.

Models trained for RTL or mixed scripts still emit classes in LTR order, so the output has to be reordered for proper display.

Parameters:record (kraken.rpred.ocr_record) –
Returns:kraken.rpred.ocr_record
kraken.rpred.dewarp(normalizer, im)

Dewarps an image of a line using a kraken.lib.lineest.CenterNormalizer instance.

Parameters:
  • normalizer (kraken.lib.lineest.CenterNormalizer) – A line normalizer instance
  • im (PIL.Image) – Image to dewarp
Returns:

PIL.Image containing the dewarped image.

kraken.rpred.extract_boxes(im, bounds)

Yields the subimages of image im defined by the list of bounding boxes in bounds, preserving their order.

Parameters:
  • im (PIL.Image) – Input image
  • bounds (list) – A list of tuples (x1, y1, x2, y2)
Yields:

(PIL.Image) the extracted subimage
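
The generator behaviour can be sketched as follows, assuming that extraction amounts to a plain PIL crop per box; kraken's own implementation may do more than this. iter_crops is a hypothetical name.

```python
def iter_crops(im, boxes):
    """Yield one cropped subimage per (x1, y1, x2, y2) box, in input order."""
    for box in boxes:
        # PIL.Image.crop takes a (left, upper, right, lower) tuple.
        yield im.crop(tuple(box))
```

Because this is a generator, no subimage is cut out until the caller iterates over the result.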

kraken.rpred.mm_rpred(nets, im, bounds, pad=16, line_normalization=True, bidi_reordering=True)

Multi-model version of kraken.rpred.rpred.

Takes a dictionary mapping ISO 15924 script identifiers to models and a script-annotated segmentation to dynamically select appropriate models for these lines.

Parameters:
  • nets (dict) – A dict mapping ISO 15924 identifiers to SegRecognizer objects. Recommended to be a defaultdict.
  • im (PIL.Image) – Image to extract text from
  • bounds (dict) – A dictionary containing a ‘boxes’ entry with a list of lists of coordinates (script, (x0, y0, x1, y1)) of a text line in the image and an entry ‘text_direction’ containing ‘horizontal-lr/rl/vertical-lr/rl’.
  • pad (int) – Extra blank padding to the left and right of text line
  • line_normalization (bool) – Dewarp line using the line estimator contained in the network. If no normalizer is available one using the default parameters is created. Be aware that you may have to scale lines manually to the target line height if disabled.
  • bidi_reordering (bool) – Reorder classes in the ocr_record according to the Unicode bidirectional algorithm for correct display.
Yields:

An ocr_record containing the recognized text, absolute character positions, and confidence values for each character.
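
The recommendation to pass a defaultdict can be illustrated with the standard library alone. In this sketch, placeholder strings stand in for loaded SegRecognizer objects; the model names are invented for the example.

```python
from collections import defaultdict

# Placeholder strings stand in for loaded SegRecognizer objects.
default_model = 'generic-model'
nets = defaultdict(lambda: default_model)
nets['Arab'] = 'arabic-model'
nets['Grek'] = 'greek-model'

print(nets['Arab'])  # arabic-model
print(nets['Cyrl'])  # generic-model (fallback instead of a KeyError)
```

With a plain dict, a line tagged with a script that has no dedicated model would raise a KeyError on lookup; the defaultdict falls back to a general-purpose model instead.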

class kraken.rpred.ocr_record(prediction, cuts, confidences)

Bases: future.types.newobject.newobject

A record object containing the recognition result of a single line.

kraken.rpred.rpred(network, im, bounds, pad=16, line_normalization=True, bidi_reordering=True)

Uses an RNN to recognize text.

Parameters:
  • network (kraken.lib.lstm.SegRecognizer) – A SegRecognizer object
  • im (PIL.Image) – Image to extract text from
  • bounds (dict) – A dictionary containing a ‘boxes’ entry with a list of coordinates (x0, y0, x1, y1) of a text line in the image and an entry ‘text_direction’ containing ‘horizontal-lr/rl/vertical-lr/rl’.
  • pad (int) – Extra blank padding to the left and right of text line
  • line_normalization (bool) – Dewarp line using the line estimator contained in the network. If no normalizer is available one using the default parameters is created. Be aware that you may have to scale lines manually to the target line height if disabled.
  • bidi_reordering (bool) – Reorder classes in the ocr_record according to the Unicode bidirectional algorithm for correct display.
Yields:

An ocr_record containing the recognized text, absolute character positions, and confidence values for each character.
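
Put together, the documented functions suggest a pipeline along the following lines. This is a sketch under assumptions: recognize_page is a hypothetical helper, im is an already opened PIL.Image, and network is an already loaded SegRecognizer (model loading is not covered in this section).

```python
def recognize_page(im, network, text_direction='horizontal-lr'):
    """Binarize if needed, segment into lines, and recognize each line."""
    # Imported lazily so that defining the helper does not require kraken.
    from kraken import binarization, pageseg, rpred

    if not binarization.is_bitonal(im):
        # segment() expects a bi-level page, so binarize first.
        im = binarization.nlbin(im)
    bounds = pageseg.segment(im, text_direction=text_direction)
    # rpred() is a generator; collect the ocr_records into a list.
    return list(rpred.rpred(network, im, bounds))
```

The resulting list of records can then be handed to kraken.serialization.serialize.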

kraken.transcrib module

Utility functions for ground truth transcription.

kraken.train module

Utility functions for training CLSTM neural networks.

class kraken.train.GroundTruthContainer(images=None, split=<function <lambda>>, suffix=u'.gt.txt', normalization=None, reorder=True, partition=0.9, pad=16)

Bases: object

Container for ground truth used during training.

training_set

list – List of tuples (image, text) for training

test_set

list – List of tuples (image, text) for testing

alphabet

str – Sorted string of all codepoints found in the ground truth

add(image, split=<function <lambda>>, suffix=u'.gt.txt', normalization=None, reorder=True, pad=16)

Adds a single image to the training set.

repartition(partition=0.9)

Repartitions the training/test sets.

Parameters:partition (float) – Ground truth data partition ratio between training/test sets.
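
The effect of the partition ratio can be sketched without kraken. split_ground_truth is a hypothetical helper; shuffling before the split and truncating the cut index are assumptions about the behaviour, not a description of kraken's implementation.

```python
import random


def split_ground_truth(samples, partition=0.9, seed=None):
    """Shuffle samples and split them into (training_set, test_set)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * partition)
    return shuffled[:cut], shuffled[cut:]


pairs = [('line_%02d.png' % i, 'text %d' % i) for i in range(10)]
training_set, test_set = split_ground_truth(pairs, partition=0.9, seed=0)
print(len(training_set), len(test_set))  # 9 1
```
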
sample()

Samples a line image-text pair from the training set.

Returns:A tuple (line, text) with line being a numpy.array run through kraken.lib.lstm.prepare_line.

kraken.linegen module

linegen

An advanced line generation tool using Pango for proper text shaping. The actual drawing code was adapted from the create_image utility from nototools available at [0].

Line degradation uses a local model described in [1].

[0] https://github.com/googlei18n/nototools

[1] Kanungo, Tapas, et al. “A statistical, nonparametric methodology for document degradation model validation.” IEEE Transactions on Pattern Analysis and Machine Intelligence 22.11 (2000): 1209-1223.

class kraken.linegen.LineGenerator(family='Sans', font_size=32, font_weight=400, language=None)

Bases: future.types.newobject.newobject

Produces degraded line images using a single collection of font families.

render_line(text)

Draws a line onto a Cairo surface which will be converted to a Pillow Image.

Parameters:

text (unicode) – A string which will be rendered as a single line.

Returns:

PIL.Image of mode ‘L’.

Raises:
  • KrakenCairoSurfaceException if the Cairo surface couldn’t be created (usually caused by invalid dimensions).
kraken.linegen.ocropy_degrade(im, distort=1.0, dsigma=20.0, eps=0.03, delta=0.3, degradations=[(0.5, 0.0, 0.5, 0.0)])

Degrades and distorts a line using the same noise model used by ocropus.

Parameters:
  • im (PIL.Image) – Input image
  • distort (float) –
  • dsigma (float) –
  • eps (float) –
  • delta (float) –
  • degradations (list) – List of 4-tuples corresponding to the degradations argument of ocropus-linegen.
Returns:

PIL.Image in mode ‘L’

kraken.linegen.degrade_line(im, eta=0, alpha=1.5, beta=1.5, alpha_0=1, beta_0=1)

Degrades a line image by adding noise.

Parameters:im (PIL.Image) – Input image
Returns:PIL.Image in mode ‘1’
kraken.linegen.distort_line(im, distort=3.0, sigma=10, eps=0.03, delta=0.3)

Distorts a line image.

Parameters:
  • im (PIL.Image) – Input image
  • distort (float) –
  • sigma (float) –
  • eps (float) –
  • delta (float) –
Returns:

PIL.Image in mode ‘L’
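
The line generation functions documented above could be chained as in the following sketch. synthesize_line is a hypothetical helper, and applying degradation before distortion is an assumption about a sensible order, not something this section prescribes.

```python
def synthesize_line(text, family='Sans', font_size=32):
    """Render text as a line image, then add noise and distortion."""
    # Imported lazily so that defining the helper does not require kraken.
    from kraken import linegen

    generator = linegen.LineGenerator(family=family, font_size=font_size)
    im = generator.render_line(text)  # PIL.Image of mode 'L'
    im = linegen.degrade_line(im)     # PIL.Image of mode '1'
    return linegen.distort_line(im)   # PIL.Image of mode 'L'
```

render_line may raise KrakenCairoSurfaceException, so callers generating many lines may want to catch it and skip the offending text.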