kraken API

Kraken provides routines which are usable by third-party tools. In general, you can expect the functions in the kraken package to remain stable. We will try to keep them backward compatible, but as kraken is still at an early stage of development and the API is still quite rudimentary, nothing can be guaranteed.

kraken.binarization module

kraken.binarization

An adaptive binarization algorithm.

kraken.binarization.nlbin(im, threshold=0.5, zoom=0.5, escale=1.0, border=0.1, perc=80, range=20, low=5, high=90)

Performs binarization using non-linear processing.

Return type:

PIL.Image

Parameters:
  • im (PIL.Image)

  • threshold (float)

  • zoom (float) – Zoom for background page estimation

  • escale (float) – Scale for estimating a mask over the text region

  • border (float) – Ignore this much of the border

  • perc (int) – Percentage for filters

  • range (int) – Range for filters

  • low (int) – Percentile for black estimation

  • high (int) – Percentile for white estimation

Returns:

PIL.Image containing the binarized image

Raises:

KrakenInputException when trying to binarize an empty image.
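A minimal sketch of the final normalization and thresholding step, assuming nlbin follows the approach of OCRopus's nlbin. Once the background has been flattened, the low and high percentiles estimate the black and white levels, values are rescaled to [0, 1] between them, and threshold separates ink from paper. The helper names `percentile` and `normalize_and_threshold` are hypothetical, the input is a flat list of grey values (0 = black), and the background estimation governed by zoom, escale, border, perc and range is left out entirely.

```python
from typing import List, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Percentile with linear interpolation between neighbouring ranks."""
    ranked = sorted(values)
    pos = (len(ranked) - 1) * q / 100
    lower = int(pos)
    upper = min(lower + 1, len(ranked) - 1)
    return ranked[lower] + (ranked[upper] - ranked[lower]) * (pos - lower)


def normalize_and_threshold(gray: Sequence[float], threshold: float = 0.5,
                            low: int = 5, high: int = 90) -> List[int]:
    """Hypothetical helper: returns 1 for paper (white) and 0 for ink (black)."""
    black = percentile(gray, low)   # estimated black level
    white = percentile(gray, high)  # estimated white level
    span = (white - black) or 1.0   # guard against a completely flat image
    # Rescale into [0, 1] relative to the estimated levels, clipping outliers.
    flat = [min(max((v - black) / span, 0.0), 1.0) for v in gray]
    return [1 if v > threshold else 0 for v in flat]


print(normalize_and_threshold(list(range(0, 101, 10))))
```

Using percentiles rather than the minimum and maximum means a few very dark or very bright outlier pixels do not stretch the normalization range.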

kraken.serialization module

kraken.pageseg module

kraken.rpred module

kraken.transcribe module

Utility functions for ground truth transcription.

kraken.linegen module

linegen

An advanced line generation tool using Pango for proper text shaping. The actual drawing code was adapted from the create_image utility from nototools available at [0].

Line degradation uses a local model described in [1].

[0] https://github.com/googlei18n/nototools

[1] Kanungo, Tapas, et al. “A statistical, nonparametric methodology for document degradation model validation.” IEEE Transactions on Pattern Analysis and Machine Intelligence 22.11 (2000): 1209-1223.

class kraken.linegen.LineGenerator(family='Sans', font_size=32, font_weight=400, language=None)

Bases: object

Produces degraded line images using a single collection of font families.

render_line(text)

Draws a line onto a Cairo surface, which is then converted to a Pillow Image.

Parameters:

text (unicode) – A string which will be rendered as a single line.

Returns:

PIL.Image of mode ‘L’.

Raises:
  • KrakenCairoSurfaceException if the Cairo surface couldn't be created (usually caused by invalid dimensions).

kraken.linegen.degrade_line(im, eta=0.0, alpha=1.5, beta=1.5, alpha_0=1.0, beta_0=1.0)

Degrades a line image by adding noise.

For parameter meanings consult [1].

Parameters:
  • im (PIL.Image) – Input image

  • eta (float)

  • alpha (float)

  • beta (float)

  • alpha_0 (float)

  • beta_0 (float)

Returns:

PIL.Image in mode ‘1’
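In the model of [1], each pixel flips with a probability that decays with its distance d to the nearest pixel of the other colour: alpha_0 * exp(-alpha * d^2) + eta for foreground pixels and beta_0 * exp(-beta * d^2) + eta for background pixels. The hypothetical `kanungo_flip` below applies only this flip step to a small binary image (1 = ink), assuming the parameters of degrade_line correspond to the symbols of the same names and using a brute-force Manhattan distance. The morphological closing that [1] applies afterwards is omitted, and this is not kraken's implementation.

```python
import math
import random
from typing import List, Optional

Image = List[List[int]]  # 1 = ink (foreground), 0 = paper (background)


def _distance_to_opposite(im: Image, y: int, x: int) -> float:
    """Brute-force Manhattan distance to the nearest pixel of the other value."""
    best = math.inf
    for yy, row in enumerate(im):
        for xx, v in enumerate(row):
            if v != im[y][x]:
                best = min(best, abs(yy - y) + abs(xx - x))
    return best


def kanungo_flip(im: Image, eta: float = 0.0, alpha: float = 1.5, beta: float = 1.5,
                 alpha_0: float = 1.0, beta_0: float = 1.0,
                 seed: Optional[int] = None) -> Image:
    """Hypothetical sketch of the Kanungo flip step (no morphological closing)."""
    rng = random.Random(seed)
    out: Image = []
    for y, row in enumerate(im):
        new_row = []
        for x, v in enumerate(row):
            d = _distance_to_opposite(im, y, x)
            if v == 1:
                # Foreground pixels close to the stroke edge flip to paper most often.
                p = alpha_0 * math.exp(-alpha * d * d) + eta
            else:
                # Background pixels close to ink flip to ink most often.
                p = beta_0 * math.exp(-beta * d * d) + eta
            new_row.append(1 - v if rng.random() < p else v)
        out.append(new_row)
    return out
```

Larger alpha and beta make the noise hug the stroke edges more tightly, while eta adds distance-independent salt-and-pepper noise everywhere.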

kraken.linegen.distort_line(im, distort=3.0, sigma=10, eps=0.03, delta=0.3)

Distorts a line image.

Run this BEFORE degrade_line, as it adds a white border of 5 pixels.

Parameters:
  • im (PIL.Image) – Input image

  • distort (float)

  • sigma (float)

  • eps (float)

  • delta (float)

Returns:

PIL.Image in mode ‘L’

kraken.linegen.ocropy_degrade(im, distort=1.0, dsigma=20.0, eps=0.03, delta=0.3, degradations=(0.5, 0.0, 0.5, 0.0))

Degrades and distorts a line using the same noise model used by ocropus.

Parameters:
  • im (PIL.Image) – Input image

  • distort (float)

  • dsigma (float)

  • eps (float)

  • delta (float)

  • degradations (list) – list of 4-tuples corresponding to the degradations argument of ocropus-linegen.

Returns:

PIL.Image in mode ‘L’

kraken.lib.models module

kraken.lib.vgsl module

kraken.lib.codec

PyTorch-compatible codec with a many-to-many mapping between labels and graphemes.

class kraken.lib.codec.PytorchCodec(charset)

Bases: object

Translates between labels and graphemes.

Parameters:

charset (Dict[str, Sequence[int]] | Sequence[str] | str)

decode(labels)

Decodes a labelling.

Given a labelling with cuts and confidences, returns the decoded code points with the cuts and confidences aggregated across label-code point correspondences. When decoding multilabels to code points, the resulting cuts are min/max and the confidences are averaged.

Return type:

List[Tuple[str, int, int, float]]

Parameters:

labels (list) – Input containing tuples (label, start, end, confidence).

Returns:

A list of tuples (code point, start, end, confidence)

encode(s)

Encodes a string into a sequence of labels.

Return type:

IntTensor

Parameters:

s (str) – Input unicode string

Returns:

(torch.IntTensor) encoded label sequence

Raises:

KrakenEncodeException if encoding fails.

max_label()

Returns the maximum label value.

Return type:

int
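The documented encode/decode behaviour can be illustrated with a small, hypothetical `ToyCodec` built from a Dict[str, Sequence[int]] charset. It is not kraken's implementation: it assumes greedy longest-match in both directions and emits one output tuple per code point of a multi-character entry, all sharing the aggregated cut. Decoding takes the min start and max end of the labels that form one entry and averages their confidences, as described for decode.

```python
from typing import Dict, List, Sequence, Tuple


class ToyCodec:
    """Hypothetical many-to-many codec; not kraken's PytorchCodec implementation."""

    def __init__(self, charset: Dict[str, Sequence[int]]) -> None:
        self.c2l = {k: tuple(v) for k, v in charset.items()}
        self.l2c = {v: k for k, v in self.c2l.items()}
        # Longest entries first, so multi-character graphemes and
        # multi-label codes are matched before their prefixes.
        self._keys = sorted(self.c2l, key=len, reverse=True)
        self._codes = sorted(self.l2c, key=len, reverse=True)

    def max_label(self) -> int:
        return max(label for code in self.c2l.values() for label in code)

    def encode(self, s: str) -> List[int]:
        labels: List[int] = []
        i = 0
        while i < len(s):
            for key in self._keys:
                if s.startswith(key, i):
                    labels.extend(self.c2l[key])
                    i += len(key)
                    break
            else:
                raise ValueError(f'cannot encode {s[i:]!r}')
        return labels

    def decode(self, labels: List[Tuple[int, int, int, float]]) -> List[Tuple[str, int, int, float]]:
        decoded: List[Tuple[str, int, int, float]] = []
        i = 0
        while i < len(labels):
            for code in self._codes:
                window = labels[i:i + len(code)]
                if tuple(lab for lab, _, _, _ in window) == code:
                    # Aggregate the cuts (min/max) and average the confidences.
                    start = min(st for _, st, _, _ in window)
                    end = max(en for _, _, en, _ in window)
                    conf = sum(c for _, _, _, c in window) / len(window)
                    decoded.extend((cp, start, end, conf) for cp in self.l2c[code])
                    i += len(code)
                    break
            else:
                raise ValueError(f'cannot decode label {labels[i][0]}')
        return decoded
```

For example, a charset such as {'a': [1], 'ch': [3], 'é': [4, 5]} maps the two-character grapheme 'ch' to one label and the single code point 'é' to two labels, which is the many-to-many case the codec is designed for.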

merge(codec)

Transforms this codec (c1) into another (c2) reusing as many labels as possible.

The resulting codec is able to encode the same code point sequences as c2, though not necessarily with the same labels. It retains character -> label mappings present in both codecs, removes mappings not in c2, and adds mappings not in c1. Compound labels in c2 for code point sequences not in c1 that contain labels also in use in c1 are added as separate labels.

Return type:

Tuple[PytorchCodec, Set]

Parameters:

codec (kraken.lib.codec.PytorchCodec)

Returns:

A merged codec and a set of labels that were removed from the original codec.
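A simplified sketch of the merge bookkeeping on plain dictionaries, using a hypothetical `merge_charsets` function. It keeps c1's labels for entries that c2 shares, reports labels no longer used by any kept entry as removed, and assigns each new entry a single fresh label above every label c1 used. The handling of compound labels described above is deliberately left out, and allocating fresh labels instead of recycling removed ones is an assumption of this sketch.

```python
from typing import Dict, Set, Tuple

Charset = Dict[str, Tuple[int, ...]]


def merge_charsets(c1: Charset, c2: Charset) -> Tuple[Charset, Set[int]]:
    """Hypothetical, simplified merge of two character -> label-sequence maps."""
    # Keep c1's labels for every entry that c2 also contains.
    merged = {k: v for k, v in c1.items() if k in c2}
    used = {label for code in merged.values() for label in code}
    # Labels of c1 that no kept entry refers to any more.
    removed = {label for code in c1.values() for label in code} - used
    # New entries get fresh labels above anything c1 ever used.
    next_label = max(used | removed, default=0) + 1
    for key in c2:
        if key not in merged:
            merged[key] = (next_label,)
            next_label += 1
    return merged, removed
```

Returning the removed labels lets a caller drop the corresponding output units from a model while the kept entries keep their original label numbers.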

kraken.lib.train module

kraken.lib.dataset module

kraken.lib.ctc_decoder

Decoders for softmax outputs of CTC trained networks.

kraken.lib.ctc_decoder.beam_decoder(outputs, beam_size=3)

Translates back the network output to a label sequence using same-prefix-merge beam search decoding as described in [0].

[0] Hannun, Awni Y., et al. “First-pass large vocabulary continuous speech recognition using bi-directional recurrent DNNs.” arXiv preprint arXiv:1408.2873 (2014).

Return type:

List[Tuple[int, int, int, float]]

Parameters:
  • outputs (numpy.ndarray) – (C, W) shaped softmax output tensor

  • beam_size (int)

Returns:

A list with tuples (class, start, end, max). max is the maximum value of the softmax layer in the region.
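The same-prefix-merge idea from [0] can be illustrated with a minimal prefix beam search in probability space. This hypothetical `prefix_beam_decode` assumes a (C, W) list of softmax columns with class 0 as the blank and, unlike beam_decoder, returns only the best label sequence without start/end positions; a production version would work in log space to avoid underflow on long lines. Each prefix carries two probabilities, one for paths ending in blank and one for paths ending in its last label, so that paths which collapse to the same labelling are summed rather than ranked separately.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def prefix_beam_decode(outputs: Sequence[Sequence[float]], beam_size: int = 3) -> Tuple[int, ...]:
    """Hypothetical CTC prefix beam search; class 0 is the blank."""
    n_classes, width = len(outputs), len(outputs[0])
    # prefix -> [P(paths ending in blank), P(paths ending in the prefix's last label)]
    beam: Dict[Tuple[int, ...], List[float]] = {(): [1.0, 0.0]}
    for t in range(width):
        nxt: Dict[Tuple[int, ...], List[float]] = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beam.items():
            # Emitting blank leaves the prefix unchanged.
            nxt[prefix][0] += (p_b + p_nb) * outputs[0][t]
            last = prefix[-1] if prefix else None
            for s in range(1, n_classes):
                p = outputs[s][t]
                if s == last:
                    # A repeat without an intervening blank merges into the same prefix...
                    nxt[prefix][1] += p_nb * p
                    # ...and only blank-terminated paths can start a repeated label.
                    nxt[prefix + (s,)][1] += p_b * p
                else:
                    nxt[prefix + (s,)][1] += (p_b + p_nb) * p
        # Keep the beam_size most probable prefixes for the next time step.
        ranked = sorted(nxt.items(), key=lambda kv: kv[1][0] + kv[1][1], reverse=True)
        beam = dict(ranked[:beam_size])
    return max(beam, key=lambda k: beam[k][0] + beam[k][1])
```

For two columns with P(blank) = 0.6 and P(class 1) = 0.4 each, greedy decoding yields an empty labelling, while this search returns (1,), because the three paths that collapse to it (1 1, 1 blank, blank 1) have a combined probability of 0.64 against 0.36 for the all-blank path.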

kraken.lib.ctc_decoder.blank_threshold_decoder(outputs, threshold=0.5)

Translates back the network output to a label sequence in the same way as the original ocropy/clstm.

Thresholds on class 0, then assigns the maximum (non-zero) class to each region.

Return type:

List[Tuple[int, int, int, float]]

Parameters:
  • outputs (numpy.ndarray) – (C, W) shaped softmax output tensor

  • threshold (float) – Threshold for 0 class when determining possible label locations.

Returns:

A list with tuples (class, start, end, max). max is the maximum value of the softmax layer in the region.
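A pure-Python sketch of the blank-threshold idea, assuming the same (C, W) layout with class 0 as the blank. Contiguous columns where the blank probability falls below threshold form candidate regions, and each region is labelled with the non-blank class holding the highest activation inside it. The function name is hypothetical, and ends are inclusive column indices here; kraken's own index convention may differ.

```python
from typing import List, Sequence, Tuple


def blank_threshold_decode(outputs: Sequence[Sequence[float]],
                           threshold: float = 0.5) -> List[Tuple[int, int, int, float]]:
    """Hypothetical sketch: returns (class, start, end, max) with inclusive ends."""
    n_classes, width = len(outputs), len(outputs[0])
    result = []
    t = 0
    while t < width:
        if outputs[0][t] >= threshold:
            t += 1  # blank is confident enough: no label here
            continue
        start = t
        while t < width and outputs[0][t] < threshold:
            t += 1
        end = t - 1
        # Highest non-blank activation anywhere in the region picks the class.
        prob, label = max((outputs[c][u], c)
                          for c in range(1, n_classes)
                          for u in range(start, end + 1))
        result.append((label, start, end, prob))
    return result
```

Unlike greedy decoding, two different classes can never be emitted for one region, since each region yields exactly one label.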

kraken.lib.ctc_decoder.greedy_decoder(outputs)

Translates back the network output to a label sequence using greedy/best path decoding as described in [0].

[0] Graves, Alex, et al. “Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks.” Proceedings of the 23rd international conference on Machine learning. ACM, 2006.

Return type:

List[Tuple[int, int, int, float]]

Parameters:
  • outputs (numpy.ndarray) – (C, W) shaped softmax output tensor

Returns:

A list with tuples (class, start, end, max). max is the maximum value of the softmax layer in the region.
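Greedy decoding can be sketched in a few lines of pure Python. This is a hypothetical stand-in for greedy_decoder, assuming the softmax output is given as a list of C rows of length W with class 0 as the CTC blank: it picks the best class in every column, collapses runs of the same class, drops blank runs, and reports each label as (class, start, end, max) with inclusive column indices.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def greedy_decode(outputs: Sequence[Sequence[float]]) -> List[Tuple[int, int, int, float]]:
    """Hypothetical best-path decoder; class 0 is the blank."""
    n_classes, width = len(outputs), len(outputs[0])
    # Best class and its probability for every column (time step).
    best = []
    for t in range(width):
        c = max(range(n_classes), key=lambda k: outputs[k][t])
        best.append((t, c, outputs[c][t]))
    result = []
    # Runs of the same class collapse into one label; blank runs are dropped.
    for label, group in groupby(best, key=lambda x: x[1]):
        run = list(group)
        if label != 0:
            result.append((label, run[0][0], run[-1][0], max(p for _, _, p in run)))
    return result
```

Because each column is decided independently, best-path decoding can miss a labelling whose probability is spread over many paths; beam_decoder addresses exactly that case.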