kraken API¶
Kraken provides routines which are usable by third party tools. In general
you can expect functions in the kraken package to remain stable. We will try
to keep these backward compatible, but as kraken is still in an early
development stage and the API is still quite rudimentary, nothing can be
guaranteed.
kraken.binarization module¶
kraken.binarization¶
An adaptive binarization algorithm.
- kraken.binarization.nlbin(im, threshold=0.5, zoom=0.5, escale=1.0, border=0.1, perc=80, range=20, low=5, high=90)¶
Performs binarization using non-linear processing.
- Return type:
PIL.Image
- Parameters:
im (PIL.Image)
threshold (float)
zoom (float) – Zoom for background page estimation
escale (float) – Scale for estimating a mask over the text region
border (float) – Ignore this much of the border
perc (int) – Percentage for filters
range (int) – Range for filters
low (int) – Percentile for black estimation
high (int) – Percentile for white estimation
- Returns:
PIL.Image containing the binarized image
- Raises:
KrakenInputException when trying to binarize an empty image.
kraken.serialization module¶
kraken.pageseg module¶
kraken.rpred module¶
kraken.transcribe module¶
Utility functions for ground truth transcription.
kraken.linegen module¶
linegen¶
An advanced line generation tool using Pango for proper text shaping. The actual drawing code was adapted from the create_image utility from nototools available at [0].
Line degradation uses a local model described in [1].
[0] https://github.com/googlei18n/nototools
[1] Kanungo, Tapas, et al. “A statistical, nonparametric methodology for document degradation model validation.” IEEE Transactions on Pattern Analysis and Machine Intelligence 22.11 (2000): 1209-1223.
- class kraken.linegen.LineGenerator(family='Sans', font_size=32, font_weight=400, language=None)¶
Bases:
object
Produces degraded line images using a single collection of font families.
- render_line(text)¶
Draws a line onto a Cairo surface which will be converted to a Pillow Image.
- Parameters:
text (unicode) – A string which will be rendered as a single line.
- Returns:
PIL.Image of mode ‘L’.
- Raises:
KrakenCairoSurfaceException if the Cairo surface couldn't be created (usually caused by invalid dimensions).
- kraken.linegen.degrade_line(im, eta=0.0, alpha=1.5, beta=1.5, alpha_0=1.0, beta_0=1.0)¶
Degrades a line image by adding noise.
For parameter meanings consult [1].
- Parameters:
im (PIL.Image) – Input image
eta (float)
alpha (float)
beta (float)
alpha_0 (float)
beta_0 (float)
- Returns:
PIL.Image in mode ‘1’
- kraken.linegen.distort_line(im, distort=3.0, sigma=10, eps=0.03, delta=0.3)¶
Distorts a line image.
Run BEFORE degrade_line as a white border of 5 pixels will be added.
- Parameters:
im (PIL.Image) – Input image
distort (float)
sigma (float)
eps (float)
delta (float)
- Returns:
PIL.Image in mode ‘L’
- kraken.linegen.ocropy_degrade(im, distort=1.0, dsigma=20.0, eps=0.03, delta=0.3, degradations=(0.5, 0.0, 0.5, 0.0))¶
Degrades and distorts a line using the same noise model used by ocropus.
- Parameters:
im (PIL.Image) – Input image
distort (float)
dsigma (float)
eps (float)
delta (float)
degradations (list) – List containing 4-tuples corresponding to the degradations argument of ocropus-linegen.
- Returns:
PIL.Image in mode ‘L’
kraken.lib.models module¶
kraken.lib.vgsl module¶
kraken.lib.codec¶
pytorch compatible codec with many-to-many mapping between labels and graphemes.
- class kraken.lib.codec.PytorchCodec(charset)¶
Bases:
object
Translates between labels and graphemes.
- Parameters:
charset (Dict[str, Sequence[int]] | Sequence[str] | str)
- decode(labels)¶
Decodes a labelling.
Given a labelling with cuts and confidences returns a string with the cuts and confidences aggregated across label-code point correspondences. When decoding multilabels to code points the resulting cuts are min/max, confidences are averaged.
- Return type:
List[Tuple[str,int,int,float]]
- Parameters:
labels (list) – Input containing tuples (label, start, end, confidence).
- Returns:
A list of tuples (code point, start, end, confidence)
- Return type:
list
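The aggregation rule described for decode (cuts become min/max, confidences are averaged) can be illustrated with a minimal pure-Python sketch. This is not kraken's implementation: the function `decode_sketch`, the reverse mapping `l2c`, and the longest-match strategy are assumptions made for illustration only.

```python
def decode_sketch(labels, l2c):
    """Aggregate (label, start, end, confidence) tuples into code points.

    `l2c` is a hypothetical reverse mapping from label tuples to strings;
    it is not part of the documented kraken API.
    """
    max_len = max(len(key) for key in l2c)
    decoded = []
    idx = 0
    while idx < len(labels):
        # Try the longest candidate label sequence first.
        for size in range(min(max_len, len(labels) - idx), 0, -1):
            chunk = labels[idx:idx + size]
            key = tuple(lab for lab, _, _, _ in chunk)
            if key in l2c:
                start = min(s for _, s, _, _ in chunk)       # earliest cut
                end = max(e for _, _, e, _ in chunk)         # latest cut
                conf = sum(c for _, _, _, c in chunk) / size  # mean confidence
                decoded.extend((cp, start, end, conf) for cp in l2c[key])
                idx += size
                break
        else:
            idx += 1  # no mapping starts here: skip this label
    return decoded


l2c = {(1,): 'a', (2, 3): 'b'}
labels = [(1, 0, 4, 0.875), (2, 5, 8, 0.75), (3, 9, 12, 0.25)]
result = decode_sketch(labels, l2c)
print(result)
```

In this example, labels 2 and 3 together map to a single code point, so its cut spans 5 to 12 and its confidence is the mean of 0.75 and 0.25.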
- encode(s)¶
Encodes a string into a sequence of labels.
- Return type:
IntTensor
- Parameters:
s (str) – Input unicode string
- Returns:
(torch.IntTensor) encoded label sequence
- Raises:
KrakenEncodeException if encoding fails.
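As a rough illustration of a many-to-many encoding, the sketch below maps a string to labels by greedy longest-prefix matching. Several things here are assumptions rather than documented behaviour: the matching strategy, the helper name `encode_sketch`, the mapping `c2l`, the plain `list` return value (the real method returns a torch.IntTensor), and the use of `ValueError` in place of KrakenEncodeException.

```python
def encode_sketch(s, c2l):
    """Encode `s` into a flat list of integer labels.

    `c2l` is a hypothetical mapping from graphemes (one or more code
    points) to label sequences.
    """
    max_len = max(len(key) for key in c2l)
    labels = []
    idx = 0
    while idx < len(s):
        # Prefer the longest grapheme that matches at this position.
        for size in range(min(max_len, len(s) - idx), 0, -1):
            key = s[idx:idx + size]
            if key in c2l:
                labels.extend(c2l[key])
                idx += size
                break
        else:
            # Stand-in for KrakenEncodeException in this sketch.
            raise ValueError(f'no label for {s[idx]!r} at position {idx}')
    return labels


c2l = {'a': [1], 'b': [2], 'ab': [3, 4]}
encoded = encode_sketch('aab', c2l)
print(encoded)
```

Here the second and third characters are consumed together because 'ab' has its own compound label sequence.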
- max_label()¶
Returns the maximum label value.
- Return type:
int
- merge(codec)¶
Transforms this codec (c1) into another (c2) reusing as many labels as possible.
The resulting codec is able to encode the same code point sequences while not necessarily having the same labels for them as c2. Retains matching character -> label mappings from both codecs, removes mappings not in c2, and adds mappings not in c1. Compound labels in c2 for code point sequences not in c1 containing labels also in use in c1 are added as separate labels.
- Return type:
Tuple[PytorchCodec,Set]
- Parameters:
codec (kraken.lib.codec.PytorchCodec)
- Returns:
A merged codec and a list of labels that were removed from the original codec.
kraken.lib.train module¶
kraken.lib.dataset module¶
kraken.lib.ctc_decoder¶
Decoders for softmax outputs of CTC trained networks.
- kraken.lib.ctc_decoder.beam_decoder(outputs, beam_size=3)¶
Translates the network output back into a label sequence using same-prefix-merge beam search decoding as described in [0].
[0] Hannun, Awni Y., et al. “First-pass large vocabulary continuous speech recognition using bi-directional recurrent DNNs.” arXiv preprint arXiv:1408.2873 (2014).
- Return type:
List[Tuple[int,int,int,float]]
- Parameters:
outputs (numpy.ndarray) – (C, W) shaped softmax output tensor
beam_size (int)
- Returns:
A list with tuples (class, start, end, prob). prob is the maximum value of the softmax layer in the region.
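The idea of same-prefix-merge beam search can be sketched in pure Python as follows. This is a simplified illustration, not kraken's code: it assumes class 0 is the CTC blank, works with plain probabilities instead of log probabilities (so long inputs may underflow), and returns only the best label sequence with its total probability rather than the (class, start, end, prob) tuples documented above. The name `beam_decode_sketch` is hypothetical.

```python
from collections import defaultdict


def beam_decode_sketch(outputs, beam_size=3):
    """Prefix beam search over a (C, W) matrix given as nested sequences."""
    num_classes = len(outputs)
    width = len(outputs[0])
    # prefix -> (probability of ending in blank, probability of ending in a label)
    beam = {(): (1.0, 0.0)}
    for t in range(width):
        candidates = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_blank, p_label) in beam.items():
            for c in range(num_classes):
                p = outputs[c][t]
                if c == 0:
                    # A blank leaves the prefix unchanged.
                    candidates[prefix][0] += (p_blank + p_label) * p
                elif prefix and prefix[-1] == c:
                    # Repeating the last label merges into the same prefix,
                    # unless a blank separated the two occurrences.
                    candidates[prefix][1] += p_label * p
                    candidates[prefix + (c,)][1] += p_blank * p
                else:
                    candidates[prefix + (c,)][1] += (p_blank + p_label) * p
        # Keep only the `beam_size` most probable prefixes.
        ranked = sorted(candidates.items(), key=lambda kv: sum(kv[1]), reverse=True)
        beam = {prefix: tuple(probs) for prefix, probs in ranked[:beam_size]}
    best_prefix, best_probs = next(iter(beam.items()))
    return list(best_prefix), sum(best_probs)


probs = [
    [0.6, 0.6],  # class 0 (blank)
    [0.4, 0.4],  # class 1
]
best_labels, best_prob = beam_decode_sketch(probs)
print(best_labels, best_prob)
```

In this two-step example the blank is the most likely class at every step, yet the sketch returns the sequence [1], because the three alignments that collapse to it together outweigh the all-blank path.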
- kraken.lib.ctc_decoder.blank_threshold_decoder(outputs, threshold=0.5)¶
Translates the network output back into a label sequence in the same way as the original ocropy/clstm.
Thresholds on class 0, then assigns the maximum (non-zero) class to each region.
- Return type:
List[Tuple[int,int,int,float]]
- Parameters:
outputs (numpy.ndarray) – (C, W) shaped softmax output tensor
threshold (float) – Threshold for 0 class when determining possible label locations.
- Returns:
A list with tuples (class, start, end, max). max is the maximum value of the softmax layer in the region.
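The two steps described above (threshold on class 0, then pick the strongest non-zero class per region) might look roughly like the following pure-Python sketch. It is an illustration under assumptions, not the library's implementation: `blank_threshold_sketch` is a hypothetical name, the input is indexed as `outputs[class][time]`, and `end` is treated as inclusive.

```python
def blank_threshold_sketch(outputs, threshold=0.5):
    """Decode a (C, W) matrix by thresholding the blank class (class 0)."""
    num_classes = len(outputs)
    width = len(outputs[0])
    decoded = []
    t = 0
    while t < width:
        if outputs[0][t] >= threshold:
            t += 1  # blank is confident here: no label at this position
            continue
        # Extend the region while the blank stays below the threshold.
        start = t
        while t < width and outputs[0][t] < threshold:
            t += 1
        end = t - 1
        # Strongest non-blank activation anywhere inside the region.
        conf, label = max(
            (outputs[c][u], c)
            for c in range(1, num_classes)
            for u in range(start, end + 1)
        )
        decoded.append((label, start, end, conf))
    return decoded


probs = [
    [0.9, 0.2, 0.3, 0.9, 0.1],     # class 0 (blank)
    [0.05, 0.7, 0.1, 0.05, 0.3],   # class 1
    [0.05, 0.1, 0.6, 0.05, 0.6],   # class 2
]
regions = blank_threshold_sketch(probs)
print(regions)
```

Note that positions 1 and 2 form a single region assigned to class 1, even though class 2 is the strongest class at position 2.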
- kraken.lib.ctc_decoder.greedy_decoder(outputs)¶
Translates the network output back into a label sequence using greedy/best path decoding as described in [0].
[0] Graves, Alex, et al. “Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks.” Proceedings of the 23rd international conference on Machine learning. ACM, 2006.
- Return type:
List[Tuple[int,int,int,float]]
- Parameters:
outputs (numpy.ndarray) – (C, W) shaped softmax output tensor
- Returns:
A list with tuples (class, start, end, max). max is the maximum value of the softmax layer in the region.
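Best path decoding can be summarised as: take the most probable class at each time step, collapse runs of identical classes, and drop the blank. The sketch below shows this in pure Python. It assumes class 0 is the blank and that `end` is inclusive; `greedy_decode_sketch` is a hypothetical helper, not the library function.

```python
from itertools import groupby


def greedy_decode_sketch(outputs):
    """Best-path CTC decoding of a (C, W) matrix given as nested sequences."""
    num_classes = len(outputs)
    width = len(outputs[0])
    # Most probable class at every time step.
    best = [max(range(num_classes), key=lambda c: outputs[c][t]) for t in range(width)]
    decoded = []
    pos = 0
    # Collapse runs of identical classes and skip runs of blanks.
    for label, run in groupby(best):
        length = len(list(run))
        start, end = pos, pos + length - 1
        pos += length
        if label != 0:
            conf = max(outputs[label][t] for t in range(start, end + 1))
            decoded.append((label, start, end, conf))
    return decoded


probs = [
    [0.9, 0.1, 0.1, 0.8, 0.2, 0.9],    # class 0 (blank)
    [0.1, 0.8, 0.7, 0.1, 0.1, 0.05],   # class 1
    [0.0, 0.1, 0.2, 0.1, 0.7, 0.05],   # class 2
]
path = greedy_decode_sketch(probs)
print(path)
```

Because the function only uses `len` and `outputs[c][t]` indexing, it should also accept a two-dimensional NumPy array, although the confidences would then be NumPy scalars.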