API reference¶
kraken.binarization module¶
kraken.binarization¶
An adaptive binarization algorithm.
- kraken.binarization.nlbin(im, threshold=0.5, zoom=0.5, escale=1.0, border=0.1, perc=80, range=20, low=5, high=90)¶
Performs binarization using non-linear processing.
- Return type:
PIL.Image.Image
- Parameters:
im (PIL.Image.Image)
threshold (float)
zoom (float) – Zoom for background page estimation
escale (float) – Scale for estimating a mask over the text region
border (float) – Ignore this much of the border
perc (int) – Percentage for filters
range (int) – Range for filters
low (int) – Percentile for black estimation
high (int) – Percentile for white estimation
- Returns:
PIL.Image containing the binarized image
- Raises:
KrakenInputException when trying to binarize an empty image.
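The interplay of low, high and threshold can be illustrated with a simplified sketch. It assumes an ocropus-style final step (estimate black and white levels as percentiles, stretch the values between them, then threshold) and skips the background flattening and text mask governed by zoom, escale, border, perc and range. The helper names are hypothetical; this is not kraken's implementation.

```python
def percentile(values, q):
    """Nearest-rank percentile of a non-empty list, with q in [0, 100]."""
    ordered = sorted(values)
    rank = round(q / 100 * (len(ordered) - 1))
    return ordered[rank]


def stretch_and_threshold(pixels, low=5, high=90, threshold=0.5):
    """Binarize grayscale values in [0, 1]; 1 means white, 0 means black."""
    black = percentile(pixels, low)    # estimated black level
    white = percentile(pixels, high)   # estimated white level
    span = (white - black) or 1.0      # guard against a flat image
    binary = []
    for p in pixels:
        # Stretch to [0, 1] relative to the estimated levels, then clip.
        v = min(1.0, max(0.0, (p - black) / span))
        binary.append(1 if v > threshold else 0)
    return binary


result = stretch_and_threshold([0.1, 0.2, 0.8, 0.9])
```

Raising threshold in this sketch turns more pixels black, which matches the usual reading of a binarization threshold.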
kraken.serialization module¶
kraken.blla module¶
Note
blla provides the interface to the fully trainable segmenter. For the legacy segmenter interface refer to the pageseg module. Note that recognition models are not interchangeable between segmenters.
kraken.pageseg module¶
Note
pageseg is the legacy bounding box-based segmenter. For the trainable baseline segmenter interface refer to the blla module. Note that recognition models are not interchangeable between segmenters.
kraken.rpred module¶
kraken.transcribe module¶
Utility functions for ground truth transcription.
kraken.linegen module¶
linegen¶
An advanced line generation tool using Pango for proper text shaping. The actual drawing code was adapted from the create_image utility from nototools available at [0].
Line degradation uses a local model described in [1].
[0] https://github.com/googlei18n/nototools
[1] Kanungo, Tapas, et al. “A statistical, nonparametric methodology for document degradation model validation.” IEEE Transactions on Pattern Analysis and Machine Intelligence 22.11 (2000): 1209-1223.
- class kraken.linegen.LineGenerator(family='Sans', font_size=32, font_weight=400, language=None)¶
Bases: object
Produces degraded line images using a single collection of font families.
- render_line(text)¶
Draws a line onto a Cairo surface which will be converted to a Pillow Image.
- Parameters:
text (unicode) – A string which will be rendered as a single line.
- Returns:
PIL.Image of mode ‘L’.
- Raises:
KrakenCairoSurfaceException if the Cairo surface couldn't be created (usually caused by invalid dimensions).
- kraken.linegen.degrade_line(im, eta=0.0, alpha=1.5, beta=1.5, alpha_0=1.0, beta_0=1.0)¶
Degrades a line image by adding noise.
For parameter meanings consult [1].
- Parameters:
im (PIL.Image) – Input image
eta (float)
alpha (float)
beta (float)
alpha_0 (float)
beta_0 (float)
- Returns:
PIL.Image in mode ‘1’
- kraken.linegen.distort_line(im, distort=3.0, sigma=10, eps=0.03, delta=0.3)¶
Distorts a line image.
Run BEFORE degrade_line as a white border of 5 pixels will be added.
- Parameters:
im (PIL.Image) – Input image
distort (float)
sigma (float)
eps (float)
delta (float)
- Returns:
PIL.Image in mode ‘L’
- kraken.linegen.ocropy_degrade(im, distort=1.0, dsigma=20.0, eps=0.03, delta=0.3, degradations=(0.5, 0.0, 0.5, 0.0))¶
Degrades and distorts a line using the same noise model used by ocropus.
- Parameters:
im (PIL.Image) – Input image
distort (float)
dsigma (float)
eps (float)
delta (float)
degradations (list) – List of 4-tuples corresponding to the degradations argument of ocropus-linegen.
- Returns:
PIL.Image in mode ‘L’
kraken.lib.models module¶
kraken.lib.vgsl module¶
kraken.lib.xml module¶
ALTO/Page data loaders for segmentation training
- kraken.lib.xml.parse_alto(filename)¶
Parses an ALTO file, returns the baselines defined in it, and loads the referenced image.
- Parameters:
filename (str) – path to an ALTO file.
- Returns:
A dict {‘image’: impath, ‘lines’: [{‘boundary’: [[x0, y0], …], ‘baseline’: [[x0, y0], …], ‘text’: ‘apdjfqpf’, ‘script’: ‘script_type’}, {…}], ‘regions’: {‘region_type_0’: [[[x0, y0], …], …], …}, ‘base_dir’: None}
- Return type:
dict
- kraken.lib.xml.parse_page(filename)¶
Parses a PageXML file, returns the baselines defined in it, and loads the referenced image.
- Parameters:
filename (str) – path to a PageXML file.
- Returns:
A dict {‘image’: impath, ‘lines’: [{‘boundary’: [[x0, y0], …], ‘baseline’: [[x0, y0], …], ‘text’: ‘apdjfqpf’, ‘script’: ‘script_type’}, {…}], ‘regions’: {‘region_type_0’: [[[x0, y0], …], …], …}}
- Return type:
dict
- kraken.lib.xml.parse_xml(filename)¶
Parses either a PageXML or ALTO file with autodetermination of the file format.
- Parameters:
filename (str) – path to an XML file.
- Returns:
A dict {‘image’: impath, ‘lines’: [{‘boundary’: [[x0, y0], …], ‘baseline’: [[x0, y0], …], ‘text’: ‘apdjfqpf’, ‘script’: ‘script_type’}, {…}], ‘regions’: {‘region_type_0’: [[[x0, y0], …], …], …}, ‘base_dir’: None}
- Return type:
dict
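As a rough illustration of how the dictionary returned by the parsers might be consumed, the sketch below walks a hand-written stand-in with the documented keys. No file is parsed; the image path, coordinates and the summarize helper are made up for the example.

```python
# Hand-written stand-in for a parse result; no file is read here.
parsed = {
    "image": "page_0001.png",  # hypothetical image path
    "lines": [
        {
            "boundary": [[10, 5], [200, 5], [200, 40], [10, 40]],
            "baseline": [[10, 35], [200, 35]],
            "text": "apdjfqpf",
            "script": "script_type",
        },
    ],
    "regions": {"region_type_0": [[[0, 0], [220, 0], [220, 60], [0, 60]]]},
    "base_dir": None,
}


def summarize(doc):
    """Collect the transcription of every line and count regions per type."""
    texts = [line.get("text", "") for line in doc["lines"]]
    region_counts = {kind: len(polys) for kind, polys in doc["regions"].items()}
    return texts, region_counts


texts, region_counts = summarize(parsed)
```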
kraken.lib.codec¶
pytorch compatible codec with many-to-many mapping between labels and graphemes.
- class kraken.lib.codec.PytorchCodec(charset)¶
Bases: object
Translates between labels and graphemes.
- Parameters:
charset (Dict[str, Sequence[int]] | Sequence[str] | str)
- add_labels(charset)¶
Adds additional characters/labels to the codec.
charset may either be a string, a list or a dict. In the first case each code point will be assigned a label, in the second case each string in the list will be assigned a label, and in the final case each key string will be mapped to the value sequence of integers. In the first two cases labels will be assigned automatically.
As 0 is the blank label in a CTC output layer, output labels and input dictionaries are/should be 1-indexed.
- Return type:
None
- Parameters:
charset (unicode, list, dict) – Input character set.
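The three accepted charset forms can be sketched as follows. This is a minimal illustration, not the codec's code: build_label_map is a hypothetical helper, and assigning labels in sorted order is an assumption.

```python
def build_label_map(charset):
    """Hypothetical helper: map each grapheme string to a list of labels."""
    if isinstance(charset, dict):
        # Explicit mapping: keys are strings, values are label sequences.
        return {key: list(value) for key, value in charset.items()}
    # Iterating a str yields code points; iterating a list yields its strings.
    symbols = sorted(set(charset))
    # Label 0 is reserved for the CTC blank, so numbering starts at 1.
    return {symbol: [index] for index, symbol in enumerate(symbols, start=1)}


from_string = build_label_map("ba")
from_list = build_label_map(["ab", "c"])
from_dict = build_label_map({"a": [1, 2]})
```

In the dict form a single key may map to several labels, which is what makes the mapping many-to-many.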
- decode(labels)¶
Decodes a labelling.
Given a labelling with cuts and confidences returns a string with the cuts and confidences aggregated across label-code point correspondences. When decoding multilabels to code points the resulting cuts are min/max, confidences are averaged.
- Return type:
List[Tuple[str,int,int,float]]
- Parameters:
labels (list) – Input containing tuples (label, start, end, confidence).
- Returns:
A list of tuples (code point, start, end, confidence)
- Return type:
list
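The aggregation rule (cuts become min/max, confidences are averaged) can be sketched with a greedy longest-match decoder. The l2c mapping from label tuples to strings and the function name are assumptions made for this illustration; the codec's own matching strategy may differ.

```python
def decode_sketch(labels, l2c):
    """labels: list of (label, start, end, confidence) tuples.
    l2c: dict mapping tuples of labels to the string they encode."""
    longest = max(len(key) for key in l2c)
    decoded = []
    i = 0
    while i < len(labels):
        # Try the longest label sequence first, then shorter ones.
        for size in range(min(longest, len(labels) - i), 0, -1):
            chunk = labels[i:i + size]
            key = tuple(label for label, _, _, _ in chunk)
            if key in l2c:
                start = min(s for _, s, _, _ in chunk)        # earliest cut
                end = max(e for _, _, e, _ in chunk)          # latest cut
                conf = sum(c for _, _, _, c in chunk) / size  # mean confidence
                decoded.extend((ch, start, end, conf) for ch in l2c[key])
                i += size
                break
        else:
            i += 1  # no mapping starts here: skip the label
    return decoded


mapping = {(1,): "a", (2, 3): "b"}
decoded = decode_sketch([(1, 0, 4, 1.0), (2, 5, 8, 0.5), (3, 9, 12, 1.0)], mapping)
```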
- encode(s)¶
Encodes a string into a sequence of labels.
- Return type:
IntTensor
- Parameters:
s (str) – Input unicode string
- Returns:
(torch.IntTensor) encoded label sequence
- Raises:
KrakenEncodeException if encoding fails.
- max_label()¶
Returns the maximum label value.
- Return type:
int
- merge(codec)¶
Transforms this codec (c1) into another (c2) reusing as many labels as possible.
The resulting codec is able to encode the same code point sequences while not necessarily having the same labels for them as c2. Retains matching character -> label mappings from both codecs, removes mappings not in c2, and adds mappings not in c1. Compound labels in c2 for code point sequences not in c1 containing labels also in use in c1 are added as separate labels.
- Return type:
Tuple[PytorchCodec,Set]
- Parameters:
codec (kraken.lib.codec.PytorchCodec)
- Returns:
A merged codec and a list of labels that were removed from the original codec.
kraken.lib.train module¶
kraken.lib.dataset module¶
kraken.lib.segmentation module¶
Processing for baseline segmenter output
- kraken.lib.segmentation.calculate_polygonal_environment(im=None, baselines=None, suppl_obj=None, im_feats=None, scale=None, topline=False)¶
Given a list of baselines and an input image, calculates a polygonal environment around each baseline.
- Parameters:
im (PIL.Image) – grayscale input image (mode ‘L’)
baselines (sequence) – List of lists containing a single baseline per entry.
suppl_obj (sequence) – List of lists containing additional polylines that should be considered hard boundaries for polygonization purposes. Can be used to prevent polygonization into non-text areas such as illustrations or to compute the polygonization of a subset of the lines in an image.
im_feats (numpy.array) – An optional precomputed seamcarve energy map. Overrides data in im. The default map is gaussian_filter(sobel(im), 2).
scale (tuple) – A 2-tuple (h, w) containing optional scale factors of the input. Values of 0 are used for aspect-preserving scaling. None skips input scaling.
topline (bool) – Switch to change default baseline location for offset calculation purposes. If set to False, baselines are assumed to be on the bottom of the text line and will be offset upwards, if set to True, baselines are on the top and will be offset downwards.
- Returns:
List of lists of coordinates. If no polygonization could be computed for a baseline, None is returned instead.
- kraken.lib.segmentation.compute_polygon_section(baseline, boundary, dist1, dist2)¶
Given a baseline, a polygonal boundary, and two points on the baseline, returns the rectangle formed by the orthogonal cuts on that baseline segment. The resulting polygon is not guaranteed to have a non-zero area.
The distance can be larger than the actual length of the baseline if the baseline endpoints are inside the bounding polygon. In that case the baseline will be extrapolated to the polygon edge.
- Parameters:
baseline (list) – A polyline ((x1, y1), …, (xn, yn))
boundary (list) – A bounding polygon around the baseline (same format as baseline).
dist1 (int) – Absolute distance along the baseline of the first point.
dist2 (int) – Absolute distance along the baseline of the second point.
- Returns:
A sequence of polygon points.
- kraken.lib.segmentation.extract_polygons(im, bounds)¶
Yields the subimages of image im defined in the list of bounding polygons with baselines preserving order.
- Return type:
PIL.Image.Image
- Parameters:
im (PIL.Image.Image) – Input image
bounds (list) – A list of tuples (x1, y1, x2, y2)
- Yields:
(PIL.Image) the extracted subimage
- kraken.lib.segmentation.polygonal_reading_order(lines, text_direction='lr', regions=None)¶
Given a list of baselines and regions, calculates the correct reading order and applies it to the input.
- Return type:
Sequence[Tuple[List,List]]
- Parameters:
lines (Sequence) – List of tuples containing the baseline and its polygonization.
regions (Sequence) – List of region polygons.
text_direction (str) – Set principal text direction for column ordering. Can be ‘lr’ or ‘rl’
- Returns:
A reordered input.
- kraken.lib.segmentation.reading_order(lines, text_direction='lr')¶
Given the list of lines (a list of 2D slices), computes the partial reading order. The output is a binary 2D array such that order[i,j] is true if line i comes before line j in reading order.
- Return type:
List
- Parameters:
lines (Sequence)
text_direction (str)
- kraken.lib.segmentation.scale_polygonal_lines(lines, scale)¶
Scales baselines/polygon coordinates by a certain factor.
- Return type:
Sequence[Tuple[List,List]]
- Parameters:
lines (Sequence) – List of tuples containing the baseline and its polygonization.
scale (float or tuple of floats) – Scaling factor
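A minimal sketch of the scaling behaviour, assuming that a tuple scale is ordered (x factor, y factor) and that results are truncated to integers. Both points are assumptions, and the helper names are hypothetical.

```python
def scale_points(points, scale):
    """Scale a list of [x, y] points by a scalar or an (sx, sy) pair."""
    if isinstance(scale, (int, float)):
        scale = (scale, scale)
    sx, sy = scale
    # Truncate to integer pixel coordinates (assumed behaviour).
    return [[int(x * sx), int(y * sy)] for x, y in points]


def scale_lines_sketch(lines, scale):
    """Scale every (baseline, polygon) pair, preserving the input order."""
    return [
        (scale_points(baseline, scale), scale_points(polygon, scale))
        for baseline, polygon in lines
    ]


lines = [([[10, 20], [30, 20]], [[10, 10], [30, 10], [30, 30], [10, 30]])]
halved = scale_lines_sketch(lines, 0.5)
```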
- kraken.lib.segmentation.scale_regions(regions, scale)¶
Scales baselines/polygon coordinates by a certain factor.
- Return type:
Sequence[Tuple[List,List]]
- Parameters:
lines (Sequence) – List of tuples containing the baseline and its polygonization.
scale (float or tuple of floats) – Scaling factor
regions (Sequence[Tuple[List, List]])
- kraken.lib.segmentation.vectorize_lines(im, threshold=0.17, min_length=5)¶
Vectorizes lines from a binarized array.
- Parameters:
im (np.ndarray) – Array of shape (3, H, W) with the first dimension being probabilities for (start_separators, end_separators, baseline).
threshold (float) – Threshold for baseline blob detection.
min_length (int) – Minimal length of output baselines.
- Returns:
[[x0, y0, … xn, yn], [xm, ym, …, xk, yk], … ] A list of lists containing the points of all baseline polylines.
kraken.lib.ctc_decoder¶
Decoders for softmax outputs of CTC trained networks.
- kraken.lib.ctc_decoder.beam_decoder(outputs, beam_size=3)¶
Translates back the network output to a label sequence using same-prefix-merge beam search decoding as described in [0].
[0] Hannun, Awni Y., et al. “First-pass large vocabulary continuous speech recognition using bi-directional recurrent DNNs.” arXiv preprint arXiv:1408.2873 (2014).
- Return type:
List[Tuple[int,int,int,float]]
- Parameters:
outputs (numpy.ndarray) – (C, W) shaped softmax output tensor
beam_size (int)
- Returns:
A list with tuples (class, start, end, prob). max is the maximum value of the softmax layer in the region.
- kraken.lib.ctc_decoder.blank_threshold_decoder(outputs, threshold=0.5)¶
Translates back the network output to a label sequence as the original ocropy/clstm.
Thresholds on class 0, then assigns the maximum (non-zero) class to each region.
- Return type:
List[Tuple[int,int,int,float]]
- Parameters:
outputs (numpy.ndarray) – (C, W) shaped softmax output tensor
threshold (float) – Threshold for 0 class when determining possible label locations.
- Returns:
A list with tuples (class, start, end, max). max is the maximum value of the softmax layer in the region.
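The thresholding rule described above can be sketched in plain Python, using nested lists in place of the (C, W) array. Treating end as an inclusive index and the function name are assumptions of this sketch.

```python
def blank_threshold_sketch(outputs, threshold=0.5):
    """outputs: C rows (classes) of W columns (time steps); row 0 is the blank."""
    width = len(outputs[0])
    results = []
    t = 0
    while t < width:
        if outputs[0][t] >= threshold:
            t += 1  # blank is confident here: not a label location
            continue
        start = t
        # Extend the region while the blank stays below the threshold.
        while t < width and outputs[0][t] < threshold:
            t += 1
        end = t - 1
        # Strongest non-blank activation anywhere in the region.
        prob, cls = max(
            (outputs[c][x], c)
            for c in range(1, len(outputs))
            for x in range(start, end + 1)
        )
        results.append((cls, start, end, prob))
    return results


softmax = [
    [0.9, 0.1, 0.2, 0.9, 0.1],    # blank
    [0.05, 0.8, 0.3, 0.05, 0.2],  # class 1
    [0.05, 0.1, 0.5, 0.05, 0.7],  # class 2
]
regions = blank_threshold_sketch(softmax)
```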
- kraken.lib.ctc_decoder.greedy_decoder(outputs)¶
Translates back the network output to a label sequence using greedy/best path decoding as described in [0].
[0] Graves, Alex, et al. “Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks.” Proceedings of the 23rd international conference on Machine learning. ACM, 2006.
- Return type:
List[Tuple[int,int,int,float]]
- Parameters:
outputs (numpy.ndarray) – (C, W) shaped softmax output tensor
- Returns:
A list with tuples (class, start, end, max). max is the maximum value of the softmax layer in the region.
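Best path decoding takes the most probable class at each time step, merges runs of the same class, and drops blanks. The sketch below illustrates this with nested lists in place of the (C, W) array; the function name and the inclusive end index are assumptions.

```python
from itertools import groupby


def greedy_sketch(outputs):
    """outputs: C rows (classes) of W columns (time steps); row 0 is the blank."""
    width = len(outputs[0])
    # Most probable class and its probability at every time step.
    path = []
    for t in range(width):
        prob, cls = max((outputs[c][t], c) for c in range(len(outputs)))
        path.append((cls, t, prob))
    results = []
    # Merge consecutive repeats of the same class into one region.
    for cls, run in groupby(path, key=lambda item: item[0]):
        run = list(run)
        if cls == 0:
            continue  # blank regions emit no label
        peak = max(prob for _, _, prob in run)
        results.append((cls, run[0][1], run[-1][1], peak))
    return results


softmax = [
    [0.9, 0.1, 0.1, 0.8, 0.2],   # blank
    [0.05, 0.8, 0.7, 0.1, 0.1],  # class 1
    [0.05, 0.1, 0.2, 0.1, 0.7],  # class 2
]
labels = greedy_sketch(softmax)
```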