API reference

kraken.blla module

Note

blla provides the interface to the fully trainable segmenter. For the legacy segmenter interface refer to the pageseg module. Note that recognition models are not interchangeable between segmenters.

kraken.blla

Trainable baseline layout analysis tools for kraken

exception kraken.blla.KrakenInputException(message=None)

Common base class for all non-exit exceptions.

exception kraken.blla.KrakenInvalidModelException(message=None)

Common base class for all non-exit exceptions.

kraken.blla.calculate_polygonal_environment(im=None, baselines=None, suppl_obj=None, im_feats=None, scale=None, topline=False)

Given a list of baselines and an input image, calculates a polygonal environment around each baseline.

Parameters
  • im (PIL.Image) – grayscale input image (mode ‘L’)

  • baselines (sequence) – List of lists containing a single baseline per entry.

  • suppl_obj (sequence) – List of lists containing additional polylines that should be considered hard boundaries for polygonization purposes. Can be used to prevent polygonization into non-text areas such as illustrations or to compute the polygonization of a subset of the lines in an image.

  • im_feats (numpy.array) – An optional precomputed seamcarve energy map. Overrides data in im. The default map is gaussian_filter(sobel(im), 2).

  • scale (tuple) – A 2-tuple (h, w) containing optional scale factors of the input. Values of 0 are used for aspect-preserving scaling. None skips input scaling.

  • topline (bool) – Switch to change the default baseline location for offset calculation purposes. If set to False, baselines are assumed to be on the bottom of the text line and will be offset upwards; if set to True, baselines are on the top and will be offset downwards; if set to None, no offset will be applied.

Returns

List of lists of coordinates. If no polygonization could be computed for a baseline, None is returned instead.

kraken.blla.compute_segmentation_map(im, mask=None, model=None, device='cpu')
Parameters
  • mask (Optional[numpy.ndarray]) –

  • device (str) –

kraken.blla.get_im_str(im)
Parameters

im (PIL.Image.Image) –

Return type

str

kraken.blla.is_bitonal(im)

Tests a PIL.Image for bitonality.

Parameters

im (PIL.Image.Image) – Image to test

Returns

True if the image contains only two different color values. False otherwise.

Return type

bool
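
The bitonality test can be sketched with numpy alone: an image is bitonal if its pixel data contains at most two distinct values. This is an illustrative reimplementation operating on an array, not kraken's internal code, which works on a PIL.Image:

```python
import numpy as np

def is_bitonal_sketch(arr: np.ndarray) -> bool:
    """True if the array contains at most two distinct pixel values."""
    return np.unique(arr).size <= 2

# A binarized image (only 0 and 255) passes; a grayscale gradient does not.
binary = np.array([[0, 255], [255, 0]], dtype=np.uint8)
gray = np.arange(9, dtype=np.uint8).reshape(3, 3)
```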

kraken.blla.logger
kraken.blla.pil2array(im, alpha=0)
Parameters
  • im (PIL.Image.Image) –

  • alpha (int) –

Return type

numpy.ndarray

kraken.blla.polygonal_reading_order(lines, text_direction='lr', regions=None)

Given a list of baselines and regions, calculates the correct reading order and applies it to the input.

Parameters
  • lines (Sequence) – List of tuples containing the baseline and its polygonization.

  • regions (Sequence) – List of region polygons.

  • text_direction (str) – Set principal text direction for column ordering. Can be ‘lr’ or ‘rl’

Returns

A reordered input.

Return type

Sequence[Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]]

kraken.blla.scale_polygonal_lines(lines, scale)

Scales baselines/polygon coordinates by a certain factor.

Parameters
  • lines (Sequence) – List of tuples containing the baseline and its polygonization.

  • scale (float or tuple of floats) – Scaling factor

Return type

Sequence[Tuple[List, List]]
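
Scaling baseline or polygon coordinates is simple per-point arithmetic. The following hypothetical helper sketches the idea for a single polyline; it is not kraken's implementation:

```python
def scale_polyline(points, scale):
    """Scale a list of (x, y) points by a scalar or an (x_scale, y_scale) tuple."""
    sx, sy = scale if isinstance(scale, tuple) else (scale, scale)
    return [(round(x * sx), round(y * sy)) for x, y in points]

baseline = [(10, 20), (30, 20)]
```

A tuple scale allows anisotropic scaling, e.g. when an image was resized with different horizontal and vertical factors.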

kraken.blla.scale_regions(regions, scale)

Scales region polygon coordinates by a certain factor.

Parameters
  • regions (Sequence[Tuple[List[int], List[int]]]) – List of region polygons.

  • scale (float or tuple of floats) – Scaling factor

Return type

Sequence[Tuple[List, List]]

kraken.blla.segment(im, text_direction='horizontal-lr', mask=None, reading_order_fn=polygonal_reading_order, model=None, device='cpu')

Segments a page into text lines using the baseline segmenter.

Segments a page into text lines and returns the polyline formed by each baseline and their estimated environment.

Parameters
  • im (PIL.Image) – An RGB image.

  • text_direction (str) – Ignored by the segmenter but kept for serialization.

  • mask (PIL.Image) – A bi-level mask image of the same size as im where 0-valued regions are ignored for segmentation purposes. Disables column detection.

  • reading_order_fn (function) – Function to determine the reading order. Has to accept a list of tuples (baselines, polygon) and a text direction (lr or rl).

  • model (vgsl.TorchVGSLModel or list) – One or more TorchVGSLModel containing a segmentation model. If none is given a default model will be loaded.

  • device (str or torch.Device) – The target device to run the neural network on.

Returns

A dictionary of the form:

{'text_direction': '$dir',
 'type': 'baseline',
 'lines': [
   {'baseline': [[x0, y0], [x1, y1], …, [x_n, y_n]],
    'boundary': [[x0, y0], [x1, y1], …, [x_m, y_m]]},
   {'baseline': [[x0, …]], 'boundary': [[x0, …]]}
 ],
 'regions': [
   {'region': [[x0, y0], [x1, y1], …, [x_n, y_n]], 'type': 'image'},
   {'region': [[x0, …]], 'type': 'text'}
 ]}

It contains the text direction and, under the key 'lines', a list of reading order sorted baselines (polylines) and their respective polygonal boundaries. The last and first point of each boundary polygon are connected.

Return type

dict

Raises
  • KrakenInputException – If the input image is not binarized or the text direction is invalid.
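
The returned value can be consumed like any dictionary. The sketch below builds a minimal result of the documented shape and iterates over its lines; the coordinates are sample data, not an actual segmentation:

```python
# Minimal dictionary mirroring the documented segment() output (sample data).
seg = {
    'text_direction': 'horizontal-lr',
    'type': 'baseline',
    'lines': [
        {'baseline': [[10, 40], [200, 40]],
         'boundary': [[10, 20], [200, 20], [200, 60], [10, 60]]},
    ],
    'regions': [
        {'region': [[0, 0], [250, 0], [250, 100], [0, 100]], 'type': 'text'},
    ],
}

# Lines are already in reading order; each entry carries its own boundary.
for line in seg['lines']:
    start, end = line['baseline'][0], line['baseline'][-1]
```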

kraken.blla.vec_lines(heatmap, cls_map, scale, text_direction='horizontal-lr', reading_order_fn=polygonal_reading_order, regions=None, scal_im=None, suppl_obj=None, topline=False, **kwargs)

Computes lines from a stack of heatmaps, a class mapping, and a scaling factor.

Parameters
  • heatmap (torch.Tensor) –

  • cls_map (Dict) –

  • scale (float) –

  • text_direction (str) –

  • reading_order_fn (Callable) –

  • regions (List) –

  • scal_im (numpy.ndarray) –

  • suppl_obj (List) –

  • topline (bool) –

kraken.blla.vec_regions(heatmap, cls_map, scale, **kwargs)

Computes regions from a stack of heatmaps, a class mapping, and a scaling factor.

Parameters
  • heatmap (torch.Tensor) –

  • cls_map (Dict) –

  • scale (float) –

kraken.blla.vectorize_lines(im, threshold=0.17, min_length=5)

Vectorizes lines from a binarized array.

Parameters
  • im (np.ndarray) – Array of shape (3, H, W) with the first dimension being probabilities for (start_separators, end_separators, baseline).

  • threshold (float) – Threshold for baseline blob detection.

  • min_length (int) – Minimal length of output baselines.

Returns

[[x0, y0, … xn, yn], [xm, ym, …, xk, yk], … ] A list of lists containing the points of all baseline polylines.

kraken.blla.vectorize_regions(im, threshold=0.5)

Vectorizes regions from a binarized array.

Parameters
  • im (np.ndarray) – Array of shape (H, W) containing a probability map of the region.

  • threshold (float) – Threshold for binarization

Returns

[[x0, y0, … xn, yn], [xm, ym, …, xk, yk], … ] A list of lists containing the region polygons.

kraken.pageseg module

Note

pageseg is the legacy bounding box-based segmenter. For the trainable baseline segmenter interface refer to the blla module. Note that recognition models are not interchangeable between segmenters.

kraken.pageseg

Layout analysis and script detection methods.

exception kraken.pageseg.KrakenInputException(message=None)

Common base class for all non-exit exceptions.

kraken.pageseg.binary_objects(binary)

Labels features in an array and segments them into objects.

Parameters

binary (numpy.ndarray) –

Return type

numpy.ndarray

kraken.pageseg.compute_black_colseps(binary, scale, maxcolseps)

Computes column separators from vertical black lines.

Parameters
  • binary (numpy.ndarray) – Numpy array of the binary image

  • scale (float) –

  • maxcolseps (int) –

Returns

A tuple (colseps, binary).

kraken.pageseg.compute_boxmap(binary, scale, threshold=(0.5, 4), dtype='i')

Returns grapheme cluster-like boxes based on connected components.

Parameters
  • binary (numpy.ndarray) –

  • scale (float) –

  • threshold (Tuple[float, int]) –

  • dtype (str) –

Return type

numpy.ndarray

kraken.pageseg.compute_colseps_conv(binary, scale=1.0, minheight=10, maxcolseps=2)

Find column separators by convolution and thresholding.

Parameters
  • binary (numpy.ndarray) –

  • scale (float) –

  • minheight (int) –

  • maxcolseps (int) –

Returns

Separators

Return type

numpy.ndarray

kraken.pageseg.compute_gradmaps(binary, scale, gauss=False)

Use gradient filtering to find baselines

Parameters
  • binary (numpy.ndarray) –

  • scale (float) –

  • gauss (bool) – Use gaussian instead of uniform filtering

Returns

(bottom, top, boxmap)

kraken.pageseg.compute_line_seeds(binary, bottom, top, colseps, scale, threshold=0.2)

Based on gradient maps, computes candidates for baselines and x-heights, then marks the regions between the two as line seeds.

Parameters
  • binary (numpy.ndarray) –

  • bottom (numpy.ndarray) –

  • top (numpy.ndarray) –

  • colseps (numpy.ndarray) –

  • scale (float) –

  • threshold (float) –

Return type

numpy.ndarray

kraken.pageseg.compute_lines(segmentation, scale)

Given a line segmentation map, computes a list of tuples consisting of 2D slices and masked images.

Parameters
  • segmentation (numpy.ndarray) –

  • scale (float) –

Return type

List[record]

kraken.pageseg.compute_separators_morph(binary, scale, sepwiden=10, maxcolseps=2)

Finds vertical black lines corresponding to column separators.

Parameters
  • binary (numpy.ndarray) –

  • scale (float) –

  • sepwiden (int) –

  • maxcolseps (int) –

Return type

numpy.ndarray

kraken.pageseg.compute_white_colseps(binary, scale, maxcolseps)

Computes column separators from whitespace.

Parameters
  • binary (numpy.ndarray) – Numpy array of the binary image

  • scale (float) –

  • maxcolseps (int) –

Returns

The computed column separators (colseps).

kraken.pageseg.estimate_scale(binary)

Estimates image scale based on number of connected components.

Parameters

binary (numpy.ndarray) –

Return type

float

kraken.pageseg.find(condition)

Return the indices where ravel(condition) is true
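
This is equivalent to taking the nonzero indices of the flattened condition array; a one-line numpy sketch (illustrative, not kraken's internal code):

```python
import numpy as np

def find_sketch(condition):
    """Indices into ravel(condition) where the condition is true."""
    return np.nonzero(np.ravel(condition))[0]

# Flattened mask is [False, True, True, False] -> indices 1 and 2.
mask = np.array([[False, True], [True, False]])
```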

kraken.pageseg.get_im_str(im)
Parameters

im (PIL.Image.Image) –

Return type

str

kraken.pageseg.is_bitonal(im)

Tests a PIL.Image for bitonality.

Parameters

im (PIL.Image.Image) – Image to test

Returns

True if the image contains only two different color values. False otherwise.

Return type

bool

kraken.pageseg.logger
kraken.pageseg.norm_max(v)

Normalizes the input array by maximum value.

Parameters

v (numpy.ndarray) –

Return type

numpy.ndarray
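
Maximum normalization simply divides an array by its largest value so the maximum becomes 1; an illustrative sketch:

```python
import numpy as np

def norm_max_sketch(v: np.ndarray) -> np.ndarray:
    """Scale v so its maximum value becomes 1."""
    return v / np.amax(v)

v = np.array([1.0, 2.0, 4.0])
```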

kraken.pageseg.pil2array(im, alpha=0)
Parameters
  • im (PIL.Image.Image) –

  • alpha (int) –

Return type

numpy.ndarray

kraken.pageseg.reading_order(lines, text_direction='lr')

Given the list of lines (a list of 2D slices), computes the partial reading order. The output is a binary 2D array such that order[i,j] is true if line i comes before line j in reading order.

Parameters
  • lines (Sequence[Tuple[slice, slice]]) –

  • text_direction (str) –

Return type

numpy.ndarray

class kraken.pageseg.record(**kw)

Simple dict-like object.

kraken.pageseg.remove_hlines(binary, scale, maxsize=10)

Removes horizontal black lines that only interfere with page segmentation.

Parameters
  • binary (numpy.ndarray) –

  • scale (float) –

  • maxsize (int) – Maximum size of removed lines

Returns

numpy.ndarray containing the filtered image.

Return type

numpy.ndarray

kraken.pageseg.rotate_lines(lines, angle, offset)

Rotates line bounding boxes around the origin and adds an offset.

Parameters
  • lines (numpy.ndarray) –

  • angle (float) –

  • offset (int) –

Return type

numpy.ndarray

kraken.pageseg.segment(im, text_direction='horizontal-lr', scale=None, maxcolseps=2, black_colseps=False, no_hlines=True, pad=0, mask=None, reading_order_fn=reading_order)

Segments a page into text lines.

Segments a page into text lines and returns the absolute coordinates of each line in reading order.

Parameters
  • im (PIL.Image) – A bi-level page of mode ‘1’ or ‘L’

  • text_direction (str) – Principal direction of the text (horizontal-lr/rl/vertical-lr/rl)

  • scale (float) – Scale of the image

  • maxcolseps (int) – Maximum number of whitespace column separators

  • black_colseps (bool) – Whether column separators are assumed to be vertical black lines or not

  • no_hlines (bool) – Switch for horizontal line removal

  • pad (int or tuple) – Padding to add to line bounding boxes. If int the same padding is used both left and right. If a 2-tuple, uses (padding_left, padding_right).

  • mask (PIL.Image) – A bi-level mask image of the same size as im where 0-valued regions are ignored for segmentation purposes. Disables column detection.

  • reading_order_fn (Callable) – Function to call to order line output. Callable accepting a list of slices (y, x) and a text direction in (rl, lr).

Returns

A dictionary of the form {'text_direction': '$dir', 'boxes': [(x1, y1, x2, y2), …]} containing the text direction and a list of reading order sorted bounding boxes under the key 'boxes'.

Return type

dict

Raises
  • KrakenInputException – If the input image is not binarized or the text direction is invalid.
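
A sketch of consuming the returned dictionary, using sample data of the documented shape rather than a real segmentation:

```python
# Minimal dictionary mirroring the documented pageseg.segment() output (sample data).
seg = {
    'text_direction': 'horizontal-lr',
    'boxes': [(12, 30, 310, 55), (12, 60, 305, 88)],
}

# Coordinates are absolute; with PIL each line could be cut out via im.crop(box).
widths = [x2 - x1 for (x1, y1, x2, y2) in seg['boxes']]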

kraken.pageseg.topsort(order)

Given a binary array defining a partial order (o[i,j]==True means i<j), compute a topological sort. This is a quick and dirty implementation that works for up to a few thousand elements.

Parameters

order (numpy.ndarray) –

Return type

List[int]
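
The partial order produced by reading_order can be linearized by any topological sort. A minimal Kahn-style sketch over a nested-list boolean matrix (illustrative, not kraken's implementation):

```python
def topsort_sketch(order):
    """Topologically sort indices given order[i][j] == True meaning i < j."""
    n = len(order)
    # indegree[j] counts how many lines must precede line j.
    indegree = [sum(order[i][j] for i in range(n)) for j in range(n)]
    queue = [i for i in range(n) if indegree[i] == 0]
    result = []
    while queue:
        i = queue.pop(0)
        result.append(i)
        for j in range(n):
            if order[i][j]:
                indegree[j] -= 1
                if indegree[j] == 0:
                    queue.append(j)
    return result

# Line 1 precedes line 0, and line 0 precedes line 2.
order = [[False, False, True],
         [True, False, True],
         [False, False, False]]
```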

kraken.rpred module

kraken.rpred

Generators for recognition on lines images.

exception kraken.rpred.KrakenInputException(message=None)

Common base class for all non-exit exceptions.

class kraken.rpred.TorchSeqRecognizer(nn, decoder=kraken.lib.ctc_decoder.greedy_decoder, train=False, device='cpu')

A class wrapping a TorchVGSLModel with a more comfortable recognition interface.

Parameters
  • train (bool) –

  • device (str) –

forward(self, line, lens=None)

Performs a forward pass on a torch tensor of one or more lines with shape (N, C, H, W) and returns a numpy array (N, W, C).

Parameters
  • line (torch.Tensor) – NCHW line tensor

  • lens (torch.Tensor) – Optional tensor containing sequence lengths if N > 1

Returns

Tuple with (N, W, C) shaped numpy array and final output sequence lengths.

Return type

numpy.ndarray

predict(self, line, lens=None)

Performs a forward pass on a torch tensor of a line with shape (N, C, H, W) and returns the decoding as a list of tuples (string, start, end, confidence).

Parameters
  • line (torch.Tensor) – NCHW line tensor

  • lens (torch.Tensor) – Optional tensor containing sequence lengths if N > 1

Returns

List of decoded sequences.

Return type

List[List[Tuple[str, int, int, float]]]

predict_labels(self, line, lens=None)

Performs a forward pass on a torch tensor of a line with shape (N, C, H, W) and returns a list of tuples (class, start, end, max). Max is the maximum value of the softmax layer in the region.

Parameters
  • line (torch.tensor) –

  • lens (torch.Tensor) –

Return type

List[List[Tuple[int, int, int, float]]]

predict_string(self, line, lens=None)

Performs a forward pass on a torch tensor of a line with shape (N, C, H, W) and returns a string of the results.

Parameters
  • line (torch.Tensor) –

  • lens (torch.Tensor) –

Return type

List[str]

to(self, device)

Moves model to device and automatically loads input tensors onto it.

kraken.rpred.bidi_record(record, base_dir=None)

Reorders a record using the Unicode BiDi algorithm.

Models trained for RTL or mixed scripts still emit classes in LTR order, requiring reordering for proper display.

Parameters

record (kraken.rpred.ocr_record) –

Returns

kraken.rpred.ocr_record

Return type

ocr_record

kraken.rpred.compute_polygon_section(baseline, boundary, dist1, dist2)

Given a baseline, a polygonal boundary, and two points on the baseline, returns the rectangle formed by the orthogonal cuts on that baseline segment. The resulting polygon is not guaranteed to have a non-zero area.

The distance can be larger than the actual length of the baseline if the baseline endpoints are inside the bounding polygon. In that case the baseline will be extrapolated to the polygon edge.

Parameters
  • baseline (list) – A polyline ((x1, y1), …, (xn, yn))

  • boundary (list) – A bounding polygon around the baseline (same format as baseline).

  • dist1 (int) – Absolute distance along the baseline of the first point.

  • dist2 (int) – Absolute distance along the baseline of the second point.

Returns

A sequence of polygon points.

kraken.rpred.extract_polygons(im, bounds)

Yields the subimages of image im defined by the list of bounding polygons (with baselines), preserving order.

Parameters
  • im (PIL.Image.Image) – Input image

  • bounds (list) – A list of tuples (x1, y1, x2, y2)

Yields

(PIL.Image.Image) the extracted subimage

Return type

PIL.Image.Image

kraken.rpred.generate_input_transforms(batch, height, width, channels, pad, valid_norm=True, force_binarization=False)

Generates a torchvision transformation converting a PIL.Image into a tensor usable in a network forward pass.

Parameters
  • batch (int) – mini-batch size

  • height (int) – height of input image in pixels

  • width (int) – width of input image in pixels

  • channels (int) – color channels of input

  • pad (int) – Amount of padding on horizontal ends of image

  • valid_norm (bool) – Enables/disables baseline normalization as a valid preprocessing step. If disabled we will fall back to standard scaling.

  • force_binarization (bool) – Forces binarization of input images using the nlbin algorithm.

Returns

A torchvision transformation composition converting the input image to the appropriate tensor.

Return type

torchvision.transforms.Compose

kraken.rpred.get_im_str(im)
Parameters

im (PIL.Image.Image) –

Return type

str

kraken.rpred.is_bitonal(im)

Tests a PIL.Image for bitonality.

Parameters

im (PIL.Image.Image) – Image to test

Returns

True if the image contains only two different color values. False otherwise.

Return type

bool

kraken.rpred.logger
class kraken.rpred.mm_rpred(nets, im, bounds, pad=16, bidi_reordering=True, script_ignore=None)

Multi-model version of kraken.rpred.rpred

Parameters
  • nets (Dict[str, kraken.lib.models.TorchSeqRecognizer]) –

  • im (PIL.Image.Image) –

  • bounds (dict) –

  • pad (int) –

  • bidi_reordering (Union[bool, str]) –

  • script_ignore (Optional[List[str]]) –

class kraken.rpred.ocr_record(prediction, cuts, confidences, line)

A record object containing the recognition result of a single line

Parameters
  • prediction (str) –

  • confidences (List[float]) –

  • line (Union[List, Dict[str, List]]) –

kraken.rpred.rpred(network, im, bounds, pad=16, bidi_reordering=True)

Uses a TorchSeqRecognizer and a segmentation to recognize text

Parameters
  • network (kraken.lib.models.TorchSeqRecognizer) – A TorchSeqRecognizer object

  • im (PIL.Image.Image) – Image to extract text from

  • bounds (dict) – A dictionary containing a ‘boxes’ entry with a list of coordinates (x0, y0, x1, y1) of a text line in the image and an entry ‘text_direction’ containing ‘horizontal-lr/rl/vertical-lr/rl’.

  • pad (int) – Extra blank padding to the left and right of text line. Auto-disabled when expected network inputs are incompatible with padding.

  • bidi_reordering (bool|str) – Reorder classes in the ocr_record according to the Unicode bidirectional algorithm for correct display. Set to L|R to change base text direction.

Yields

An ocr_record containing the recognized text, absolute character positions, and confidence values for each character.

Return type

Generator[ocr_record, None, None]

kraken.serialization module

kraken.serialization.is_in_region(line, region)

Tests if a line is inside a region, i.e. if the midpoint of the baseline is inside the region.

Parameters
  • line (geom.LineString) – line to test

  • region (geom.Polygon) –

Returns

False if line is not inside region, True otherwise

Return type

bool

kraken.serialization.logger
kraken.serialization.make_printable(char)

Takes a Unicode code point and returns a printable representation of it.

Parameters

char (str) – Input code point

Returns

Either the original code point, the name of the code point if it is a combining mark, whitespace etc., or the hex code if it is a control symbol.

Return type

str

kraken.serialization.max_bbox(boxes)

Calculates the minimal bounding box containing all boxes in an iterator.

Parameters
  • boxes (iterator) – An iterator returning tuples of the format ((x0, y0), (x1, y1), …, (xn, yn)).

Returns

A box (x0, y0, x1, y1) covering all bounding boxes in the input argument.

Return type

Tuple[int, int, int, int]
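
The covering box is just the elementwise min/max over all points of all input polygons; an illustrative pure-Python sketch:

```python
def max_bbox_sketch(boxes):
    """Minimal (x0, y0, x1, y1) box covering every point of every input polygon."""
    xs = [x for box in boxes for x, y in box]
    ys = [y for box in boxes for x, y in box]
    return (min(xs), min(ys), max(xs), max(ys))

boxes = [((0, 5), (10, 15)), ((2, 1), (8, 20))]
```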

class kraken.serialization.ocr_record(prediction, cuts, confidences, line)

A record object containing the recognition result of a single line

Parameters
  • prediction (str) –

  • confidences (List[float]) –

  • line (Union[List, Dict[str, List]]) –

kraken.serialization.render_report(model, chars, errors, char_confusions, scripts, insertions, deletions, substitutions)

Renders an accuracy report.

Parameters
  • model (str) – Model name.

  • errors (int) – Number of errors on test set.

  • char_confusions (dict) – Dictionary mapping a tuple (gt, pred) to a number of occurrences.

  • scripts (dict) – Dictionary counting character per script.

  • insertions (dict) – Dictionary counting insertion operations per Unicode script

  • deletions (int) – Number of deletions

  • substitutions (dict) – Dictionary counting substitution operations per Unicode script.

  • chars (int) –

Returns

A string containing the rendered report.

Return type

str

kraken.serialization.serialize(records, image_name=None, image_size=(0, 0), writing_mode='horizontal-tb', scripts=None, regions=None, template='hocr')

Serializes a list of ocr_records into an output document.

Serializes a list of predictions and their corresponding positions by doing some hOCR-specific preprocessing and then renders them through one of several jinja2 templates.

Note: Empty records are ignored for serialization purposes.

Parameters
  • records (iterable) – List of kraken.rpred.ocr_record

  • image_name (str) – Name of the source image

  • image_size (tuple) – Dimensions of the source image

  • writing_mode (str) – Sets the principal layout of lines and the direction in which blocks progress. Valid values are horizontal-tb, vertical-rl, and vertical-lr.

  • scripts (list) – List of scripts contained in the OCR records

  • regions (list) – Dictionary mapping region types to a list of region polygons.

  • template (str) – Selector for the serialization format. May be ‘hocr’ or ‘alto’.

Returns

(str) rendered template.

Return type

str

kraken.lib.models module

kraken.lib.models

Wrapper around TorchVGSLModel including a variety of forward pass helpers for sequence classification.

exception kraken.lib.models.KrakenInputException(message=None)

Common base class for all non-exit exceptions.

exception kraken.lib.models.KrakenInvalidModelException(message=None)

Common base class for all non-exit exceptions.

class kraken.lib.models.TorchSeqRecognizer(nn, decoder=kraken.lib.ctc_decoder.greedy_decoder, train=False, device='cpu')

A class wrapping a TorchVGSLModel with a more comfortable recognition interface.

Parameters
  • train (bool) –

  • device (str) –

forward(self, line, lens=None)

Performs a forward pass on a torch tensor of one or more lines with shape (N, C, H, W) and returns a numpy array (N, W, C).

Parameters
  • line (torch.Tensor) – NCHW line tensor

  • lens (torch.Tensor) – Optional tensor containing sequence lengths if N > 1

Returns

Tuple with (N, W, C) shaped numpy array and final output sequence lengths.

Return type

numpy.ndarray

predict(self, line, lens=None)

Performs a forward pass on a torch tensor of a line with shape (N, C, H, W) and returns the decoding as a list of tuples (string, start, end, confidence).

Parameters
  • line (torch.Tensor) – NCHW line tensor

  • lens (torch.Tensor) – Optional tensor containing sequence lengths if N > 1

Returns

List of decoded sequences.

Return type

List[List[Tuple[str, int, int, float]]]

predict_labels(self, line, lens=None)

Performs a forward pass on a torch tensor of a line with shape (N, C, H, W) and returns a list of tuples (class, start, end, max). Max is the maximum value of the softmax layer in the region.

Parameters
  • line (torch.tensor) –

  • lens (torch.Tensor) –

Return type

List[List[Tuple[int, int, int, float]]]

predict_string(self, line, lens=None)

Performs a forward pass on a torch tensor of a line with shape (N, C, H, W) and returns a string of the results.

Parameters
  • line (torch.Tensor) –

  • lens (torch.Tensor) –

Return type

List[str]

to(self, device)

Moves model to device and automatically loads input tensors onto it.

class kraken.lib.models.TorchVGSLModel(spec)

Class building a torch module from a VGSL spec.

The initialized class will contain a variable number of layers and a loss function. Inputs and outputs are always 4D tensors in order (batch, channels, height, width) with channels always being the feature dimension.

Importantly this means that a recurrent network will be fed the channel vector at each step along its time axis, i.e. either put the non-time-axis dimension into the channels dimension or use a summarizing RNN squashing the time axis to 1 and putting the output into the channels dimension respectively.

Parameters

spec (str) –
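
A VGSL spec is a bracketed sequence of layer definitions starting with an input block. The example below is a representative text recognition spec; the concrete layer sizes are illustrative:

```
[1,48,0,1 Cr3,3,32 Do0.1,2 Mp2,2 Cr3,3,64 Do0.1,2 Mp2,2 S1(1x12)1,3 Lbx100 Do O1c103]
```

It reads as an NCHW input of height 48, two convolution/dropout/maxpool stages, a reshape squashing the height axis into the channel dimension, a bidirectional LSTM with 100 units, dropout, and a CTC output layer with 103 classes.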

input

Expected input tensor as a 4-tuple.

Type

tuple

nn

Stack of layers parsed from the spec.

Type

torch.nn.Sequential

criterion

Fully parametrized loss function.

Type

torch.nn.Module

user_metadata

Dictionary with user-defined metadata. It is flushed into the model file during saving and overwritten by loading operations.

Type

dict

one_channel_mode

Field indicating the image type used during training of one-channel images. Is ‘1’ for models trained on binarized images, ‘L’ for grayscale, and None otherwise.

Type

str

add_codec(self, codec)

Adds a PytorchCodec to the model.

Parameters

codec (kraken.lib.codec.PytorchCodec) –

Return type

None

append(self, idx, spec)

Splits a model at layer idx and appends the layers in spec.

New layers are initialized using the init_weights method.

Parameters
  • idx (int) – Index of the layer to append spec to, starting with 1. To select the whole layer stack set idx to None.

  • spec (str) – VGSL spec without input block to append to model.

Return type

None

build_conv(self, input, block)

Builds a 2D convolution layer.

Parameters
  • input (Tuple[int, int, int, int]) –

  • block (str) –

Return type

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

build_dropout(self, input, block)
Parameters
  • input (Tuple[int, int, int, int]) –

  • block (str) –

Return type

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

build_groupnorm(self, input, block)
Parameters
  • input (Tuple[int, int, int, int]) –

  • block (str) –

Return type

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

build_maxpool(self, input, block)

Builds a maxpool layer.

Parameters
  • input (Tuple[int, int, int, int]) –

  • block (str) –

Return type

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

build_output(self, input, block)

Builds an output layer.

Parameters
  • input (Tuple[int, int, int, int]) –

  • block (str) –

Return type

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

build_reshape(self, input, block)

Builds a reshape layer

Parameters
  • input (Tuple[int, int, int, int]) –

  • block (str) –

Return type

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

build_rnn(self, input, block)

Builds an LSTM/GRU layer returning number of outputs and layer.

Parameters
  • input (Tuple[int, int, int, int]) –

  • block (str) –

Return type

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

eval(self)

Sets the model to evaluation/inference mode, disabling dropout and gradient calculation.

Return type

None

get_layer_name(self, layer, name=None)

Generates a unique identifier for the layer optionally using a supplied name.

Parameters
  • layer (str) – Identifier of the layer type

  • name (str) – User-supplied {name} with surrounding braces that need removing.

Returns

(str) network unique layer name

Return type

str

property hyper_params(self, **kwargs)
init_weights(self, idx=slice(0, None))

Initializes weights for all or a subset of layers in the graph.

LSTM/GRU layers are orthogonally initialized, convolutional layers uniformly from (-0.1,0.1).

Parameters

idx (slice) – A slice object representing the indices of layers to initialize.

Return type

None

classmethod load_clstm_model(cls, path)

Loads a CLSTM model into VGSL.

Parameters

path (str) –

classmethod load_model(cls, path)

Deserializes a VGSL model from a CoreML file.

Parameters

path (str) – CoreML file

Returns

A TorchVGSLModel instance.

Raises
  • KrakenInvalidModelException – If the model data is invalid (not a string, protobuf file, or without appropriate metadata).

  • FileNotFoundError – If the path doesn't point to a file.

classmethod load_pronn_model(cls, path)

Loads a pronn model into VGSL.

Parameters

path (str) –

property model_type(self)
property one_channel_mode(self)
resize_output(self, output_size, del_indices=None)

Resizes an output layer.

Parameters
  • output_size (int) – New size/output channels of last layer

  • del_indices (list) – list of outputs to delete from layer

Return type

None

save_model(self, path)

Serializes the model into path.

Parameters

path (str) – Target destination

property seg_type(self)
static set_layer_name(layer, name)

Sets the name field of a VGSL layer definition.

Parameters
  • layer (str) – VGSL definition

  • name (str) – Layer name

Return type

str

set_num_threads(self, num)

Sets number of OpenMP threads to use.

Parameters

num (int) –

Return type

None

to(self, device)
Parameters

device (Union[str, torch.device]) –

Return type

None

train(self)

Sets the model to training mode (enables dropout layers and disables softmax on CTC layers).

Return type

None

kraken.lib.models.load_any(fname, train=False, device='cpu')

Loads anything that was, is, and will be a valid ocropus model and instantiates a shiny new kraken.lib.lstm.SeqRecognizer from the RNN configuration in the file.

Currently it recognizes the following kinds of models:

  • pyrnn models containing BIDILSTMs

  • protobuf models containing converted python BIDILSTMs

  • protobuf models containing CLSTM networks

Additionally an attribute ‘kind’ will be added to the SeqRecognizer containing a string representation of the source kind. Current known values are:

  • pyrnn for pickled BIDILSTMs

  • clstm for protobuf models generated by clstm

Parameters
  • fname (str) – Path to the model

  • train (bool) – Enables gradient calculation and dropout layers in model.

  • device (str) – Target device

Returns

A kraken.lib.models.TorchSeqRecognizer object.

Return type

TorchSeqRecognizer

kraken.lib.models.logger
kraken.lib.models.validate_hyper_parameters(hyper_params)

Validates a model's hyperparameters and modifies them in place if need be.

kraken.lib.vgsl module

VGSL plumbing

exception kraken.lib.vgsl.KrakenInvalidModelException(message=None)

Common base class for all non-exit exceptions.

class kraken.lib.vgsl.PytorchCodec(charset, strict=False)

Translates between labels and graphemes.

Parameters

charset (Union[Dict[str, Sequence[int]], Sequence[str], str]) –

add_labels(self, charset)

Adds additional characters/labels to the codec.

charset may either be a string, a list or a dict. In the first case each code point will be assigned a label, in the second case each string in the list will be assigned a label, and in the final case each key string will be mapped to the value sequence of integers. In the first two cases labels will be assigned automatically.

As 0 is the blank label in a CTC output layer, output labels and input dictionaries are/should be 1-indexed.

Parameters

charset (unicode, list, dict) – Input character set.

Return type

PytorchCodec

decode(self, labels)

Decodes a labelling.

Given a labelling with cuts and confidences, returns the decoded code points with cuts and confidences aggregated across label-code point correspondences. When decoding multilabels to code points the resulting cuts are min/max and confidences are averaged.

Parameters

labels (list) – Input containing tuples (label, start, end, confidence).

Returns

A list of tuples (code point, start, end, confidence)

Return type

list

encode(self, s)

Encodes a string into a sequence of labels.

If the code is non-singular we greedily encode the longest sequence first.

Parameters

s (str) – Input unicode string

Returns

(torch.IntTensor) encoded label sequence

Return type

torch.IntTensor


property is_valid(self)

Returns True if the codec is prefix-free (in label space) and non-singular (in both directions).

Return type

bool

property max_label(self)

Returns the maximum label value.

Return type

int

merge(self, codec)

Transforms this codec (c1) into another (c2) reusing as many labels as possible.

The resulting codec is able to encode the same code point sequences while not necessarily having the same labels for them as c2. Retains matching character -> label mappings from both codecs, removes mappings not in c2, and adds mappings not in c1. Compound labels in c2 for code point sequences not in c1 that contain labels also in use in c1 are added as separate labels.

Parameters

codec (kraken.lib.codec.PytorchCodec) –

Returns

A merged codec and a list of labels that were removed from the original codec.

Return type

Tuple[PytorchCodec, Set]

class kraken.lib.vgsl.TorchVGSLModel(spec)

Class building a torch module from a VGSL spec.

The initialized class will contain a variable number of layers and a loss function. Inputs and outputs are always 4D tensors in order (batch, channels, height, width) with channels always being the feature dimension.

Importantly this means that a recurrent network will be fed the channel vector at each step along its time axis, i.e. either put the non-time-axis dimension into the channels dimension or use a summarizing RNN that squashes the time axis to 1 and puts its output into the channels dimension.
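The NCHW convention above can be sketched with plain shape bookkeeping. This is an illustrative stand-in, not kraken code: the layer parameters (kernel size, hidden size) are hypothetical.

```python
# Illustrative sketch (not kraken code) of the (batch, channels, height,
# width) tensor convention described above, tracked through two layers.

def conv2d_shape(shape, out_channels, kernel, stride=1, padding=0):
    """Output shape of a square 2D convolution over an NCHW tensor."""
    n, c, h, w = shape
    h_out = (h + 2 * padding - kernel) // stride + 1
    w_out = (w + 2 * padding - kernel) // stride + 1
    return (n, out_channels, h_out, w_out)

def summarizing_rnn_shape(shape, hidden):
    """A summarizing RNN squashes the height axis to 1 and puts its
    output into the channel dimension."""
    n, c, h, w = shape
    return (n, hidden, 1, w)

shape = (1, 1, 48, 320)                        # grayscale line image
shape = conv2d_shape(shape, 32, 3, padding=1)  # -> (1, 32, 48, 320)
shape = summarizing_rnn_shape(shape, 100)      # -> (1, 100, 1, 320)
print(shape)
```

After the summarizing RNN the recurrent output lives in the channel dimension, as the paragraph above describes.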

Parameters

spec (str) –

input

Expected input tensor as a 4-tuple.

Type

tuple

nn

Stack of layers parsed from the spec.

Type

torch.nn.Sequential

criterion

Fully parametrized loss function.

Type

torch.nn.Module

user_metadata

dict with user-defined metadata. It is flushed into the model file on saving and overwritten by loading operations.

Type

dict

one_channel_mode

Field indicating the image type used during training of one-channel images. Is ‘1’ for models trained on binarized images, ‘L’ for grayscale, and None otherwise.

Type

str

add_codec(self, codec)

Adds a PytorchCodec to the model.

Parameters

codec (kraken.lib.codec.PytorchCodec) –

Return type

None

append(self, idx, spec)

Splits a model at layer idx and appends the layers in spec.

New layers are initialized using the init_weights method.

Parameters
  • idx (int) – Index of layer to append spec to starting with 1. To select the whole layer stack set idx to None.

  • spec (str) – VGSL spec without input block to append to model.

Return type

None

build_conv(self, input, block)

Builds a 2D convolution layer.

Parameters
  • input (Tuple[int, int, int, int]) –

  • block (str) –

Return type

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

build_dropout(self, input, block)
Parameters
  • input (Tuple[int, int, int, int]) –

  • block (str) –

Return type

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

build_groupnorm(self, input, block)
Parameters
  • input (Tuple[int, int, int, int]) –

  • block (str) –

Return type

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

build_maxpool(self, input, block)

Builds a maxpool layer.

Parameters
  • input (Tuple[int, int, int, int]) –

  • block (str) –

Return type

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

build_output(self, input, block)

Builds an output layer.

Parameters
  • input (Tuple[int, int, int, int]) –

  • block (str) –

Return type

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

build_reshape(self, input, block)

Builds a reshape layer.

Parameters
  • input (Tuple[int, int, int, int]) –

  • block (str) –

Return type

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

build_rnn(self, input, block)

Builds an LSTM/GRU layer returning number of outputs and layer.

Parameters
  • input (Tuple[int, int, int, int]) –

  • block (str) –

Return type

Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]

eval(self)

Sets the model to evaluation/inference mode, disabling dropout and gradient calculation.

Return type

None

get_layer_name(self, layer, name=None)

Generates a unique identifier for the layer optionally using a supplied name.

Parameters
  • layer (str) – Identifier of the layer type

  • name (str) – user-supplied name in ‘{name}’ form; the enclosing braces are removed.

Returns

(str) layer name unique within the network

Return type

str

property hyper_params(self, **kwargs)
init_weights(self, idx=slice(0, None))

Initializes weights for all or a subset of layers in the graph.

LSTM/GRU layers are orthogonally initialized, convolutional layers uniformly from (-0.1,0.1).

Parameters

idx (slice) – A slice object representing the indices of layers to initialize.

Return type

None

classmethod load_clstm_model(cls, path)

Loads a CLSTM model into VGSL.

Parameters

path (str) –

classmethod load_model(cls, path)

Deserializes a VGSL model from a CoreML file.

Parameters

path (str) – CoreML file

Returns

A TorchVGSLModel instance.

Raises
  • KrakenInvalidModelException if the model data is invalid (not a string, protobuf file, or without appropriate metadata)

  • FileNotFoundError if the path doesn't point to a file.

classmethod load_pronn_model(cls, path)

Loads a pronn model into VGSL.

Parameters

path (str) –

property model_type(self)
property one_channel_mode(self)
resize_output(self, output_size, del_indices=None)

Resizes an output layer.

Parameters
  • output_size (int) – New size/output channels of last layer

  • del_indices (list) – list of outputs to delete from layer

Return type

None

save_model(self, path)

Serializes the model into path.

Parameters

path (str) – Target destination

property seg_type(self)
static set_layer_name(layer, name)

Sets the name field of a VGSL layer definition.

Parameters
  • layer (str) – VGSL definition

  • name (str) – Layer name

Return type

str

set_num_threads(self, num)

Sets number of OpenMP threads to use.

Parameters

num (int) –

Return type

None

to(self, device)
Parameters

device (Union[str, torch.device]) –

Return type

None

train(self)

Sets the model to training mode (enables dropout layers and disables softmax on CTC layers).

Return type

None

kraken.lib.vgsl.logger

kraken.lib.xml module

ALTO/Page data loaders for segmentation training

exception kraken.lib.xml.KrakenInputException(message=None)

Common base class for all non-exit exceptions.

kraken.lib.xml.alto_regions
kraken.lib.xml.logger
kraken.lib.xml.page_regions
kraken.lib.xml.parse_alto(filename)

Parses an ALTO file, returns the baselines defined in it, and loads the referenced image.

Parameters

filename (str) – path to an ALTO file.

Returns

A dict {‘image’: impath, ‘lines’: [{‘boundary’: [[x0, y0], …], ‘baseline’: [[x0, y0], …], ‘text’: ‘apdjfqpf’, ‘script’: ‘script_type’}, …], ‘regions’: {‘region_type_0’: [[[x0, y0], …], …], …}, ‘base_dir’: None}

Return type

dict
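A sketch of consuming the dict structure returned by the parsers above. The data is hand-built sample data with hypothetical values, not a real parse result, and the code does not call kraken itself.

```python
# Hand-built sample mirroring the structure returned by parse_alto /
# parse_page (values are hypothetical).
page = {
    'image': 'page_0001.png',
    'lines': [
        {'boundary': [[10, 10], [200, 10], [200, 40], [10, 40]],
         'baseline': [[10, 35], [200, 35]],
         'text': 'example line',
         'script': 'default'},
    ],
    'regions': {'text': [[[5, 5], [210, 5], [210, 50], [5, 50]]]},
    'base_dir': None,
}

for line in page['lines']:
    # each line carries its polygon, baseline, transcription, and script
    print(line['script'], line['text'], len(line['baseline']))
```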

kraken.lib.xml.parse_page(filename)

Parses a PageXML file, returns the baselines defined in it, and loads the referenced image.

Parameters

filename (str) – path to a PageXML file.

Returns

A dict {‘image’: impath, ‘lines’: [{‘boundary’: [[x0, y0], …], ‘baseline’: [[x0, y0], …], ‘text’: ‘apdjfqpf’, ‘script’: ‘script_type’}, …], ‘regions’: {‘region_type_0’: [[[x0, y0], …], …], …}}

Return type

dict

kraken.lib.xml.parse_xml(filename)

Parses either a PageXML or ALTO file with autodetermination of the file format.

Parameters

filename (str) – path to an XML file.

Returns

A dict {‘image’: impath, ‘lines’: [{‘boundary’: [[x0, y0], …], ‘baseline’: [[x0, y0], …], ‘text’: ‘apdjfqpf’, ‘script’: ‘script_type’}, …], ‘regions’: {‘region_type_0’: [[[x0, y0], …], …], …}, ‘base_dir’: None}

Return type

dict

kraken.lib.codec

pytorch compatible codec with many-to-many mapping between labels and graphemes.

exception kraken.lib.codec.KrakenCodecException(message=None)

Common base class for all non-exit exceptions.

exception kraken.lib.codec.KrakenEncodeException(message=None)

Common base class for all non-exit exceptions.

class kraken.lib.codec.PytorchCodec(charset, strict=False)

Translates between labels and graphemes.

Parameters

charset (Union[Dict[str, Sequence[int]], Sequence[str], str]) –

add_labels(self, charset)

Adds additional characters/labels to the codec.

charset may either be a string, a list or a dict. In the first case each code point will be assigned a label, in the second case each string in the list will be assigned a label, and in the final case each key string will be mapped to the value sequence of integers. In the first two cases labels will be assigned automatically.

As 0 is the blank label in a CTC output layer, output labels and input dictionaries are/should be 1-indexed.

Parameters

charset (unicode, list, dict) – Input character set.

Return type

PytorchCodec
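The 1-indexing rule above can be sketched in a few lines. This is a simplified stand-in for the label-assignment behaviour, not kraken's PytorchCodec; the sorting of the charset is an assumption made for determinism.

```python
# Sketch of the label-assignment rule described above: label 0 is reserved
# for the CTC blank, so automatically assigned labels start at 1.
# (Simplified stand-in, not kraken's PytorchCodec.)

def build_codec(charset):
    if isinstance(charset, dict):      # explicit code point -> label sequence
        return dict(charset)
    # string or list: one label per entry, assigned automatically, 1-indexed
    return {c: [i] for i, c in enumerate(sorted(charset), start=1)}

codec = build_codec('abc')
print(codec)   # {'a': [1], 'b': [2], 'c': [3]}
```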

decode(self, labels)

Decodes a labelling.

Given a labelling with cuts and confidences, returns the decoded code points with cuts and confidences aggregated across label-code point correspondences. When decoding multilabels to code points the resulting cuts are min/max and confidences are averaged.

Parameters

labels (list) – Input containing tuples (label, start, end, confidence).

Returns

A list of tuples (code point, start, end, confidence)

Return type

list

encode(self, s)

Encodes a string into a sequence of labels.

If the code is non-singular we greedily encode the longest sequence first.

Parameters

s (str) – Input unicode string

Returns

(torch.IntTensor) encoded label sequence

Return type

torch.IntTensor
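The greedy longest-match behaviour for non-singular codes can be sketched as follows. This is an illustrative simplification operating on plain lists, not kraken's implementation.

```python
# Sketch of the greedy longest-match encoding described above: when the
# code maps multi-character sequences to labels, the longest matching
# prefix at each position is consumed first.

def encode(mapping, s):
    labels = []
    i = 0
    while i < len(s):
        # try the longest remaining prefix first
        for j in range(len(s), i, -1):
            if s[i:j] in mapping:
                labels.extend(mapping[s[i:j]])
                i = j
                break
        else:
            raise KeyError(f'no mapping for input at position {i}')
    return labels

mapping = {'a': [1], 'b': [2], 'ab': [3]}
print(encode(mapping, 'aab'))   # [1, 3]: 'ab' wins over 'a' + 'b'
```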


property is_valid(self)

Returns True if the codec is prefix-free (in label space) and non-singular (in both directions).

Return type

bool

property max_label(self)

Returns the maximum label value.

Return type

int

merge(self, codec)

Transforms this codec (c1) into another (c2) reusing as many labels as possible.

The resulting codec is able to encode the same code point sequences while not necessarily having the same labels for them as c2. Retains matching character -> label mappings from both codecs, removes mappings not in c2, and adds mappings not in c1. Compound labels in c2 for code point sequences not in c1 that contain labels also in use in c1 are added as separate labels.

Parameters

codec (kraken.lib.codec.PytorchCodec) –

Returns

A merged codec and a list of labels that were removed from the original codec.

Return type

Tuple[PytorchCodec, Set]

kraken.lib.codec.logger

kraken.lib.train module

Training loop interception helpers

class kraken.lib.train.BaselineSet(imgs=None, suffix='.path', line_width=4, im_transforms=transforms.Compose([]), mode='path', augmentation=False, valid_baselines=None, merge_baselines=None, valid_regions=None, merge_regions=None)

Dataset for training a baseline/region segmentation model.

Parameters
  • imgs (Sequence[str]) –

  • suffix (str) –

  • line_width (int) –

  • im_transforms (Callable[[Any], torch.Tensor]) –

  • mode (str) –

  • augmentation (bool) –

  • valid_baselines (Sequence[str]) –

  • merge_baselines (Dict[str, Sequence[str]]) –

  • valid_regions (Sequence[str]) –

  • merge_regions (Dict[str, Sequence[str]]) –

add(self, image, baselines=None, regions=None, *args, **kwargs)

Adds a page to the dataset.

Parameters
  • im (path) – Path to the whole page image

  • baselines (list) – A list of dicts, each containing a list of coordinates and a script type [{‘baseline’: [[x0, y0], …, [xn, yn]], ‘script’: ‘script_type’}, …]

  • regions (dict) – A dict mapping region types to lists of coordinate lists {‘region_type_0’: [[[x0, y0], …, [xn, yn]]], ‘region_type_1’: …}.

  • image (Union[str, PIL.Image.Image]) –

  • baselines (List[List[List[Tuple[int, int]]]]) –

transform(self, image, target)
class kraken.lib.train.EarlyStopping(min_delta=None, lag=1000)

Early stopping to terminate training when validation loss doesn’t improve over a certain time.

Parameters
  • min_delta (float) –

  • lag (int) –

trigger(self)

Function that raises a KrakenStopTrainingException if the abort condition is fulfilled.

Return type

bool

update(self, val_loss)

Updates the internal validation loss state and increases counter by one.

Parameters

val_loss (torch.float) –

Return type

None
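The min_delta/lag logic can be sketched in a few lines. This is an illustrative stand-in for the described behaviour, not kraken's EarlyStopping; the exact accounting of the lag counter is an assumption.

```python
# Sketch: stop when validation loss has not improved by more than
# min_delta for `lag` consecutive updates.

class EarlyStopper:
    def __init__(self, min_delta=0.0, lag=3):
        self.min_delta = min_delta
        self.lag = lag
        self.best = None
        self.wait = 0

    def update(self, val_loss):
        if self.best is None or self.best - val_loss > self.min_delta:
            self.best = val_loss   # real improvement: reset the counter
            self.wait = 0
        else:
            self.wait += 1

    def trigger(self):
        return self.wait >= self.lag

stopper = EarlyStopper(min_delta=0.01, lag=2)
for loss in [1.0, 0.8, 0.79, 0.795, 0.80]:
    stopper.update(loss)
print(stopper.trigger())   # True: no improvement > 0.01 for 2+ updates
```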

class kraken.lib.train.EpochStopping(epochs)

Dumb stopping after a fixed number of iterations.

Parameters

epochs (int) –

trigger(self)

Function that raises a KrakenStopTrainingException if the abort condition is fulfilled.

Return type

bool

update(self, val_loss)

Only updates the internal best iteration.

Parameters

val_loss (torch.float) –

Return type

None

class kraken.lib.train.GroundTruthDataset(split=F_t.default_split, suffix='.gt.txt', normalization=None, whitespace_normalization=True, reorder=True, im_transforms=transforms.Compose([]), preload=True, augmentation=False)

Dataset for training a line recognition model.

All data is cached in memory.

Parameters
  • split (Callable[[str], str]) –

  • suffix (str) –

  • normalization (Optional[str]) –

  • whitespace_normalization (bool) –

  • reorder (Union[bool, str]) –

  • im_transforms (Callable[[Any], torch.Tensor]) –

  • preload (bool) –

  • augmentation (bool) –
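A sketch of how the split and suffix arguments locate the ground truth text for a line image: the split callable strips the image extension and the suffix is appended. This is a simplified stand-in under that assumption, not kraken's code; the file names are hypothetical.

```python
import os

def default_split(path):
    # strip the image extension, keeping the directory part
    return os.path.splitext(path)[0]

def gt_path(img, split=default_split, suffix='.gt.txt'):
    # text file sitting next to the line image
    return split(img) + suffix

print(gt_path('lines/0007.png'))   # lines/0007.gt.txt
```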

add(self, *args, **kwargs)

Adds a line-image-text pair to the dataset.

Parameters

image (str) – Input image path

Return type

None

add_loaded(self, image, gt)

Adds an already loaded line-image-text pair to the dataset.

Parameters
  • image (PIL.Image.Image) – Line image

  • gt (str) – Text contained in the line image

Return type

None

encode(self, codec=None)

Adds a codec to the dataset and encodes all text lines.

Has to be run before sampling from the dataset.

Parameters

codec (Optional[kraken.lib.codec.PytorchCodec]) –

Return type

None

no_encode(self)

Creates an unencoded dataset.

Return type

None

parse(self, image, *args, **kwargs)

Parses a sample for this dataset.

This is mostly used to parallelize populating the dataset.

Parameters

image (str) – Input image path

Return type

Dict

class kraken.lib.train.InfiniteDataLoader(*args, **kwargs)

Version of DataLoader that auto-reinitializes the iterator once it is exhausted.
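The auto-reinitializing behaviour can be sketched with a plain iterator wrapper. This stand-in works on any re-iterable object instead of a torch DataLoader.

```python
# Sketch: when the underlying iterable is exhausted, a fresh iterator is
# created, so iteration never terminates with StopIteration.

class InfiniteLoader:
    def __init__(self, iterable):
        self.iterable = iterable
        self.it = iter(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return next(self.it)
        except StopIteration:
            self.it = iter(self.iterable)   # re-initialize and continue
            return next(self.it)

loader = InfiniteLoader([1, 2, 3])
print([next(loader) for _ in range(7)])   # [1, 2, 3, 1, 2, 3, 1]
```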

exception kraken.lib.train.KrakenEncodeException(message=None)

Common base class for all non-exit exceptions.

exception kraken.lib.train.KrakenInputException(message=None)

Common base class for all non-exit exceptions.

class kraken.lib.train.KrakenTrainer(model, optimizer, device='cpu', filename_prefix='model', event_frequency=1.0, train_set=None, val_set=None, stopper=None, loss_fn=recognition_loss_fn, evaluator=recognition_evaluator_fn)

Class encapsulating the recognition model training process.

Parameters
  • model (kraken.lib.vgsl.TorchVGSLModel) –

  • optimizer (torch.optim.Optimizer) –

  • device (str) –

  • filename_prefix (str) –

  • event_frequency (float) –

  • train_set (torch.utils.data.DataLoader) –

add_lr_scheduler(self, lr_scheduler)
Parameters

lr_scheduler (TrainScheduler) –

classmethod load_model(cls, model_path, load_hyper_parameters=False, message=lambda *args, **kwargs: ...)
Parameters
  • model_path (str) –

  • load_hyper_parameters (Optional[bool]) –

  • message (Callable[[str], None]) –

classmethod recognition_train_gen(cls, hyper_params=None, progress_callback=lambda string, length: ..., message=lambda *args, **kwargs: ..., output='model', spec=default_specs.RECOGNITION_SPEC, append=None, load=None, device='cpu', reorder=True, training_data=None, evaluation_data=None, preload=None, threads=1, load_hyper_parameters=False, repolygonize=False, force_binarization=False, format_type='path', codec=None, resize='fail', augment=False)

This is an ugly constructor that takes all the arguments from the command line driver, finagles the datasets, models, and hyperparameters correctly and returns a KrakenTrainer object.

Setup parameters (load, training_data, evaluation_data, …) are named; model hyperparameters (everything in kraken.lib.default_specs.RECOGNITION_HYPER_PARAMS) go in the hyper_params argument.

Parameters
  • hyper_params (dict) – Hyperparameter dictionary containing all fields from kraken.lib.default_specs.RECOGNITION_HYPER_PARAMS

  • progress_callback (Callable) – Callback for progress reports on various computationally expensive processes. A human readable string and the process length is supplied. The callback has to return another function which will be executed after each step.

  • message (Callable) – Messaging printing method for above log but below warning level output, i.e. infos that should generally be shown to users.

  • **kwargs – Setup parameters, i.e. CLI parameters of the train() command.

  • output (str) –

  • spec (str) –

  • append (Optional[int]) –

  • load (Optional[str]) –

  • device (str) –

  • reorder (Union[bool, str]) –

  • training_data (Sequence[Dict]) –

  • evaluation_data (Sequence[Dict]) –

  • preload (Optional[bool]) –

  • threads (int) –

  • load_hyper_parameters (bool) –

  • repolygonize (bool) –

  • force_binarization (bool) –

  • format_type (str) –

  • codec (Optional[Dict]) –

  • resize (str) –

  • augment (bool) –

Returns

A KrakenTrainer object.

run(self, event_callback=lambda *args, **kwargs: ..., iteration_callback=lambda *args, **kwargs: ...)
classmethod segmentation_train_gen(cls, hyper_params=None, load_hyper_parameters=False, progress_callback=lambda string, length: ..., message=lambda *args, **kwargs: ..., output='model', spec=default_specs.SEGMENTATION_SPEC, load=None, device='cpu', training_data=None, evaluation_data=None, threads=1, force_binarization=False, format_type='path', suppress_regions=False, suppress_baselines=False, valid_regions=None, valid_baselines=None, merge_regions=None, merge_baselines=None, bounding_regions=None, resize='fail', augment=False, topline=False)

This is an ugly constructor that takes all the arguments from the command line driver, finagles the datasets, models, and hyperparameters correctly and returns a KrakenTrainer object.

Setup parameters (load, training_data, evaluation_data, …) are named; model hyperparameters (everything in kraken.lib.default_specs.SEGMENTATION_HYPER_PARAMS) go in the hyper_params argument.

Parameters
  • hyper_params (dict) – Hyperparameter dictionary containing all fields from kraken.lib.default_specs.SEGMENTATION_HYPER_PARAMS

  • progress_callback (Callable) – Callback for progress reports on various computationally expensive processes. A human readable string and the process length is supplied. The callback has to return another function which will be executed after each step.

  • message (Callable) – Messaging printing method for above log but below warning level output, i.e. infos that should generally be shown to users.

  • **kwargs – Setup parameters, i.e. CLI parameters of the train() command.

  • load_hyper_parameters (bool) –

  • output (str) –

  • spec (str) –

  • load (Optional[str]) –

  • device (str) –

  • training_data (Sequence[Dict]) –

  • evaluation_data (Sequence[Dict]) –

  • threads (int) –

  • force_binarization (bool) –

  • format_type (str) –

  • suppress_regions (bool) –

  • suppress_baselines (bool) –

  • valid_regions (Optional[Sequence[str]]) –

  • valid_baselines (Optional[Sequence[str]]) –

  • merge_regions (Optional[Dict[str, str]]) –

  • merge_baselines (Optional[Dict[str, str]]) –

  • bounding_regions (Optional[Sequence[str]]) –

  • resize (str) –

  • augment (bool) –

  • topline (Union[bool, None]) –

Returns

A KrakenTrainer object.

class kraken.lib.train.NoStopping

Never stops training.

trigger(self)

Function that raises a KrakenStopTrainingException if the abort condition is fulfilled.

Return type

bool

update(self, val_loss)

Only updates the internal best iteration.

Parameters

val_loss (torch.float) –

Return type

None

class kraken.lib.train.PolygonGTDataset(normalization=None, whitespace_normalization=True, reorder=True, im_transforms=transforms.Compose([]), preload=True, augmentation=False)

Dataset for training a line recognition model from polygonal/baseline data.

Parameters
  • normalization (Optional[str]) –

  • whitespace_normalization (bool) –

  • reorder (Union[bool, str]) –

  • im_transforms (Callable[[Any], torch.Tensor]) –

  • preload (bool) –

  • augmentation (bool) –

add(self, *args, **kwargs)

Adds a line to the dataset.

Parameters
  • im (path) – Path to the whole page image

  • text (str) – Transcription of the line.

  • baseline (list) – A list of coordinates [[x0, y0], …, [xn, yn]].

  • boundary (list) – A polygon mask for the line.

encode(self, codec=None)

Adds a codec to the dataset and encodes all text lines.

Has to be run before sampling from the dataset.

Parameters

codec (Optional[kraken.lib.codec.PytorchCodec]) –

Return type

None

no_encode(self)

Creates an unencoded dataset.

Return type

None

parse(self, image, text, baseline, boundary, *args, **kwargs)

Parses a sample for the dataset and returns it.

This function is mainly used for parallelized loading of training data.

Parameters
  • im (path) – Path to the whole page image

  • text (str) – Transcription of the line.

  • baseline (list) – A list of coordinates [[x0, y0], …, [xn, yn]].

  • boundary (list) – A polygon mask for the line.

  • image (Union[str, PIL.Image.Image]) –

class kraken.lib.train.PytorchCodec(charset, strict=False)

Translates between labels and graphemes.

Parameters

charset (Union[Dict[str, Sequence[int]], Sequence[str], str]) –

add_labels(self, charset)

Adds additional characters/labels to the codec.

charset may either be a string, a list or a dict. In the first case each code point will be assigned a label, in the second case each string in the list will be assigned a label, and in the final case each key string will be mapped to the value sequence of integers. In the first two cases labels will be assigned automatically.

As 0 is the blank label in a CTC output layer, output labels and input dictionaries are/should be 1-indexed.

Parameters

charset (unicode, list, dict) – Input character set.

Return type

PytorchCodec

decode(self, labels)

Decodes a labelling.

Given a labelling with cuts and confidences, returns the decoded code points with cuts and confidences aggregated across label-code point correspondences. When decoding multilabels to code points the resulting cuts are min/max and confidences are averaged.

Parameters

labels (list) – Input containing tuples (label, start, end, confidence).

Returns

A list of tuples (code point, start, end, confidence)

Return type

list

encode(self, s)

Encodes a string into a sequence of labels.

If the code is non-singular we greedily encode the longest sequence first.

Parameters

s (str) – Input unicode string

Returns

(torch.IntTensor) encoded label sequence

Return type

torch.IntTensor


property is_valid(self)

Returns True if the codec is prefix-free (in label space) and non-singular (in both directions).

Return type

bool

property max_label(self)

Returns the maximum label value.

Return type

int

merge(self, codec)

Transforms this codec (c1) into another (c2) reusing as many labels as possible.

The resulting codec is able to encode the same code point sequences while not necessarily having the same labels for them as c2. Retains matching character -> label mappings from both codecs, removes mappings not in c2, and adds mappings not in c1. Compound labels in c2 for code point sequences not in c1 that contain labels also in use in c1 are added as separate labels.

Parameters

codec (kraken.lib.codec.PytorchCodec) –

Returns

A merged codec and a list of labels that were removed from the original codec.

Return type

Tuple[PytorchCodec, Set]

class kraken.lib.train.TrainScheduler(optimizer)

Implements learning rate scheduling.

Parameters

optimizer (torch.optim.Optimizer) –

add_phase(self, steps, annealing_fn=None)

Adds a new phase to the scheduler.

Parameters
  • steps (int) – Number of steps for this phase. Can be epochs or iterations depending on the scheduler.

  • max_lr (float) – Peak learning rate

  • annealing_fn (Callable) – LR change function.

Return type

None

batch_step(self, loss=None)

Performs an optimization step.

Parameters

loss (torch.float) –

Return type

None

epoch_step(self, val_loss=None)

Performs an optimization step.

Parameters

val_loss (torch.float) –

Return type

None

class kraken.lib.train.TrainStopper
abstract trigger(self)

Function that raises a KrakenStopTrainingException after if the abort condition is fulfilled.

Return type

bool

abstract update(self, val_loss)

Updates the internal state of the train stopper.

Parameters

val_loss (torch.float) –

Return type

None

class kraken.lib.train.annealing_const(*args, **kwargs)
property call_frequency(self)
class kraken.lib.train.annealing_cosine(optimizer, t_max=50, eta_min=1e-07)
property call_frequency(self)
class kraken.lib.train.annealing_exponential(optimizer, step_size, gamma=0.1)
property call_frequency(self)
class kraken.lib.train.annealing_onecycle(optimizer, max_lr=0.001, epochs=50, steps_per_epoch=None)
property call_frequency(self)
class kraken.lib.train.annealing_reduceonplateau(optimizer, patience=5, factor=0.1, mode='max', min_lr=1e-07)
property call_frequency(self)
class kraken.lib.train.annealing_step(optimizer, step_size, gamma=0.1)
property call_frequency(self)
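The cosine schedule above can be sketched as a standalone function mirroring annealing_cosine's t_max and eta_min parameters. This follows the standard cosine annealing formula (as in PyTorch's CosineAnnealingLR), not kraken's wrapper code; the base learning rate is hypothetical.

```python
import math

def cosine_lr(base_lr, step, t_max=50, eta_min=1e-07):
    # half-cosine decay from base_lr at step 0 down to eta_min at t_max
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max)) / 2

print(round(cosine_lr(0.001, 0), 6))    # 0.001  (start of the cycle)
print(round(cosine_lr(0.001, 50), 9))   # ~1e-07 (end of the cycle)
```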
kraken.lib.train.baseline_label_evaluator_fn(model, val_loader, device)
kraken.lib.train.baseline_label_loss_fn(criterion, output, target)
kraken.lib.train.collate_sequences(batch)

Sorts and pads sequences.
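Sort-and-pad collation can be sketched on plain lists. This is an illustrative simplification (the real function operates on tensors); the longest-first ordering and zero padding are assumptions typical of packed-sequence batching.

```python
def collate_sequences(batch):
    # sort longest first, then zero-pad everything to the longest length
    batch = sorted(batch, key=len, reverse=True)
    max_len = len(batch[0])
    lengths = [len(seq) for seq in batch]
    padded = [seq + [0] * (max_len - len(seq)) for seq in batch]
    return padded, lengths

padded, lengths = collate_sequences([[1, 2], [3, 4, 5], [6]])
print(padded)    # [[3, 4, 5], [1, 2, 0], [6, 0, 0]]
print(lengths)   # [3, 2, 1]
```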

kraken.lib.train.compute_error(model, validation_set)

Computes error report from a model and a list of line image-text pairs.

Returns

A tuple with total number of characters and edit distance across the whole validation set.

Return type

Tuple[int, int]
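The edit-distance half of the returned tuple can be sketched with a standard Levenshtein implementation. This is illustrative only; kraken's compute_error runs a model over a validation set rather than comparing two given strings.

```python
def levenshtein(a, b):
    # classic dynamic-programming edit distance with a rolling row
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

gt, pred = 'kitten', 'sitting'
print(len(gt), levenshtein(pred, gt))   # 6 3
```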

kraken.lib.train.generate_input_transforms(batch, height, width, channels, pad, valid_norm=True, force_binarization=False)

Generates a torchvision transformation converting a PIL.Image into a tensor usable in a network forward pass.

Parameters
  • batch (int) – mini-batch size

  • height (int) – height of input image in pixels

  • width (int) – width of input image in pixels

  • channels (int) – color channels of input

  • pad (int) – Amount of padding on horizontal ends of image

  • valid_norm (bool) – Enables/disables baseline normalization as a valid preprocessing step. If disabled we will fall back to standard scaling.

  • force_binarization (bool) – Forces binarization of input images using the nlbin algorithm.

Returns

A torchvision transformation composition converting the input image to the appropriate tensor.

Return type

torchvision.transforms.Compose

kraken.lib.train.logger
kraken.lib.train.make_printable(char)

Takes a Unicode code point and return a printable representation of it.

Parameters

char (str) – Input code point

Returns

Either the original code point, the name of the code point if it is a combining mark, whitespace etc., or the hex code if it is a control symbol.

Return type

str
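The behaviour described above can be sketched with the stdlib unicodedata module. This is a simplified stand-in, not kraken's make_printable; the exact category handling is an assumption.

```python
import unicodedata

def make_printable(char):
    cat = unicodedata.category(char)
    if cat in ('Cc', 'Cf'):            # control symbols -> hex code
        return hex(ord(char))
    if cat.startswith('M') or cat.startswith('Z'):
        # combining marks, whitespace etc. -> code point name
        return unicodedata.name(char, hex(ord(char)))
    return char                        # everything else passes through

print(make_printable('a'))        # a
print(make_printable(' '))        # SPACE
print(make_printable('\u0301'))   # COMBINING ACUTE ACCENT
```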

kraken.lib.train.preparse_xml_data(filenames, format_type='xml', repolygonize=False)

Loads training data from a set of xml files.

Extracts line information from Page/ALTO xml files for training of recognition models.

Parameters
  • filenames (list) – List of XML files.

  • format_type (str) – Either page, alto or xml for autodetermination.

  • repolygonize (bool) – (Re-)calculates polygon information using the kraken algorithm.

Returns

A list of dicts {‘text’: text, ‘baseline’: [[x0, y0], …], ‘boundary’: [[x0, y0], …], ‘image’: PIL.Image}.

Return type

list

kraken.lib.train.recognition_evaluator_fn(model, val_loader, device)
kraken.lib.train.recognition_loss_fn(criterion, output, target)
kraken.lib.train.validate_hyper_parameters(hyper_params)

Validate some model’s hyper parameters and modify them in place if need be.

kraken.binarization module

kraken.binarization

An adaptive binarization algorithm.

exception kraken.binarization.KrakenInputException(message=None)

Common base class for all non-exit exceptions.

kraken.binarization.array2pil(a)
Parameters

a (numpy.ndarray) –

Return type

PIL.Image.Image

kraken.binarization.get_im_str(im)
Parameters

im (PIL.Image.Image) –

Return type

str

kraken.binarization.is_bitonal(im)

Tests a PIL.Image for bitonality.

Parameters

im (PIL.Image.Image) – Image to test

Returns

True if the image contains only two different color values. False otherwise.

Return type

bool
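The test above amounts to counting distinct pixel values. This sketch operates on a plain list of pixel values instead of a PIL image, and treats at most two distinct values as bitonal.

```python
def is_bitonal(pixels):
    # bitonal: no more than two distinct pixel values in the image
    return len(set(pixels)) <= 2

print(is_bitonal([0, 255, 0, 255, 255]))   # True
print(is_bitonal([0, 128, 255]))           # False
```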

kraken.binarization.logger
kraken.binarization.nlbin(im, threshold=0.5, zoom=0.5, escale=1.0, border=0.1, perc=80, range=20, low=5, high=90)

Performs binarization using non-linear processing.

Parameters
  • im (PIL.Image.Image) –

  • threshold (float) –

  • zoom (float) – Zoom for background page estimation

  • escale (float) – Scale for estimating a mask over the text region

  • border (float) – Ignore this much of the border

  • perc (int) – Percentage for filters

  • range (int) – Range for filters

  • low (int) – Percentile for black estimation

  • high (int) – Percentile for white estimation

Returns

PIL.Image.Image containing the binarized image

Raises

KrakenInputException when trying to binarize an empty image.

Return type

PIL.Image.Image

kraken.binarization.pil2array(im, alpha=0)
Parameters
  • im (PIL.Image.Image) –

  • alpha (int) –

Return type

numpy.ndarray

kraken.transcribe module

Utility functions for ground truth transcription.

exception kraken.transcribe.KrakenInputException(message=None)

Common base class for all non-exit exceptions.

class kraken.transcribe.TranscriptionInterface(font=None, font_style=None)
add_page(self, im, segmentation=None, records=None)

Adds an image to the transcription interface, optionally filling in information from a list of ocr_record objects.

Parameters
  • im (PIL.Image) – Input image

  • segmentation (dict) – Output of the segment method.

  • records (list) – A list of ocr_record objects.

write(self, fd)

Writes the HTML file to a file descriptor.

Parameters

fd (File) – File descriptor (mode=’wb’) to write to.

kraken.transcribe.get_im_str(im)
Parameters

im (PIL.Image.Image) –

Return type

str

kraken.transcribe.logger

kraken.linegen module

linegen

An advanced line generation tool using Pango for proper text shaping. The actual drawing code was adapted from the create_image utility from nototools available at [0].

Line degradation uses a local model described in [1].

[0] https://github.com/googlei18n/nototools [1] Kanungo, Tapas, et al. “A statistical, nonparametric methodology for document degradation model validation.” IEEE Transactions on Pattern Analysis and Machine Intelligence 22.11 (2000): 1209-1223.

class kraken.linegen.CairoContext

Structure base class

class kraken.linegen.CairoSurface

Structure base class

exception kraken.linegen.KrakenCairoSurfaceException(message, width, height)

Raised when the Cairo surface couldn’t be created.

Parameters
  • message (str) –

  • width (int) –

  • height (int) –

message

Error message

Type

str

width

Width of the surface

Type

int

height

Height of the surface

Type

int

class kraken.linegen.LineGenerator(family='Sans', font_size=32, font_weight=400, language=None)

Produces degraded line images using a single collection of font families.

render_line(self, text)

Draws a line onto a Cairo surface which will be converted to a Pillow Image.

Parameters

text (unicode) – A string which will be rendered as a single line.

Returns

PIL.Image of mode ‘L’.

Raises
  • KrakenCairoSurfaceException – if the Cairo surface couldn’t be created (usually caused by invalid dimensions).

class kraken.linegen.PangoContext

Structure base class

class kraken.linegen.PangoFontDescription

Structure base class

class kraken.linegen.PangoLanguage

Structure base class

class kraken.linegen.PangoLayout

Structure base class

class kraken.linegen.PangoRectangle

Structure base class

kraken.linegen.argtypes
kraken.linegen.array2pil(a)
Parameters

a (numpy.ndarray) –

Return type

PIL.Image.Image

kraken.linegen.c_lib
kraken.linegen.cairo
kraken.linegen.degrade_line(im, eta=0.0, alpha=1.5, beta=1.5, alpha_0=1.0, beta_0=1.0)

Degrades a line image by adding noise.

For parameter meanings consult [1].

Parameters
  • im (PIL.Image) – Input image

  • eta (float) –

  • alpha (float) –

  • beta (float) –

  • alpha_0 (float) –

  • beta_0 (float) –

Returns

PIL.Image in mode ‘1’
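The noise model in [1] flips each pixel with a probability that decays with its squared distance to the nearest pixel of the opposite colour. A brute-force sketch of that rule (illustrative only and far slower than the real implementation, which uses distance transforms):

```python
import numpy as np

def kanungo_degrade(bits, eta=0.0, alpha=1.5, beta=1.5, alpha_0=1.0, beta_0=1.0, seed=0):
    """Flip each pixel of a binary image with probability
    alpha_0*exp(-alpha*d^2) + eta (foreground) or beta_0*exp(-beta*d^2) + eta
    (background), where d is the distance to the nearest opposite pixel."""
    rng = np.random.default_rng(seed)
    fg = np.argwhere(bits == 1)
    bg = np.argwhere(bits == 0)
    out = bits.copy()
    for y in range(bits.shape[0]):
        for x in range(bits.shape[1]):
            other = bg if bits[y, x] else fg
            if len(other) == 0:
                continue
            d2 = ((other - (y, x)) ** 2).sum(axis=1).min()
            p0, decay = (alpha_0, alpha) if bits[y, x] else (beta_0, beta)
            if rng.random() < p0 * np.exp(-decay * d2) + eta:
                out[y, x] ^= 1
    return out

img = np.zeros((8, 8), dtype=np.uint8)
img[2:6, 2:6] = 1  # a solid 4x4 "stroke"
noisy = kanungo_degrade(img)
```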

kraken.linegen.distort_line(im, distort=3.0, sigma=10, eps=0.03, delta=0.3)

Distorts a line image.

Run BEFORE degrade_line as a white border of 5 pixels will be added.

Parameters
  • im (PIL.Image) – Input image

  • distort (float) –

  • sigma (float) –

  • eps (float) –

  • delta (float) –

Returns

PIL.Image in mode ‘L’

class kraken.linegen.ensureBytes

Simple class ensuring the arguments of type char * are actually a series of bytes.

classmethod from_param(cls, value)
Parameters

value (AnyStr) –

Return type

bytes

kraken.linegen.logger
kraken.linegen.ocropy_degrade(im, distort=1.0, dsigma=20.0, eps=0.03, delta=0.3, degradations=((0.5, 0.0, 0.5, 0.0),))

Degrades and distorts a line using the same noise model used by ocropus.

Parameters
  • im (PIL.Image) – Input image

  • distort (float) –

  • dsigma (float) –

  • eps (float) –

  • delta (float) –

  • degradations (list) – list returning 4-tuples corresponding to the degradations argument of ocropus-linegen.

Returns

PIL.Image in mode ‘L’

kraken.linegen.p_lib
kraken.linegen.pango
kraken.linegen.pangocairo
kraken.linegen.pc_lib
kraken.linegen.pil2array(im, alpha=0)
Parameters
  • im (PIL.Image.Image) –

  • alpha (int) –

Return type

numpy.ndarray

kraken.linegen.restype

kraken.lib.dataset module

Utility functions for data loading and training of VGSL networks.

class kraken.lib.dataset.BaselineSet(imgs=None, suffix='.path', line_width=4, im_transforms=transforms.Compose([]), mode='path', augmentation=False, valid_baselines=None, merge_baselines=None, valid_regions=None, merge_regions=None)

Dataset for training a baseline/region segmentation model.

Parameters
  • imgs (Sequence[str]) –

  • suffix (str) –

  • line_width (int) –

  • im_transforms (Callable[[Any], torch.Tensor]) –

  • mode (str) –

  • augmentation (bool) –

  • valid_baselines (Sequence[str]) –

  • merge_baselines (Dict[str, Sequence[str]]) –

  • valid_regions (Sequence[str]) –

  • merge_regions (Dict[str, Sequence[str]]) –

add(self, image, baselines=None, regions=None, *args, **kwargs)

Adds a page to the dataset.

Parameters
  • image (Union[str, PIL.Image.Image]) – Path to the whole page image or the image itself.

  • baselines (list) – A list of dicts, each containing a list of coordinates and a script type: [{‘baseline’: [[x0, y0], …, [xn, yn]], ‘script’: ‘script_type’}, …]

  • regions (dict) – A dict mapping region types to lists of coordinate lists: {‘region_type_0’: [[[x0, y0], …, [xn, yn]], …], ‘region_type_1’: …}.

transform(self, image, target)
class kraken.lib.dataset.CenterNormalizer(target_height=48, params=(4, 1.0, 0.3))
dewarp(self, img, cval=0, dtype=np.dtype('f'))
measure(self, line)
normalize(self, img, order=1, dtype=np.dtype('f'), cval=0)
setHeight(self, target_height)
class kraken.lib.dataset.GroundTruthDataset(split=F_t.default_split, suffix='.gt.txt', normalization=None, whitespace_normalization=True, reorder=True, im_transforms=transforms.Compose([]), preload=True, augmentation=False)

Dataset for training a line recognition model.

All data is cached in memory.

Parameters
  • split (Callable[[str], str]) –

  • suffix (str) –

  • normalization (Optional[str]) –

  • whitespace_normalization (bool) –

  • reorder (Union[bool, str]) –

  • im_transforms (Callable[[Any], torch.Tensor]) –

  • preload (bool) –

  • augmentation (bool) –

add(self, *args, **kwargs)

Adds a line-image-text pair to the dataset.

Parameters

image (str) – Input image path

Return type

None

add_loaded(self, image, gt)

Adds an already loaded line-image-text pair to the dataset.

Parameters
  • image (PIL.Image.Image) – Line image

  • gt (str) – Text contained in the line image

Return type

None

encode(self, codec=None)

Adds a codec to the dataset and encodes all text lines.

Has to be run before sampling from the dataset.

Parameters

codec (Optional[kraken.lib.codec.PytorchCodec]) –

Return type

None

no_encode(self)

Creates an unencoded dataset.

Return type

None

parse(self, image, *args, **kwargs)

Parses a sample for this dataset.

This is mostly used to parallelize populating the dataset.

Parameters

image (str) – Input image path

Return type

Dict

class kraken.lib.dataset.InfiniteDataLoader(*args, **kwargs)

Version of DataLoader that auto-reinitializes the iterator once it is exhausted.
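The pattern amounts to catching the iterator's exhaustion and rebuilding it. A torch-free sketch (InfiniteIterator is an illustrative stand-in for the DataLoader subclass):

```python
class InfiniteIterator:
    """Wraps any re-iterable and transparently restarts it whenever the
    underlying iterator is exhausted, so callers can keep drawing batches."""
    def __init__(self, iterable):
        self.iterable = iterable
        self.it = iter(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return next(self.it)
        except StopIteration:
            self.it = iter(self.iterable)  # reinitialize and continue
            return next(self.it)

loader = InfiniteIterator([1, 2, 3])
first_five = [next(loader) for _ in range(5)]  # → [1, 2, 3, 1, 2]
```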

exception kraken.lib.dataset.KrakenInputException(message=None)

Common base class for all non-exit exceptions.

class kraken.lib.dataset.PolygonGTDataset(normalization=None, whitespace_normalization=True, reorder=True, im_transforms=transforms.Compose([]), preload=True, augmentation=False)

Dataset for training a line recognition model from polygonal/baseline data.

Parameters
  • normalization (Optional[str]) –

  • whitespace_normalization (bool) –

  • reorder (Union[bool, str]) –

  • im_transforms (Callable[[Any], torch.Tensor]) –

  • preload (bool) –

  • augmentation (bool) –

add(self, *args, **kwargs)

Adds a line to the dataset.

Parameters
  • im (path) – Path to the whole page image

  • text (str) – Transcription of the line.

  • baseline (list) – A list of coordinates [[x0, y0], …, [xn, yn]].

  • boundary (list) – A polygon mask for the line.

encode(self, codec=None)

Adds a codec to the dataset and encodes all text lines.

Has to be run before sampling from the dataset.

Parameters

codec (Optional[kraken.lib.codec.PytorchCodec]) –

Return type

None

no_encode(self)

Creates an unencoded dataset.

Return type

None

parse(self, image, text, baseline, boundary, *args, **kwargs)

Parses a sample for the dataset and returns it.

This function is mainly uses for parallelized loading of training data.

Parameters
  • im (path) – Path to the whole page image

  • text (str) – Transcription of the line.

  • baseline (list) – A list of coordinates [[x0, y0], …, [xn, yn]].

  • boundary (list) – A polygon mask for the line.

  • image (Union[str, PIL.Image.Image]) –

class kraken.lib.dataset.PytorchCodec(charset, strict=False)

Translates between labels and graphemes.

Parameters

charset (Union[Dict[str, Sequence[int]], Sequence[str], str]) –

add_labels(self, charset)

Adds additional characters/labels to the codec.

charset may either be a string, a list or a dict. In the first case each code point will be assigned a label, in the second case each string in the list will be assigned a label, and in the final case each key string will be mapped to the value sequence of integers. In the first two cases labels will be assigned automatically.

As 0 is the blank label in a CTC output layer, output labels and input dictionaries are/should be 1-indexed.

Parameters

charset (unicode, list, dict) – Input character set.

Return type

PytorchCodec
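The 1-indexing rule above can be sketched in a few lines (build_codec is a hypothetical helper showing the mapping convention, not kraken's implementation; the real codec's label assignment order may differ):

```python
def build_codec(charset):
    """Build a grapheme -> label-sequence mapping following the rules above:
    dicts are taken verbatim, strings/lists get automatic 1-indexed labels
    (label 0 stays reserved for the CTC blank)."""
    if isinstance(charset, dict):
        return dict(charset)
    return {c: [i] for i, c in enumerate(sorted(set(charset)), start=1)}

codec = build_codec('cab')
# every grapheme gets a positive label; 0 never appears
```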

decode(self, labels)

Decodes a labelling.

Given a labelling with cuts and confidences returns a string with the cuts and confidences aggregated across label-code point correspondences. When decoding multilabels to code points the resulting cuts are min/max, confidences are averaged.

Parameters

labels (list) – Input containing tuples (label, start, end, confidence).

Returns

A list of tuples (code point, start, end, confidence)

Return type

list

encode(self, s)

Encodes a string into a sequence of labels.

If the code is non-singular we greedily encode the longest sequence first.

Parameters

s (str) – Input unicode string

Returns

(torch.IntTensor) encoded label sequence

Return type

torch.IntTensor


property is_valid(self)

Returns True if the codec is prefix-free (in label space) and non-singular (in both directions).

Return type

bool

property max_label(self)

Returns the maximum label value.

Return type

int

merge(self, codec)

Transforms this codec (c1) into another (c2) reusing as many labels as possible.

The resulting codec is able to encode the same code point sequences while not necessarily having the same labels for them as c2. It retains matching character -> label mappings from both codecs, removes mappings not in c2, and adds mappings not in c1. Compound labels in c2 for code point sequences not in c1 that contain labels also in use in c1 are added as separate labels.

Parameters

codec (kraken.lib.codec.PytorchCodec) –

Returns

A merged codec and a list of labels that were removed from the original codec.

Return type

Tuple[PytorchCodec, Set]

class kraken.lib.dataset.TorchSeqRecognizer(nn, decoder=kraken.lib.ctc_decoder.greedy_decoder, train=False, device='cpu')

A class wrapping a TorchVGSLModel with a more comfortable recognition interface.

Parameters
  • train (bool) –

  • device (str) –

forward(self, line, lens=None)

Performs a forward pass on a torch tensor of one or more lines with shape (N, C, H, W) and returns a numpy array (N, W, C).

Parameters
  • line (torch.Tensor) – NCHW line tensor

  • lens (torch.Tensor) – Optional tensor containing sequence lengths if N > 1

Returns

Tuple with (N, W, C) shaped numpy array and final output sequence lengths.

Return type

numpy.ndarray

predict(self, line, lens=None)

Performs a forward pass on a torch tensor of a line with shape (N, C, H, W) and returns the decoding as a list of tuples (string, start, end, confidence).

Parameters
  • line (torch.Tensor) – NCHW line tensor

  • lens (torch.Tensor) – Optional tensor containing sequence lengths if N > 1

Returns

List of decoded sequences.

Return type

List[List[Tuple[str, int, int, float]]]

predict_labels(self, line, lens=None)

Performs a forward pass on a torch tensor of a line with shape (N, C, H, W) and returns a list of tuples (class, start, end, max). Max is the maximum value of the softmax layer in the region.

Parameters
  • line (torch.tensor) –

  • lens (torch.Tensor) –

Return type

List[List[Tuple[int, int, int, float]]]

predict_string(self, line, lens=None)

Performs a forward pass on a torch tensor of a line with shape (N, C, H, W) and returns a string of the results.

Parameters
  • line (torch.Tensor) –

  • lens (torch.Tensor) –

Return type

List[str]

to(self, device)

Moves model to device and automatically loads input tensors onto it.

kraken.lib.dataset.calculate_polygonal_environment(im=None, baselines=None, suppl_obj=None, im_feats=None, scale=None, topline=False)

Given a list of baselines and an input image, calculates a polygonal environment around each baseline.

Parameters
  • im (PIL.Image) – grayscale input image (mode ‘L’)

  • baselines (sequence) – List of lists containing a single baseline per entry.

  • suppl_obj (sequence) – List of lists containing additional polylines that should be considered hard boundaries for polygonization purposes. Can be used to prevent polygonization into non-text areas such as illustrations or to compute the polygonization of a subset of the lines in an image.

  • im_feats (numpy.array) – An optional precomputed seamcarve energy map. Overrides data in im. The default map is gaussian_filter(sobel(im), 2).

  • scale (tuple) – A 2-tuple (h, w) containing optional scale factors of the input. Values of 0 are used for aspect-preserving scaling. None skips input scaling.

  • topline (bool) – Switch to change default baseline location for offset calculation purposes. If set to False, baselines are assumed to be on the bottom of the text line and will be offset upwards, if set to True, baselines are on the top and will be offset downwards. If set to None, no offset will be applied.

Returns

List of lists of coordinates. If no polygonization could be computed for a baseline, None is returned instead.

kraken.lib.dataset.collate_sequences(batch)

Sorts and pads sequences.
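A minimal sketch of the sort-then-pad convention (collate here is an illustrative stand-in; the real function operates on tensors):

```python
def collate(seqs, pad_value=0):
    """Sort sequences by length (longest first) and right-pad them to a
    common length -- the usual preparation for packed RNN batches."""
    seqs = sorted(seqs, key=len, reverse=True)
    width = len(seqs[0])
    padded = [list(s) + [pad_value] * (width - len(s)) for s in seqs]
    lens = [len(s) for s in seqs]          # original lengths, post-sort
    return padded, lens

batch, lens = collate([[1], [4, 5, 6], [7, 8]])
# batch == [[4, 5, 6], [7, 8, 0], [1, 0, 0]], lens == [3, 2, 1]
```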

kraken.lib.dataset.compute_confusions(algn1, algn2)

Compute confusion matrices from two globally aligned strings.

Parameters
  • algn1 (Sequence[str]) – globally aligned sequence 1

  • algn2 (Sequence[str]) – globally aligned sequence 2

Returns

A tuple (counts, scripts, ins, dels, subs) with counts being per-character confusions, scripts per-script counts, ins a dict with per script insertions, del an integer of the number of deletions, subs per script substitutions.
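The tallying over a pair of aligned sequences can be sketched as follows (the empty-string gap symbol is an assumption, and the real function additionally groups by script; this version counts per character only):

```python
from collections import Counter

EPS = ''  # assumed gap symbol in the aligned sequences

def confusions(algn1, algn2):
    """Tally confusion pairs, insertions, deletions, and substitutions from
    two globally aligned sequences of equal length (gaps marked with EPS)."""
    counts, ins, dels, subs = Counter(), Counter(), 0, Counter()
    for a, b in zip(algn1, algn2):
        counts[(a, b)] += 1
        if a == EPS:          # character present only in the second sequence
            ins[b] += 1
        elif b == EPS:        # character missing from the second sequence
            dels += 1
        elif a != b:
            subs[a] += 1
    return counts, ins, dels, subs

counts, ins, dels, subs = confusions(['f', 'o', 'o', ''], ['f', 'u', 'o', 'd'])
```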

kraken.lib.dataset.compute_error(model, validation_set)

Computes error report from a model and a list of line image-text pairs.

Parameters
  • model – Recognition model to evaluate.

  • validation_set – List of line image-text pairs.

Returns

A tuple with total number of characters and edit distance across the whole validation set.

Return type

Tuple[int, int]

kraken.lib.dataset.extract_polygons(im, bounds)

Yields the subimages of image im defined in the list of bounding polygons with baselines preserving order.

Parameters
  • im (PIL.Image.Image) – Input image

  • bounds (list) – A list of tuples (x1, y1, x2, y2)

Yields

(PIL.Image.Image) the extracted subimage

Return type

PIL.Image.Image

kraken.lib.dataset.generate_input_transforms(batch, height, width, channels, pad, valid_norm=True, force_binarization=False)

Generates a torchvision transformation converting a PIL.Image into a tensor usable in a network forward pass.

Parameters
  • batch (int) – mini-batch size

  • height (int) – height of input image in pixels

  • width (int) – width of input image in pixels

  • channels (int) – color channels of input

  • pad (int) – Amount of padding on horizontal ends of image

  • valid_norm (bool) – Enables/disables baseline normalization as a valid preprocessing step. If disabled we will fall back to standard scaling.

  • force_binarization (bool) – Forces binarization of input images using the nlbin algorithm.

Returns

A torchvision transformation composition converting the input image to the appropriate tensor.

Return type

torchvision.transforms.Compose

kraken.lib.dataset.global_align(seq1, seq2)

Computes a global alignment of two strings.

Parameters
  • seq1 (Sequence[Any]) –

  • seq2 (Sequence[Any]) –

Return type

Tuple[int, List[str], List[str]]

Returns a tuple (distance, list(algn1), list(algn2))
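A textbook dynamic-programming version of this interface, for illustration (unit edit costs and an empty-string gap symbol are assumptions; this is not kraken's implementation):

```python
def global_align(seq1, seq2):
    """Global alignment returning (distance, algn1, algn2), with ''
    marking gaps in either aligned sequence."""
    n, m = len(seq1), len(seq2)
    # dp[i][j] = edit distance between seq1[:i] and seq2[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if seq1[i - 1] == seq2[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1, dp[i][j - 1] + 1, dp[i - 1][j - 1] + cost)
    # backtrack to recover the aligned sequences
    algn1, algn2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (0 if seq1[i - 1] == seq2[j - 1] else 1):
            algn1.append(seq1[i - 1]); algn2.append(seq2[j - 1]); i -= 1; j -= 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            algn1.append(seq1[i - 1]); algn2.append(''); i -= 1
        else:
            algn1.append(''); algn2.append(seq2[j - 1]); j -= 1
    return dp[n][m], algn1[::-1], algn2[::-1]

dist, a1, a2 = global_align('kitten', 'sitting')  # dist == 3
```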

kraken.lib.dataset.is_bitonal(im)

Tests a PIL.Image for bitonality.

Parameters

im (PIL.Image.Image) – Image to test

Returns

True if the image contains only two different color values. False otherwise.

Return type

bool
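The test reduces to counting distinct pixel values; a plain-Python sketch of the same predicate over a 2D pixel array:

```python
def is_bitonal(pixels):
    """True if a 2D pixel array contains exactly two distinct values
    (e.g. 0 and 255) -- the property tested on a PIL image above."""
    return len({p for row in pixels for p in row}) == 2

bw = [[0, 255], [255, 0]]     # pure black/white
gray = [[0, 128], [255, 0]]   # contains an intermediate gray level
```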

kraken.lib.dataset.logger
kraken.lib.dataset.parse_alto(filename)

Parses an ALTO file, returns the baselines defined in it, and loads the referenced image.

Parameters

filename (str) – path to an ALTO file.

Returns

A dict {‘image’: impath, ‘lines’: [{‘boundary’: [[x0, y0], …], ‘baseline’: [[x0, y0], …], ‘text’: ‘apdjfqpf’, ‘script’: ‘script_type’}, …], ‘regions’: {‘region_type_0’: [[[x0, y0], …], …], …}, ‘base_dir’: None}

Return type

dict

kraken.lib.dataset.parse_page(filename)

Parses a PageXML file, returns the baselines defined in it, and loads the referenced image.

Parameters

filename (str) – path to a PageXML file.

Returns

A dict {‘image’: impath, ‘lines’: [{‘boundary’: [[x0, y0], …], ‘baseline’: [[x0, y0], …], ‘text’: ‘apdjfqpf’, ‘script’: ‘script_type’}, …], ‘regions’: {‘region_type_0’: [[[x0, y0], …], …], …}}

Return type

dict

kraken.lib.dataset.parse_xml(filename)

Parses either a PageXML or ALTO file with autodetermination of the file format.

Parameters

filename (str) – path to an XML file.

Returns

A dict {‘image’: impath, ‘lines’: [{‘boundary’: [[x0, y0], …], ‘baseline’: [[x0, y0], …], ‘text’: ‘apdjfqpf’, ‘script’: ‘script_type’}, …], ‘regions’: {‘region_type_0’: [[[x0, y0], …], …], …}, ‘base_dir’: None}

Return type

dict

kraken.lib.dataset.preparse_xml_data(filenames, format_type='xml', repolygonize=False)

Loads training data from a set of xml files.

Extracts line information from Page/ALTO xml files for training of recognition models.

Parameters
  • filenames (list) – List of XML files.

  • format_type (str) – Either page, alto or xml for autodetermination.

  • repolygonize (bool) – (Re-)calculates polygon information using the kraken algorithm.

Returns

A list of dicts {‘text’: text, ‘baseline’: [[x0, y0], …], ‘boundary’: [[x0, y0], …], ‘image’: PIL.Image}

Return type

list

kraken.lib.segmentation module

Processing for baseline segmenter output

exception kraken.lib.segmentation.KrakenInputException(message=None)

Common base class for all non-exit exceptions.

class kraken.lib.segmentation.LineMCP(*args, **kwargs)
create_connection(self, id1, id2, pos1, pos2, cost1, cost2)
get_connections(self)
goal_reached(self, int_index, float_cumcost)
kraken.lib.segmentation.boundary_tracing(region)

Find coordinates of the region’s boundary. The region must not have isolated points.

Code copied from https://github.com/machine-shop/deepwings/blob/master/deepwings/method_features_extraction/image_processing.py#L185

Parameters

region – object obtained with skimage.measure.regionprops().

Returns

List of coordinates of pixels in the boundary.

kraken.lib.segmentation.calculate_polygonal_environment(im=None, baselines=None, suppl_obj=None, im_feats=None, scale=None, topline=False)

Given a list of baselines and an input image, calculates a polygonal environment around each baseline.

Parameters
  • im (PIL.Image) – grayscale input image (mode ‘L’)

  • baselines (sequence) – List of lists containing a single baseline per entry.

  • suppl_obj (sequence) – List of lists containing additional polylines that should be considered hard boundaries for polygonization purposes. Can be used to prevent polygonization into non-text areas such as illustrations or to compute the polygonization of a subset of the lines in an image.

  • im_feats (numpy.array) – An optional precomputed seamcarve energy map. Overrides data in im. The default map is gaussian_filter(sobel(im), 2).

  • scale (tuple) – A 2-tuple (h, w) containing optional scale factors of the input. Values of 0 are used for aspect-preserving scaling. None skips input scaling.

  • topline (bool) – Switch to change default baseline location for offset calculation purposes. If set to False, baselines are assumed to be on the bottom of the text line and will be offset upwards, if set to True, baselines are on the top and will be offset downwards. If set to None, no offset will be applied.

Returns

List of lists of coordinates. If no polygonization could be computed for a baseline, None is returned instead.

kraken.lib.segmentation.compute_polygon_section(baseline, boundary, dist1, dist2)

Given a baseline, a polygonal boundary, and two points on the baseline, returns the rectangle formed by the orthogonal cuts on that baseline segment. The resulting polygon is not guaranteed to have a non-zero area.

The distance can be larger than the actual length of the baseline if the baseline endpoints are inside the bounding polygon. In that case the baseline will be extrapolated to the polygon edge.

Parameters
  • baseline (list) – A polyline ((x1, y1), …, (xn, yn))

  • boundary (list) – A bounding polygon around the baseline (same format as baseline).

  • dist1 (int) – Absolute distance along the baseline of the first point.

  • dist2 (int) – Absolute distance along the baseline of the second point.

Returns

A sequence of polygon points.

kraken.lib.segmentation.denoising_hysteresis_thresh(im, low, high, sigma)
kraken.lib.segmentation.extract_polygons(im, bounds)

Yields the subimages of image im defined in the list of bounding polygons with baselines preserving order.

Parameters
  • im (PIL.Image.Image) – Input image

  • bounds (list) – A list of tuples (x1, y1, x2, y2)

Yields

(PIL.Image.Image) the extracted subimage

Return type

PIL.Image.Image

kraken.lib.segmentation.is_in_region(line, region)

Tests if a line is inside a region, i.e. if the mid point of the baseline is inside the region.

Parameters
  • line (geom.LineString) – line to test

  • region (geom.Polygon) –

Returns

False if line is not inside region, True otherwise

Return type

bool

kraken.lib.segmentation.line_regions(line, regions)

Filters a list of regions by line association.

Parameters
  • line (list) – Polyline representing the line.

  • regions (list) – List of region polygons

Returns

A list of regions that contain the line mid-point.

kraken.lib.segmentation.logger
kraken.lib.segmentation.moore_neighborhood(current, backtrack)
kraken.lib.segmentation.polygonal_reading_order(lines, text_direction='lr', regions=None)

Given a list of baselines and regions, calculates the correct reading order and applies it to the input.

Parameters
  • lines (Sequence) – List of tuples containing the baseline and its polygonization.

  • regions (Sequence) – List of region polygons.

  • text_direction (str) – Set principal text direction for column ordering. Can be ‘lr’ or ‘rl’

Returns

A reordered input.

Return type

Sequence[Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]]

kraken.lib.segmentation.reading_order(lines, text_direction='lr')

Given the list of lines (a list of 2D slices), computes the partial reading order. The output is a binary 2D array such that order[i,j] is true if line i comes before line j in reading order.

Parameters
  • lines (Sequence[Tuple[slice, slice]]) –

  • text_direction (str) –

Return type

numpy.ndarray

kraken.lib.segmentation.scale_polygonal_lines(lines, scale)

Scales baselines/polygon coordinates by a certain factor.

Parameters
  • lines (Sequence) – List of tuples containing the baseline and its polygonization.

  • scale (float or tuple of floats) – Scaling factor

Return type

Sequence[Tuple[List, List]]
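The transformation itself is coordinate-wise multiplication; a sketch (scale_coords is an illustrative helper, and interpreting a tuple factor as (x, y) order here is an assumption):

```python
def scale_coords(lines, scale):
    """Scale baseline/polygon coordinate pairs by a uniform factor or an
    (x, y) factor tuple, rounding back to integer pixel coordinates."""
    sx, sy = scale if isinstance(scale, tuple) else (scale, scale)
    return [([(round(x * sx), round(y * sy)) for x, y in baseline],
             [(round(x * sx), round(y * sy)) for x, y in polygon])
            for baseline, polygon in lines]

lines = [([(0, 10), (100, 10)], [(0, 0), (100, 0), (100, 20), (0, 20)])]
scaled = scale_coords(lines, 0.5)
# scaled[0][0] == [(0, 5), (50, 5)]
```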

kraken.lib.segmentation.scale_regions(regions, scale)

Scales region coordinates by a certain factor.

Parameters
  • regions (Sequence[Tuple[List[int], List[int]]]) – List of region polygons.

  • scale (float or tuple of floats) – Scaling factor

Return type

Sequence[Tuple[List, List]]

kraken.lib.segmentation.topsort(order)

Given a binary array defining a partial order (o[i,j]==True means i<j), compute a topological sort. This is a quick and dirty implementation that works for up to a few thousand elements.

Parameters

order (numpy.ndarray) –

Return type

List[int]
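One way to realize such a sort over the boolean order matrix is Kahn's algorithm (a sketch of the semantics, not kraken's actual implementation):

```python
import numpy as np

def topsort_sketch(order):
    """Kahn's algorithm over a boolean matrix where order[i, j] == True
    means i must precede j."""
    n = order.shape[0]
    indeg = order.sum(axis=0)              # predecessors per node
    ready = [i for i in range(n) if indeg[i] == 0]
    result = []
    while ready:
        i = ready.pop()
        result.append(i)
        for j in np.nonzero(order[i])[0]:  # successors of i
            indeg[j] -= 1
            if indeg[j] == 0:
                ready.append(int(j))
    return result

o = np.zeros((3, 3), dtype=bool)
o[1, 0] = o[0, 2] = True   # line 1 before line 0, line 0 before line 2
```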

kraken.lib.segmentation.vectorize_lines(im, threshold=0.17, min_length=5)

Vectorizes lines from a binarized array.

Parameters
  • im (np.ndarray) – Array of shape (3, H, W) with the first dimension being probabilities for (start_separators, end_separators, baseline).

  • threshold (float) – Threshold for baseline blob detection.

  • min_length (int) – Minimal length of output baselines.

Returns

[[x0, y0, … xn, yn], [xm, ym, …, xk, yk], … ] A list of lists containing the points of all baseline polylines.

kraken.lib.segmentation.vectorize_regions(im, threshold=0.5)

Vectorizes regions from a binarized array.

Parameters
  • im (np.ndarray) – Array of shape (H, W) containing a probability map of the region.

  • threshold (float) – Threshold for binarization

Returns

[[x0, y0, … xn, yn], [xm, ym, …, xk, yk], … ] A list of lists containing the region polygons.

kraken.lib.ctc_decoder

Decoders for softmax outputs of CTC trained networks.

kraken.lib.ctc_decoder.beam_decoder(outputs, beam_size=3)

Translates back the network output to a label sequence using same-prefix-merge beam search decoding as described in [0].

[0] Hannun, Awni Y., et al. “First-pass large vocabulary continuous speech recognition using bi-directional recurrent DNNs.” arXiv preprint arXiv:1408.2873 (2014).

Parameters
  • outputs (numpy.ndarray) – (C, W) shaped softmax output tensor

  • beam_size (int) – Size of the beam

Returns

A list of tuples (class, start, end, prob).

Return type

List[Tuple[int, int, int, float]]

kraken.lib.ctc_decoder.blank_threshold_decoder(outputs, threshold=0.5)

Translates back the network output to a label sequence as the original ocropy/clstm.

Thresholds on class 0, then assigns the maximum (non-zero) class to each region.

Parameters
  • output (numpy.array) – (C, W) shaped softmax output tensor

  • threshold (float) – Threshold for 0 class when determining possible label locations.

  • outputs (numpy.ndarray) –

Returns

A list with tuples (class, start, end, max). max is the maximum value of the softmax layer in the region.

Return type

List[Tuple[int, int, int, float]]
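The two-step logic above (threshold the blank row, then take the per-region argmax over non-blank classes) can be sketched directly; treating region end indices as inclusive is an assumption about the output convention:

```python
import numpy as np

def blank_threshold_decode(outputs, threshold=0.5):
    """Group columns whose blank probability (row 0) falls below `threshold`
    into contiguous regions and emit (class, start, end, max) per region."""
    active = outputs[0] < threshold            # columns likely holding a label
    results = []
    start = None
    for t in range(outputs.shape[1] + 1):
        on = t < outputs.shape[1] and active[t]
        if on and start is None:
            start = t
        elif not on and start is not None:
            region = outputs[1:, start:t]      # non-blank classes only
            cls = int(np.unravel_index(region.argmax(), region.shape)[0]) + 1
            results.append((cls, start, t - 1, float(region.max())))
            start = None
    return results

probs = np.array([[0.9, 0.1, 0.1, 0.9],       # blank
                  [0.05, 0.8, 0.7, 0.05],     # class 1
                  [0.05, 0.1, 0.2, 0.05]])    # class 2
res = blank_threshold_decode(probs)           # one region spanning columns 1-2
```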

kraken.lib.ctc_decoder.greedy_decoder(outputs)

Translates back the network output to a label sequence using greedy/best path decoding as described in [0].

[0] Graves, Alex, et al. “Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks.” Proceedings of the 23rd international conference on Machine learning. ACM, 2006.

Parameters

outputs (numpy.ndarray) – (C, W) shaped softmax output tensor

Returns

A list with tuples (class, start, end, max). max is the maximum value of the softmax layer in the region.

Return type

List[Tuple[int, int, int, float]]
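Best-path decoding reduces to a per-column argmax followed by run-length collapsing; a compact sketch of that scheme (region end indices inclusive, an assumption about the output convention):

```python
import numpy as np

def greedy_decode(outputs):
    """Take the argmax class per column, merge consecutive repeats into one
    region, drop blank regions (class 0), and report (class, start, end, max)."""
    classes = outputs.argmax(axis=0)
    results = []
    start = 0
    for t in range(1, len(classes) + 1):
        if t == len(classes) or classes[t] != classes[start]:
            cls = int(classes[start])
            if cls != 0:                      # skip blank runs
                results.append((cls, start, t - 1, float(outputs[cls, start:t].max())))
            start = t
    return results

probs = np.array([[0.9, 0.2, 0.1, 0.8],       # blank
                  [0.05, 0.7, 0.8, 0.1],      # class 1
                  [0.05, 0.1, 0.1, 0.1]])     # class 2
res = greedy_decode(probs)                    # one region spanning columns 1-2
```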