API Reference¶

kraken.blla module¶

Note

blla provides the interface to the fully trainable segmenter. For the legacy segmenter interface refer to the pageseg module. Note that recognition models are not interchangeable between segmenters.

kraken.blla.segment(im, text_direction='horizontal-lr', mask=None, reading_order_fn=polygonal_reading_order, model=None, device='cpu')¶

Segments a page into text lines using the baseline segmenter.

Segments a page into text lines and returns the polyline formed by each baseline and their estimated environment.

Parameters:

im (PIL.Image.Image) – Input image. The mode can generally be anything, but a model trained only on binarized input requires correspondingly binarized images.
text_direction (str) – Passed-through value for serialization.serialize.
mask (Optional[numpy.ndarray]) – A bi-level mask image of the same size as im where 0-valued regions are ignored for segmentation purposes. Disables column detection.
reading_order_fn (Callable) – Function to determine the reading order. Has to accept a list of tuples (baselines, polygon) and a text direction (lr or rl).
model (Union[List[kraken.lib.vgsl.TorchVGSLModel], kraken.lib.vgsl.TorchVGSLModel]) – One or more TorchVGSLModel containing a segmentation model. If none is given a default model will be loaded.
device (str) – The target device to run the neural network on.

Returns:

A dictionary containing the text direction and under the key ‘lines’ a list of reading order sorted baselines (polylines) and their respective polygonal boundaries. The last and first point of each boundary polygon are connected.

 {'text_direction': '$dir',
  'type': 'baseline',
  'lines': [
     {'baseline': [[x0, y0], [x1, y1], ..., [x_n, y_n]], 'boundary': [[x0, y0], [x1, y1], ..., [x_m, y_m]]},
     {'baseline': [[x0, ...]], 'boundary': [[x0, ...]]}
   ],
   'regions': [
     {'region': [[x0, y0], [x1, y1], ..., [x_n, y_n]], 'type': 'image'},
     {'region': [[x0, ...]], 'type': 'text'}
   ]
 }

Raises:

KrakenInvalidModelException – if the given model is not a valid segmentation model.
KrakenInputException – if the mask is not bitonal or does not match the image size.

Return type:

Dict[str, Any]
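The return structure described above can be explored without running a model. The following is a minimal sketch that uses a hand-written dictionary in the documented shape; the coordinates are made up, and the helpers closed_ring and bbox are hypothetical names introduced here for illustration, not part of kraken.

```python
from typing import Any, Dict, List, Tuple

# Hand-written stand-in for a blla.segment() result; all coordinates are invented.
seg: Dict[str, Any] = {
    'text_direction': 'horizontal-lr',
    'type': 'baseline',
    'lines': [
        {'baseline': [[10, 50], [200, 52]],
         'boundary': [[10, 30], [200, 32], [200, 60], [10, 58]]},
        {'baseline': [[10, 100], [180, 100]],
         'boundary': [[10, 80], [180, 80], [180, 110], [10, 110]]},
    ],
    'regions': [
        {'region': [[0, 0], [220, 0], [220, 120], [0, 120]], 'type': 'text'},
    ],
}


def closed_ring(boundary: List[List[int]]) -> List[List[int]]:
    """Return the polygon with its first point repeated at the end.

    The documentation states that the last and first point are connected,
    so the closing edge is implicit in the stored boundary.
    """
    if boundary[0] == boundary[-1]:
        return boundary
    return boundary + [boundary[0]]


def bbox(points: List[List[int]]) -> Tuple[int, int, int, int]:
    """Axis-aligned bounding box (x_min, y_min, x_max, y_max) of a polygon."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


rings = [closed_ring(line['boundary']) for line in seg['lines']]
boxes = [bbox(line['boundary']) for line in seg['lines']]
```

Lines arrive in reading order, so iterating over seg['lines'] visits them in the order a reader would.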

kraken.pageseg module¶

Note

pageseg is the legacy bounding box-based segmenter. For the trainable baseline segmenter interface refer to the blla module. Note that recognition models are not interchangeable between segmenters.

kraken.pageseg.segment(im, text_direction='horizontal-lr', scale=None, maxcolseps=2, black_colseps=False, no_hlines=True, pad=0, mask=None, reading_order_fn=reading_order)¶

Segments a page into text lines.

Segments a page into text lines and returns the absolute coordinates of each line in reading order.

Parameters:

im – A bi-level page of mode ‘1’ or ‘L’
text_direction (str) – Principal direction of the text (horizontal-lr/rl/vertical-lr/rl)
scale (Optional[float]) – Scale of the image. Will be auto-determined if set to None.
maxcolseps (float) – Maximum number of whitespace column separators
black_colseps (bool) – Whether column separators are assumed to be vertical black lines or not
no_hlines (bool) – Switch for small horizontal line removal.
pad (Union[int, Tuple[int, int]]) – Padding to add to line bounding boxes. If int the same padding is used both left and right. If a 2-tuple, uses (padding_left, padding_right).
mask (Optional[numpy.ndarray]) – A bi-level mask image of the same size as im where 0-valued regions are ignored for segmentation purposes. Disables column detection.
reading_order_fn (Callable) – Function to call to order line output. Callable accepting a list of slices (y, x) and a text direction in (rl, lr).

Returns:

A dictionary containing the text direction and a list of reading order sorted bounding boxes under the key ‘boxes’:

{'text_direction': '$dir', 'boxes': [(x1, y1, x2, y2),...]}

Raises:

KrakenInputException – if the input image is not binarized or the text direction is invalid.

Return type:

Dict[str, Any]
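To illustrate the two accepted forms of the pad argument, here is a small sketch that applies horizontal padding to boxes in the documented (x1, y1, x2, y2) form. The helper pad_boxes is hypothetical and is not kraken's implementation; clamping to the image width is an assumption made for the example.

```python
from typing import List, Tuple, Union

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)


def pad_boxes(boxes: List[Box],
              pad: Union[int, Tuple[int, int]],
              width: int) -> List[Box]:
    """Widen each box horizontally.

    An int pads both sides equally; a 2-tuple is read as
    (padding_left, padding_right). Results are clamped to [0, width],
    which is an assumption of this sketch.
    """
    left, right = (pad, pad) if isinstance(pad, int) else pad
    return [(max(x1 - left, 0), y1, min(x2 + right, width), y2)
            for x1, y1, x2, y2 in boxes]


# Hand-written stand-in for a pageseg.segment() result.
result = {'text_direction': 'horizontal-lr',
          'boxes': [(5, 10, 300, 40), (20, 50, 395, 80)]}

padded_tuple = pad_boxes(result['boxes'], (10, 8), width=400)
padded_int = pad_boxes(result['boxes'], 2, width=400)
```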

kraken.rpred module¶

kraken.rpred.bidi_record(record, base_dir=None)¶

Reorders a record using the Unicode BiDi algorithm.

Models trained for RTL or mixed scripts still emit classes in LTR order requiring reordering for proper display.

Parameters: record (kraken.rpred.ocr_record)
Returns: kraken.rpred.ocr_record
Return type: ocr_record
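The need for reordering can be seen in a deliberately simplified case. The sketch below is not the Unicode BiDi algorithm and is not what bidi_record does internally; it only handles a line made up entirely of right-to-left characters, where the left-to-right sequence emitted by a model is the reverse of the logical sequence. Mixed-direction text, digits, and mirrored characters need the full algorithm. The helper names are hypothetical.

```python
import unicodedata


def is_pure_rtl(text: str) -> bool:
    """True if the text has strong characters and all of them are RTL."""
    classes = [unicodedata.bidirectional(c) for c in text]
    strong = [c for c in classes if c in ('L', 'R', 'AL')]
    return bool(strong) and all(c in ('R', 'AL') for c in strong)


def visual_to_logical(visual: str) -> str:
    """Reverse a pure-RTL line; leave everything else untouched."""
    return visual[::-1] if is_pure_rtl(visual) else visual


logical = '\u05e9\u05dc\u05d5\u05dd'   # Hebrew word in logical (reading) order
visual = logical[::-1]                 # left-to-right order as seen on the page
restored = visual_to_logical(visual)
```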

class kraken.rpred.mm_rpred(nets, im, bounds, pad=16, bidi_reordering=True, tags_ignore=None)¶

Multi-model version of kraken.rpred.rpred

Parameters:

nets (Dict[str, kraken.lib.models.TorchSeqRecognizer])
im (PIL.Image.Image)
bounds (dict)
pad (int)
bidi_reordering (Union[bool, str])
tags_ignore (Optional[List[str]])

bidi_reordering = True¶

bounds¶

im¶

nets¶

pad = 16¶

tags_ignore = None¶

ts¶

class kraken.rpred.ocr_record(prediction, cuts, confidences, line)¶

A record object containing the recognition result of a single line

Parameters:

prediction (str)
confidences (List[float])
line (Union[List, Dict[str, List]])

base_dir = None¶

confidences¶

cuts¶

prediction¶

tags = None¶

type = 'baselines'¶

kraken.rpred.rpred(network, im, bounds, pad=16, bidi_reordering=True)¶

Uses a TorchSeqRecognizer and a segmentation to recognize text

Parameters:

network (kraken.lib.models.TorchSeqRecognizer) – A TorchSeqRecognizer object
im (PIL.Image.Image) – Image to extract text from
bounds (dict) – A dictionary containing a ‘boxes’ entry with a list of coordinates (x0, y0, x1, y1) of a text line in the image and an entry ‘text_direction’ containing ‘horizontal-lr/rl/vertical-lr/rl’.
pad (int) – Extra blank padding to the left and right of text line. Auto-disabled when expected network inputs are incompatible with padding.
bidi_reordering (bool|str) – Reorder classes in the ocr_record according to the Unicode bidirectional algorithm for correct display. Set to L|R to change base text direction.

Yields:

An ocr_record containing the recognized text, absolute character positions, and confidence values for each character.

Return type:

Generator[ocr_record, None, None]

kraken.serialization module¶

kraken.serialization.render_report(model, chars, errors, char_confusions, scripts, insertions, deletions, substitutions)¶

Renders an accuracy report.

Parameters:

model (str) – Model name.
errors (int) – Number of errors on test set.
char_confusions (dict) – Dictionary mapping a tuple (gt, pred) to a number of occurrences.
scripts (dict) – Dictionary counting characters per script.
insertions (dict) – Dictionary counting insertion operations per Unicode script.
deletions (int) – Number of deletions
substitutions (dict) – Dictionary counting substitution operations per Unicode script.
chars (int)

Returns:

A string containing the rendered report.

Return type:

str
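The shape of char_confusions, a mapping from (gt, pred) tuples to occurrence counts, can be sketched as follows. This example assumes ground truth and prediction are already aligned and of equal length, and that only mismatching pairs are counted; a real evaluation would align the strings with an edit-distance algorithm first. The function count_confusions is a hypothetical helper, not part of kraken.

```python
from collections import Counter
from typing import Dict, Tuple


def count_confusions(gt: str, pred: str) -> Dict[Tuple[str, str], int]:
    """Count (ground truth, prediction) character pairs that differ.

    Assumes both strings are pre-aligned, i.e. position i in gt
    corresponds to position i in pred.
    """
    if len(gt) != len(pred):
        raise ValueError('this sketch only handles equal-length strings')
    return dict(Counter((g, p) for g, p in zip(gt, pred) if g != p))


char_confusions = count_confusions('hello world', 'hcllo wor1d')
errors = sum(char_confusions.values())
```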

kraken.serialization.serialize(records, image_name=None, image_size=(0, 0), writing_mode='horizontal-tb', scripts=None, regions=None, template='hocr', processing_steps=None)¶

Serializes a list of ocr_records into an output document.

Serializes a list of predictions and their corresponding positions by doing some hOCR-specific preprocessing and then renders them through one of several jinja2 templates.

Note: Empty records are ignored for serialization purposes.

Parameters:

records (Sequence[kraken.rpred.ocr_record]) – List of kraken.rpred.ocr_record
image_name (str) – Name of the source image
image_size (Tuple[int, int]) – Dimensions of the source image
writing_mode (str) – Sets the principal layout of lines and the direction in which blocks progress. Valid values are horizontal-tb, vertical-rl, and vertical-lr.
scripts (Optional[Iterable[str]]) – List of scripts contained in the OCR records
regions (Optional[Dict[str, List[List[Tuple[int, int]]]]]) – Dictionary mapping region types to a list of region polygons.
template (str) – Selector for the serialization format. May be ‘hocr’, ‘alto’, ‘page’ or any template found in the template directory.
processing_steps (Optional[List[Dict[str, Union[Dict, str, float, int, bool]]]]) –
A list of dictionaries describing the processing kraken performed on the inputs:
```
{'category': 'preprocessing',
 'description': 'natural language description of process',
 'settings': {'arg0': 'foo', 'argX': 'bar'}
}
```

Returns:

The rendered template

Return type:

str
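A processing_steps list in the layout shown above can be assembled with a small helper. This is only a sketch: make_step is a hypothetical name, and the category and setting values other than 'preprocessing' are invented for illustration.

```python
from typing import Any, Dict, List

REQUIRED_KEYS = {'category', 'description', 'settings'}


def make_step(category: str, description: str, **settings: Any) -> Dict[str, Any]:
    """Build one processing step dictionary with the three documented keys."""
    return {'category': category,
            'description': description,
            'settings': settings}


processing_steps: List[Dict[str, Any]] = [
    make_step('preprocessing', 'binarization of the input image', threshold=0.5),
    # 'processing' is a made-up category value used only for this example.
    make_step('processing', 'baseline segmentation', device='cpu'),
]

well_formed = all(set(step) == REQUIRED_KEYS for step in processing_steps)
```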

kraken.serialization.serialize_segmentation(segresult, image_name=None, image_size=(0, 0), template='hocr', processing_steps=None)¶

Serializes a segmentation result into an output document.

Parameters:

segresult (Dict[str, Any]) – Result of blla.segment
image_name (str) – Name of the source image
image_size (tuple) – Dimensions of the source image
template (str) – Selector for the serialization format. May be ‘hocr’ or ‘alto’.
processing_steps (Optional[List[Dict[str, Union[Dict, str, float, int, bool]]]])

Returns:

The rendered template.

Return type:

str

kraken.lib.models module¶

class kraken.lib.models.TorchSeqRecognizer(nn, decoder=kraken.lib.ctc_decoder.greedy_decoder, train=False, device='cpu')¶

A wrapper class around a TorchVGSLModel for text recognition.

Parameters:

nn (kraken.lib.vgsl.TorchVGSLModel)
train (bool)
device (str)

codec¶

decoder¶

device = 'cpu'¶

forward(line, lens=None)¶

Performs a forward pass on a torch tensor of one or more lines with shape (N, C, H, W) and returns a numpy array (N, W, C).

Parameters:

line (torch.Tensor) – NCHW line tensor
lens (torch.Tensor) – Optional tensor containing sequence lengths if N > 1

Returns:

Tuple with (N, W, C) shaped numpy array and final output sequence lengths.

Raises:

KrakenInputException – Is raised if the channel dimension isn’t of size 1 in the network output.

Return type:

Union[numpy.ndarray, Tuple[numpy.ndarray, numpy.ndarray]]

kind = ''¶

nn¶

one_channel_mode¶

predict(line, lens=None)¶

Performs a forward pass on a torch tensor of a line with shape (N, C, H, W) and returns the decoding as a list of tuples (string, start, end, confidence).

Parameters:

line (torch.Tensor) – NCHW line tensor
lens (Optional[torch.Tensor]) – Optional tensor containing sequence lengths if N > 1

Returns:

List of decoded sequences.

Return type:

List[List[Tuple[str, int, int, float]]]
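The decoded tuples are easy to post-process. The sketch below works on a hand-written decoding for a single line in the documented (string, start, end, confidence) form; the values are invented and the helpers to_text and low_confidence are hypothetical.

```python
from typing import List, Tuple

Decoded = List[Tuple[str, int, int, float]]

# Made-up decoding of one line: (string, start, end, confidence).
line: Decoded = [('k', 2, 5, 0.98), ('r', 7, 9, 0.91),
                 ('a', 11, 14, 0.42), ('k', 16, 19, 0.95)]


def to_text(decoded: Decoded) -> str:
    """Concatenate the decoded strings into the line text."""
    return ''.join(s for s, _, _, _ in decoded)


def low_confidence(decoded: Decoded, threshold: float = 0.5) -> List[Tuple[str, int, int]]:
    """Return (string, start, end) for every item below the threshold."""
    return [(s, start, end) for s, start, end, conf in decoded if conf < threshold]


text = to_text(line)
doubtful = low_confidence(line)
```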

predict_labels(line, lens=None)¶

Performs a forward pass on a torch tensor of a line with shape (N, C, H, W) and returns a list of tuples (class, start, end, max). Max is the maximum value of the softmax layer in the region.

Parameters:

line (torch.Tensor)
lens (torch.Tensor)

Return type:

List[List[Tuple[int, int, int, float]]]
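How a (class, start, end, max) tuple can be derived from per-step class probabilities is sketched below as a pure-Python greedy CTC decode. It is an illustration under stated assumptions, not the decoder shipped with kraken: the input is a plain nested list of shape (W, C) for a single line, and class index 0 is assumed to be the blank label.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def greedy_labels(probs: Sequence[Sequence[float]],
                  blank: int = 0) -> List[Tuple[int, int, int, float]]:
    """Greedy decode of a (W, C) probability matrix.

    Takes the most probable class at every time step, merges runs of the
    same class, drops blank runs, and reports for each remaining run its
    class, first step, last step, and highest probability in the run.
    """
    best = [max(range(len(row)), key=row.__getitem__) for row in probs]
    out = []
    for label, run in groupby(enumerate(best), key=lambda item: item[1]):
        steps = [t for t, _ in run]
        if label == blank:
            continue
        out.append((label, steps[0], steps[-1],
                    max(probs[t][label] for t in steps)))
    return out


probs = [
    [0.9, 0.1, 0.0],  # blank
    [0.2, 0.7, 0.1],  # class 1
    [0.1, 0.8, 0.1],  # class 1 again, merged with the previous step
    [0.6, 0.2, 0.2],  # blank separates the two labels
    [0.1, 0.1, 0.8],  # class 2
]
labels = greedy_labels(probs)
```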

predict_string(line, lens=None)¶

Performs a forward pass on a torch tensor of a line with shape (N, C, H, W) and returns the recognized text as one string per line.

Parameters:

line (torch.Tensor) – NCHW line tensor
lens (Optional[torch.Tensor]) – Optional tensor containing sequence lengths if N > 1

Return type:

List[str]

seg_type¶

to(device)¶: Moves model to device and automatically loads input tensors onto it.

train = False¶

kraken.lib.models.load_any(fname, train=False, device='cpu')¶

Loads any supported model file and instantiates a kraken.lib.models.TorchSeqRecognizer from the network configuration stored in the file.

Currently it recognizes the following kinds of models:

protobuf models containing VGSL segmentation and recognition networks.

Additionally an attribute ‘kind’ will be added to the TorchSeqRecognizer containing a string representation of the source kind. Current known values are:

vgsl for VGSL models

Parameters:

fname (str) – Path to the model
train (bool) – Enables gradient calculation and dropout layers in model.
device (str) – Target device

Returns:

A kraken.lib.models.TorchSeqRecognizer object.

Raises:

KrakenInvalidModelException – if the model is not loadable by any parser.

Return type:

TorchSeqRecognizer

kraken.lib.vgsl module¶

class kraken.lib.vgsl.TorchVGSLModel(spec)¶

Class building a torch module from a VGSL spec.

The initialized class will contain a variable number of layers and a loss function. Inputs and outputs are always 4D tensors in order (batch, channels, height, width) with channels always being the feature dimension.

Importantly this means that a recurrent network will be fed the channel vector at each step along its time axis, i.e. either put the non-time-axis dimension into the channels dimension or use a summarizing RNN squashing the time axis to 1 and putting the output into the channels dimension respectively.

Parameters: spec (str)
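The tensor order mentioned above differs from the order used in the input block of a VGSL spec, which is written as [batch, height, width, channels]. The sketch below reorders such a block into (batch, channels, height, width). The function input_block_to_nchw is a hypothetical helper and the spec string is illustrative; the sketch makes no claim about how TorchVGSLModel populates its own attributes.

```python
import re
from typing import Tuple


def input_block_to_nchw(spec: str) -> Tuple[int, int, int, int]:
    """Read the leading '[b,h,w,c' block of a VGSL spec and return (b, c, h, w)."""
    m = re.match(r'\s*\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)', spec)
    if m is None:
        raise ValueError('spec does not start with a VGSL input block')
    batch, height, width, channels = (int(g) for g in m.groups())
    return (batch, channels, height, width)


# Illustrative spec: batch 1, height 48, variable width (0), one channel.
shape = input_block_to_nchw('[1,48,0,1 Cr3,3,32 Mp2,2]')
```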

input¶: Expected input tensor as a 4-tuple.

nn¶: Stack of layers parsed from the spec.

criterion¶: Fully parametrized loss function.

user_metadata¶: dict with user defined metadata. Is flushed into model file during saving/overwritten by loading operations.

one_channel_mode¶: Field indicating the image type used during training of one-channel images. Is ‘1’ for models trained on binarized images, ‘L’ for grayscale, and None otherwise.

add_codec(codec)¶

Adds a PytorchCodec to the model.

Parameters: codec (kraken.lib.codec.PytorchCodec)
Return type: None

append(idx, spec)¶

Splits a model at layer idx and appends the layers described by spec.

New layers are initialized using the init_weights method.

Parameters:

idx (int) – Index of layer to append spec to starting with 1. To select the whole layer stack set idx to None.
spec (str) – VGSL spec without input block to append to model.

Return type:

None
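The effect of idx can be pictured with a plain list standing in for the layer stack. This sketch assumes that the first idx layers are kept and everything after them is replaced by the new layers, which is one reading of the 1-based index described above; it does not build or initialize any real layers, and the layer names are invented.

```python
from typing import List, Optional


def append_layers(layers: List[str],
                  idx: Optional[int],
                  new_layers: List[str]) -> List[str]:
    """Keep the first idx layers (all of them if idx is None), then add new ones."""
    kept = layers if idx is None else layers[:idx]
    return kept + new_layers


stack = ['C_0', 'Mp_1', 'L_2', 'O_3']        # invented layer names
replaced_head = append_layers(stack, 3, ['L_new', 'O_new'])
extended = append_layers(stack, None, ['O_extra'])
```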

property aux_layers¶

build_addition(input, blocks, idx)¶

Parameters:

input (Tuple[int, int, int, int])
blocks (List[str])
idx (int)

Return type: