API Reference¶
kraken.blla module¶
Note
blla provides the interface to the fully trainable segmenter. For the legacy segmenter interface refer to the pageseg module. Note that recognition models are not interchangeable between segmenters.
- kraken.blla.segment(im, text_direction='horizontal-lr', mask=None, reading_order_fn=polygonal_reading_order, model=None, device='cpu')¶
Segments a page into text lines using the baseline segmenter.
Segments a page into text lines and returns the polyline formed by each baseline and their estimated environment.
- Parameters
im (PIL.Image.Image) – Input image. The mode can generally be anything but it is possible to supply a binarized-input-only model which requires accordingly treated images.
text_direction (str) – Passed-through value for serialization.serialize.
mask (Optional[numpy.ndarray]) – A bi-level mask image of the same size as im where 0-valued regions are ignored for segmentation purposes. Disables column detection.
reading_order_fn (Callable) – Function to determine the reading order. Has to accept a list of tuples (baselines, polygon) and a text direction (lr or rl).
model (Union[List[kraken.lib.vgsl.TorchVGSLModel], kraken.lib.vgsl.TorchVGSLModel]) – One or more TorchVGSLModel containing a segmentation model. If none is given a default model will be loaded.
device (str) – The target device to run the neural network on.
- Returns
A dictionary containing the text direction and under the key ‘lines’ a list of reading order sorted baselines (polylines) and their respective polygonal boundaries. The last and first point of each boundary polygon are connected.
{'text_direction': '$dir',
 'type': 'baseline',
 'lines': [
     {'baseline': [[x0, y0], [x1, y1], ..., [x_n, y_n]], 'boundary': [[x0, y0], [x1, y1], ..., [x_m, y_m]]},
     {'baseline': [[x0, ...]], 'boundary': [[x0, ...]]}
 ],
 'regions': [
     {'region': [[x0, y0], [x1, y1], ..., [x_n, y_n]], 'type': 'image'},
     {'region': [[x0, ...]], 'type': 'text'}
 ]
}
- Raises
KrakenInvalidModelException – if the given model is not a valid segmentation model.
KrakenInputException – if the mask is not bitonal or does not match the image size.
- Return type
Dict[str, Any]
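As a rough illustration of the return structure, the following sketch walks a hand-written dictionary shaped like the one documented above and computes the length of each baseline. The sample values and the helper name baseline_length are made up for illustration and are not part of kraken.

```python
import math

def baseline_length(baseline):
    """Sum of Euclidean segment lengths along a polyline [[x, y], ...]."""
    return sum(math.dist(p, q) for p, q in zip(baseline, baseline[1:]))

# Hand-written sample following the documented shape (not real segmenter output).
sample = {
    'text_direction': 'horizontal-lr',
    'type': 'baseline',
    'lines': [
        {'baseline': [[0, 10], [30, 10], [30, 50]],
         'boundary': [[0, 0], [30, 0], [30, 50], [0, 50]]},
    ],
    'regions': [],
}

lengths = [baseline_length(line['baseline']) for line in sample['lines']]
```

Because the result is an ordinary dictionary of lists, no kraken-specific types are needed to post-process it.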
kraken.pageseg module¶
Note
pageseg is the legacy bounding box-based segmenter. For the trainable baseline segmenter interface refer to the blla module. Note that recognition models are not interchangeable between segmenters.
- kraken.pageseg.segment(im, text_direction='horizontal-lr', scale=None, maxcolseps=2, black_colseps=False, no_hlines=True, pad=0, mask=None, reading_order_fn=reading_order)¶
Segments a page into text lines.
Segments a page into text lines and returns the absolute coordinates of each line in reading order.
- Parameters
im – A bi-level page of mode ‘1’ or ‘L’
text_direction (str) – Principal direction of the text (horizontal-lr/rl/vertical-lr/rl)
scale (Optional[float]) – Scale of the image. Will be auto-determined if set to None.
maxcolseps (float) – Maximum number of whitespace column separators
black_colseps (bool) – Whether column separators are assumed to be vertical black lines or not
no_hlines (bool) – Switch for small horizontal line removal.
pad (Union[int, Tuple[int, int]]) – Padding to add to line bounding boxes. If int the same padding is used both left and right. If a 2-tuple, uses (padding_left, padding_right).
mask (Optional[numpy.ndarray]) – A bi-level mask image of the same size as im where 0-valued regions are ignored for segmentation purposes. Disables column detection.
reading_order_fn (Callable) – Function to call to order line output. Callable accepting a list of slices (y, x) and a text direction in (rl, lr).
- Returns
A dictionary containing the text direction and a list of reading order sorted bounding boxes under the key ‘boxes’:
{'text_direction': '$dir', 'boxes': [(x1, y1, x2, y2),...]}
- Raises
KrakenInputException – if the input image is not binarized or the text direction is invalid.
- Return type
Dict[str, Any]
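The 'boxes' entries are plain (x1, y1, x2, y2) tuples, so derived quantities are easy to compute. The sketch below derives width and height for each box from a made-up result dictionary; box_size is a hypothetical helper and not part of kraken.

```python
def box_size(box):
    """Return (width, height) of an (x1, y1, x2, y2) bounding box."""
    x1, y1, x2, y2 = box
    return x2 - x1, y2 - y1

# Hand-written sample in the documented format (not real segmenter output).
result = {'text_direction': 'horizontal-lr',
          'boxes': [(10, 20, 210, 60), (10, 70, 180, 115)]}

sizes = [box_size(b) for b in result['boxes']]
```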
kraken.rpred module¶
- kraken.rpred.bidi_record(record, base_dir=None)¶
Reorders a record using the Unicode BiDi algorithm.
Models trained for RTL or mixed scripts still emit classes in LTR order requiring reordering for proper display.
- Parameters
record (kraken.rpred.ocr_record) –
- Returns
The reordered record.
- Return type
kraken.rpred.ocr_record
- class kraken.rpred.mm_rpred(nets, im, bounds, pad=16, bidi_reordering=True, tags_ignore=None)¶
Multi-model version of kraken.rpred.rpred
- Parameters
nets (Dict[str, kraken.lib.models.TorchSeqRecognizer]) –
im (PIL.Image.Image) –
bounds (dict) –
pad (int) –
bidi_reordering (Union[bool, str]) –
tags_ignore (Optional[List[str]]) –
- class kraken.rpred.ocr_record(prediction, cuts, confidences, line)¶
A record object containing the recognition result of a single line
- Parameters
prediction (str) –
confidences (List[float]) –
line (Union[List, Dict[str, List]]) –
- kraken.rpred.rpred(network, im, bounds, pad=16, bidi_reordering=True)¶
Uses a TorchSeqRecognizer and a segmentation to recognize text
- Parameters
network (kraken.lib.models.TorchSeqRecognizer) – A TorchSeqRecognizer object
im (PIL.Image.Image) – Image to extract text from
bounds (dict) – A dictionary containing a ‘boxes’ entry with a list of coordinates (x0, y0, x1, y1) of a text line in the image and an entry ‘text_direction’ containing ‘horizontal-lr/rl/vertical-lr/rl’.
pad (int) – Extra blank padding to the left and right of text line. Auto-disabled when expected network inputs are incompatible with padding.
bidi_reordering (bool|str) – Reorder classes in the ocr_record according to the Unicode bidirectional algorithm for correct display. Set to L|R to change base text direction.
- Yields
An ocr_record containing the recognized text, absolute character positions, and confidence values for each character.
- Return type
Generator[ocr_record, None, None]
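Putting the functions documented in this reference together, a minimal recognition pipeline for the legacy segmenter might look like the sketch below. It assumes that kraken and Pillow are installed, that the input image is already bi-level, and that ocr_record exposes its constructor argument prediction as an attribute. The function name recognize_page and its path arguments are placeholders.

```python
def recognize_page(image_path, model_path, device='cpu'):
    """Sketch: segment a bi-level page image and recognize each line."""
    # Imported lazily so that merely defining this function does not need kraken.
    from PIL import Image
    from kraken import pageseg, rpred
    from kraken.lib import models

    im = Image.open(image_path)
    network = models.load_any(model_path, device=device)
    # The legacy segmenter returns the 'boxes'/'text_direction' dict rpred expects.
    bounds = pageseg.segment(im, text_direction='horizontal-lr')
    # rpred is a generator yielding one ocr_record per line.
    # Assumption: the recognized text is available as record.prediction.
    return [record.prediction for record in rpred.rpred(network, im, bounds)]
```

Since rpred yields records lazily, collecting them into a list as done here forces recognition of the whole page at once; iterating directly is preferable for long documents.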
kraken.serialization module¶
- kraken.serialization.render_report(model, chars, errors, char_confusions, scripts, insertions, deletions, substitutions)¶
Renders an accuracy report.
- Parameters
model (str) – Model name.
errors (int) – Number of errors on test set.
char_confusions (dict) – Dictionary mapping a tuple (gt, pred) to a number of occurrences.
scripts (dict) – Dictionary counting character per script.
insertions (dict) – Dictionary counting insertion operations per Unicode script
deletions (int) – Number of deletions
substitutions (dict) – Dictionary counting substitution operations per Unicode script.
chars (int) –
- Returns
A string containing the rendered report.
- Return type
str
- kraken.serialization.serialize(records, image_name=None, image_size=(0, 0), writing_mode='horizontal-tb', scripts=None, regions=None, template='hocr')¶
Serializes a list of ocr_records into an output document.
Serializes a list of predictions and their corresponding positions by doing some hOCR-specific preprocessing and then renders them through one of several jinja2 templates.
Note: Empty records are ignored for serialization purposes.
- Parameters
records (iterable) – List of kraken.rpred.ocr_record
image_name (str) – Name of the source image
image_size (tuple) – Dimensions of the source image
writing_mode (str) – Sets the principal layout of lines and the direction in which blocks progress. Valid values are horizontal-tb, vertical-rl, and vertical-lr.
scripts (list) – List of scripts contained in the OCR records
regions (list) – Dictionary mapping region types to a list of region polygons.
template (str) – Selector for the serialization format. May be ‘hocr’ or ‘alto’.
- Returns
(str) rendered template.
- Return type
str
- kraken.serialization.serialize_segmentation(segresult, image_name=None, image_size=(0, 0), template='hocr')¶
Serializes a segmentation result into an output document.
- Parameters
segresult (Dict[str, Any]) – Result of blla.segment
image_name (str) – Name of the source image
image_size (tuple) – Dimensions of the source image
template (str) – Selector for the serialization format. May be ‘hocr’ or ‘alto’.
- Returns
(str) rendered template.
- Return type
str
kraken.lib.models module¶
- class kraken.lib.models.TorchSeqRecognizer(nn, decoder=kraken.lib.ctc_decoder.greedy_decoder, train=False, device='cpu')¶
A wrapper class around a TorchVGSLModel for text recognition.
- Parameters
train (bool) –
device (str) –
- forward(self, line, lens=None)¶
Performs a forward pass on a torch tensor of one or more lines with shape (N, C, H, W) and returns a numpy array (N, W, C).
- Parameters
line (torch.Tensor) – NCHW line tensor
lens (torch.Tensor) – Optional tensor containing sequence lengths if N > 1
- Returns
Tuple with (N, W, C) shaped numpy array and final output sequence lengths.
- Raises
KrakenInputException – Is raised if the channel dimension isn’t of size 1 in the network output.
- Return type
Union[numpy.ndarray, Tuple[numpy.ndarray, numpy.ndarray]]
- predict(self, line, lens=None)¶
Performs a forward pass on a torch tensor of a line with shape (N, C, H, W) and returns the decoding as a list of tuples (string, start, end, confidence).
- Parameters
line (torch.Tensor) – NCHW line tensor
lens (Optional[torch.Tensor]) – Optional tensor containing sequence lengths if N > 1
- Returns
List of decoded sequences.
- Return type
List[List[Tuple[str, int, int, float]]]
- predict_labels(self, line, lens=None)¶
Performs a forward pass on a torch tensor of a line with shape (N, C, H, W) and returns a list of tuples (class, start, end, max). Max is the maximum value of the softmax layer in the region.
- Parameters
line (torch.tensor) –
lens (torch.Tensor) –
- Return type
List[List[Tuple[int, int, int, float]]]
- predict_string(self, line, lens=None)¶
Performs a forward pass on a torch tensor of a line with shape (N, C, H, W) and returns a string of the results.
- Parameters
line (torch.Tensor) – NCHW line tensor
lens (Optional[torch.Tensor]) – Optional tensor
- Return type
List[str]
- to(self, device)¶
Moves model to device and automatically loads input tensors onto it.
- kraken.lib.models.load_any(fname, train=False, device='cpu')¶
Loads anything that was, is, and will be a valid ocropus model and instantiates a shiny new kraken.lib.lstm.SeqRecognizer from the RNN configuration in the file.
Currently it recognizes the following kinds of models:
protobuf models containing converted python BIDILSTMs (recognition only)
protobuf models containing CLSTM networks (recognition only)
protobuf models containing VGSL segmentation and recognition networks.
Additionally an attribute ‘kind’ will be added to the SeqRecognizer containing a string representation of the source kind. Current known values are:
pyrnn for pickled BIDILSTMs
clstm for protobuf models generated by clstm
vgsl for VGSL models
- Parameters
fname (str) – Path to the model
train (bool) – Enables gradient calculation and dropout layers in model.
device (str) – Target device
- Returns
A kraken.lib.models.TorchSeqRecognizer object.
- Raises
KrakenInvalidModelException – if the model is not loadable by any parser.
- Return type
kraken.lib.models.TorchSeqRecognizer
kraken.lib.vgsl module¶
- class kraken.lib.vgsl.TorchVGSLModel(spec)¶
Class building a torch module from a VGSL spec.
The initialized class will contain a variable number of layers and a loss function. Inputs and outputs are always 4D tensors in order (batch, channels, height, width) with channels always being the feature dimension.
Importantly this means that a recurrent network will be fed the channel vector at each step along its time axis, i.e. either put the non-time-axis dimension into the channels dimension or use a summarizing RNN squashing the time axis to 1 and putting the output into the channels dimension respectively.
- Parameters
spec (str) –
- input¶
Expected input tensor as a 4-tuple.
- Type
tuple
- nn¶
Stack of layers parsed from the spec.
- Type
torch.nn.Sequential
- criterion¶
Fully parametrized loss function.
- Type
torch.nn.Module
- user_metadata¶
dict with user defined metadata. Is flushed into model file during saving/overwritten by loading operations.
- Type
dict
- one_channel_mode¶
Field indicating the image type used during training of one-channel images. Is ‘1’ for models trained on binarized images, ‘L’ for grayscale, and None otherwise.
- Type
str
- add_codec(self, codec)¶
Adds a PytorchCodec to the model.
- Parameters
codec (kraken.lib.codec.PytorchCodec) –
- Return type
None
- append(self, idx, spec)¶
Splits a model at layer idx and appends layers spec.
New layers are initialized using the init_weights method.
- Parameters
idx (int) – Index of layer to append spec to starting with 1. To select the whole layer stack set idx to None.
spec (str) – VGSL spec without input block to append to model.
- Return type
None
- build_addition(self, input, blocks, idx)¶
- Parameters
input (Tuple[int, int, int, int]) –
blocks (List[str]) –
idx (int) –
- Return type
Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]
- build_conv(self, input, blocks, idx)¶
Builds a 2D convolution layer.
- Parameters
input (Tuple[int, int, int, int]) –
blocks (List[str]) –
idx (int) –
- Return type
Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]
- build_dropout(self, input, blocks, idx)¶
- Parameters
input (Tuple[int, int, int, int]) –
blocks (List[str]) –
idx (int) –
- Return type
Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]
- build_groupnorm(self, input, blocks, idx)¶
- Parameters
input (Tuple[int, int, int, int]) –
blocks (List[str]) –
idx (int) –
- Return type
Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]
- build_identity(self, input, blocks, idx)¶
- Parameters
input (Tuple[int, int, int, int]) –
blocks (List[str]) –
idx (int) –
- Return type
Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]
- build_maxpool(self, input, blocks, idx)¶
Builds a maxpool layer.
- Parameters
input (Tuple[int, int, int, int]) –
blocks (List[str]) –
idx (int) –
- Return type
Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]
- build_output(self, input, blocks, idx)¶
Builds an output layer.
- Parameters
input (Tuple[int, int, int, int]) –
blocks (List[str]) –
idx (int) –
- Return type
Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]
- build_parallel(self, input, blocks, idx)¶
Builds a block of parallel layers.
- Parameters
input (Tuple[int, int, int, int]) –
blocks (List[str]) –
idx (int) –
- Return type
Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]
- build_reshape(self, input, blocks, idx)¶
Builds a reshape layer
- Parameters
input (Tuple[int, int, int, int]) –
blocks (List[str]) –
idx (int) –
- Return type
Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]
- build_rnn(self, input, blocks, idx)¶
Builds an LSTM/GRU layer returning number of outputs and layer.
- Parameters
input (Tuple[int, int, int, int]) –
blocks (List[str]) –
idx (int) –
- Return type
Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]
- build_series(self, input, blocks, idx)¶
Builds a serial block of layers.
- Parameters
input (Tuple[int, int, int, int]) –
blocks (List[str]) –
idx (int) –
- Return type
Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]
- eval(self)¶
Sets the model to evaluation/inference mode, disabling dropout and gradient calculation.
- Return type
None
- property hyper_params(self, **kwargs)¶
- init_weights(self, idx=slice(0, None))¶
Initializes weights for all or a subset of layers in the graph.
LSTM/GRU layers are orthogonally initialized, convolutional layers uniformly from (-0.1,0.1).
- Parameters
idx (slice) – A slice object representing the indices of layers to initialize.
- Return type
None
- classmethod load_clstm_model(cls, path)¶
Loads a CLSTM model to VGSL.
- Parameters
path (Union[str, pathlib.Path]) –
- classmethod load_model(cls, path)¶
Deserializes a VGSL model from a CoreML file.
- Parameters
path (Union[str, pathlib.Path]) – CoreML file
- Returns
A TorchVGSLModel instance.
- Raises
KrakenInvalidModelException – if the model data is invalid (not a string, protobuf file, or without appropriate metadata).
FileNotFoundError – if the path doesn't point to a file.
- classmethod load_pronn_model(cls, path)¶
Loads a pronn model to VGSL.
- Parameters
path (Union[str, pathlib.Path]) –
- property model_type(self)¶
- property one_channel_mode(self)¶
- resize_output(self, output_size, del_indices=None)¶
Resizes an output layer.
- Parameters
output_size (int) – New size/output channels of last layer
del_indices (list) – list of outputs to delete from layer
- Return type
None
- save_model(self, path)¶
Serializes the model into path.
- Parameters
path (str) – Target destination
- property seg_type(self)¶
- set_num_threads(self, num)¶
Sets number of OpenMP threads to use.
- Parameters
num (int) –
- Return type
None
- to(self, device)¶
- Parameters
device (Union[str, torch.device]) –
- Return type
None
- train(self)¶
Sets the model to training mode (enables dropout layers and disables softmax on CTC layers).
- Return type
None
kraken.lib.xml module¶
- kraken.lib.xml.parse_xml(filename)¶
Parses either a PageXML or ALTO file with autodetermination of the file format.
- Parameters
filename (Union[str, pathlib.Path]) – path to an XML file.
- Returns
A dict:
{'image': impath, 'lines': [{'boundary': [[x0, y0], ...], 'baseline': [[x0, y0], ...], 'text': 'apdjfqpf', 'tags': {'type': 'default', ...}}, ... {...}], 'regions': {'region_type_0': [[[x0, y0], ...], ...], ...}}
- Return type
Dict[str, Any]
- kraken.lib.xml.parse_page(filename)¶
Parses a PageXML file, returns the baselines defined in it, and loads the referenced image.
- Parameters
filename (Union[str, pathlib.Path]) – path to a PageXML file.
- Returns
A dict:
{'image': impath, 'lines': [{'boundary': [[x0, y0], ...], 'baseline': [[x0, y0], ...], 'text': 'apdjfqpf', 'tags': {'type': 'default', ...}}, ... {...}], 'regions': {'region_type_0': [[[x0, y0], ...], ...], ...}}
- Return type
Dict[str, Any]
- kraken.lib.xml.parse_alto(filename)¶
Parses an ALTO file, returns the baselines defined in it, and loads the referenced image.
- Parameters
filename (Union[str, pathlib.Path]) – path to an ALTO file.
- Returns
A dict:
{'image': impath, 'lines': [{'boundary': [[x0, y0], ...], 'baseline': [[x0, y0], ...], 'text': 'apdjfqpf', 'tags': {'type': 'default', ...}}, ... {...}], 'regions': {'region_type_0': [[[x0, y0], ...], ...], ...}}
- Return type
Dict[str, Any]
kraken.lib.codec module¶
- class kraken.lib.codec.PytorchCodec(charset, strict=False)¶
Builds a codec converting between graphemes/code points and integer label sequences.
charset may either be a string, a list or a dict. In the first case each code point will be assigned a label, in the second case each string in the list will be assigned a label, and in the final case each key string will be mapped to the value sequence of integers. In the first two cases labels will be assigned automatically. When a mapping is manually provided the label codes need to be a prefix-free code.
As 0 is the blank label in a CTC output layer, output labels and input dictionaries are/should be 1-indexed.
- Parameters
charset (Union[Dict[str, Sequence[int]], Sequence[str], str]) – Input character set.
strict – Flag indicating if encoding/decoding errors should be ignored or cause an exception.
- Raises
KrakenCodecException – If the character set contains duplicate entries or the mapping is non-singular or non-prefix-free.
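The prefix-free requirement on manually supplied mappings means that no label sequence may be the beginning of another one, since otherwise a decoded label stream could be split in more than one way. A minimal standalone check, which is not kraken's own validation code, might look like this:

```python
def is_prefix_free(charset):
    """Check that no label sequence is a prefix of another (or a duplicate)."""
    codes = sorted(tuple(seq) for seq in charset.values())
    # After sorting, any sequence that is a prefix of another one is
    # immediately followed by a sequence it is a prefix of.
    return all(b[:len(a)] != a for a, b in zip(codes, codes[1:]))

ok = is_prefix_free({'a': [1], 'b': [2, 3], 'c': [2, 4]})
bad = is_prefix_free({'a': [1], 'b': [1, 2]})
```

In the second mapping the label stream 1, 2 could mean either 'b' or 'a' followed by something starting with 2, which is exactly the ambiguity the requirement rules out.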
- add_labels(self, charset)¶
Adds additional characters/labels to the codec.
charset may either be a string, a list or a dict. In the first case each code point will be assigned a label, in the second case each string in the list will be assigned a label, and in the final case each key string will be mapped to the value sequence of integers. In the first two cases labels will be assigned automatically.
As 0 is the blank label in a CTC output layer, output labels and input dictionaries are/should be 1-indexed.
- Parameters
charset (Union[Dict[str, Sequence[int]], Sequence[str], str]) – Input character set.
- Return type
- decode(self, labels)¶
Decodes a labelling.
Given a labelling with cuts and confidences returns a string with the cuts and confidences aggregated across label-code point correspondences. When decoding multilabels to code points the resulting cuts are min/max, confidences are averaged.
- Parameters
labels (Sequence[Tuple[int, int, int, float]]) – Input containing tuples (label, start, end, confidence).
- Returns
A list of tuples (code point, start, end, confidence)
- Return type
List[Tuple[str, int, int, float]]
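When several labels map to a single code point sequence, the aggregation described above can be pictured as follows. This is an illustrative sketch of the min/max and averaging rule only; merge_labels is a hypothetical helper, not the codec's implementation.

```python
def merge_labels(text, group):
    """Collapse (label, start, end, confidence) tuples into one decoded entry."""
    starts = [start for _, start, _, _ in group]
    ends = [end for _, _, end, _ in group]
    confs = [conf for _, _, _, conf in group]
    return (text, min(starts), max(ends), sum(confs) / len(confs))

# Two labels that together encode a single grapheme (made-up values).
merged = merge_labels('ä', [(5, 10, 14, 0.5), (9, 15, 19, 1.0)])
```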
- encode(self, s)¶
Encodes a string into a sequence of labels.
If the code is non-singular we greedily encode the longest sequence first.
- Parameters
s (str) – Input unicode string
- Returns
Encoded label sequence.
- Raises
KrakenEncodeException – if a subsequence is not encodable and the codec is set to strict mode.
- Return type
torch.IntTensor
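Greedy longest-match encoding can be sketched in a few lines. The version below works on a plain dict and returns a Python list rather than a torch.IntTensor, and it raises ValueError where the codec would raise KrakenEncodeException in strict mode. It is a simplified illustration under those assumptions, not the library's code.

```python
def greedy_encode(s, charset):
    """Encode s by always consuming the longest matching key of charset."""
    max_len = max(len(key) for key in charset)
    labels, idx = [], 0
    while idx < len(s):
        # Try the longest candidate substring first, then shorter ones.
        for size in range(min(max_len, len(s) - idx), 0, -1):
            code = charset.get(s[idx:idx + size])
            if code is not None:
                labels.extend(code)
                idx += size
                break
        else:
            raise ValueError(f'cannot encode {s[idx]!r} at position {idx}')
    return labels

# 'sch' is matched as one unit before the single letters are considered.
encoded = greedy_encode('schas', {'s': [1], 'c': [2], 'h': [3], 'a': [4], 'sch': [5]})
```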
- property is_valid(self)¶
Returns True if the codec is prefix-free (in label space) and non-singular (in both directions).
- Return type
bool
- property max_label(self)¶
Returns the maximum label value.
- Return type
int
- merge(self, codec)¶
Transforms this codec (c1) into another (c2) reusing as many labels as possible.
The resulting codec is able to encode the same code point sequences while not necessarily having the same labels for them as c2. Retains matching character -> label mappings from both codecs, removes mappings not in c2, and adds mappings not in c1. Compound labels in c2 for code point sequences not in c1 containing labels also in use in c1 are added as separate labels.
- Parameters
codec (PytorchCodec) – PytorchCodec to merge with
- Returns
A merged codec and a list of labels that were removed from the original codec.
- Return type
Tuple[PytorchCodec, Set]
kraken.lib.train module¶
Training Schedulers¶
Training Stoppers¶
Loss and Evaluation Functions¶
Trainer¶
- class kraken.lib.train.KrakenTrainer(callbacks=None, enable_progress_bar=True, enable_summary=True, min_epochs=5, max_epochs=100, *args, **kwargs)¶
- Parameters
callbacks (Optional[Union[List[pytorch_lightning.callbacks.Callback], pytorch_lightning.callbacks.Callback]]) –
enable_progress_bar (bool) –
enable_summary (bool) –
- fit(self, *args, **kwargs)¶
kraken.lib.dataset module¶
Datasets¶
- class kraken.lib.dataset.BaselineSet(imgs=None, suffix='.path', line_width=4, im_transforms=transforms.Compose([]), mode='path', augmentation=False, valid_baselines=None, merge_baselines=None, valid_regions=None, merge_regions=None)¶
Dataset for training a baseline/region segmentation model.
- Parameters
imgs (Sequence[str]) –
suffix (str) –
line_width (int) –
im_transforms (Callable[[Any], torch.Tensor]) –
mode (str) –
augmentation (bool) –
valid_baselines (Sequence[str]) –
merge_baselines (Dict[str, Sequence[str]]) –
valid_regions (Sequence[str]) –
merge_regions (Dict[str, Sequence[str]]) –
- add(self, image, baselines=None, regions=None, *args, **kwargs)¶
Adds a page to the dataset.
- Parameters
im (path) – Path to the whole page image
baseline (dict) – A list containing dicts with a list of coordinates and tags [{‘baseline’: [[x0, y0], …, [xn, yn]], ‘tags’: (‘script_type’,)}, …]
regions (dict) – A dict containing lists of lists of coordinates {‘region_type_0’: [[[x0, y0], …, [xn, yn]]], ‘region_type_1’: …}.
image (Union[str, PIL.Image.Image]) –
baselines (List[List[List[Tuple[int, int]]]]) –
- transform(self, image, target)¶
- class kraken.lib.dataset.PolygonGTDataset(normalization=None, whitespace_normalization=True, reorder=True, im_transforms=transforms.Compose([]), augmentation=False)¶
Dataset for training a line recognition model from polygonal/baseline data.
- Parameters
normalization (Optional[str]) –
whitespace_normalization (bool) –
reorder (Union[bool, str]) –
im_transforms (Callable[[Any], torch.Tensor]) –
augmentation (bool) –
- add(self, *args, **kwargs)¶
Adds a line to the dataset.
- Parameters
im (path) – Path to the whole page image
text (str) – Transcription of the line.
baseline (list) – A list of coordinates [[x0, y0], …, [xn, yn]].
boundary (list) – A polygon mask for the line.
- encode(self, codec=None)¶
Adds a codec to the dataset and encodes all text lines.
Has to be run before sampling from the dataset.
- Parameters
codec (Optional[kraken.lib.codec.PytorchCodec]) –
- Return type
None
- no_encode(self)¶
Creates an unencoded dataset.
- Return type
None
- parse(self, image, text, baseline, boundary, *args, **kwargs)¶
Parses a sample for the dataset and returns it.
This function is mainly used for parallelized loading of training data.
- Parameters
im (path) – Path to the whole page image
text (str) – Transcription of the line.
baseline (list) – A list of coordinates [[x0, y0], …, [xn, yn]].
boundary (list) – A polygon mask for the line.
image (Union[str, PIL.Image.Image]) –
- class kraken.lib.dataset.GroundTruthDataset(split=F_t.default_split, suffix='.gt.txt', normalization=None, whitespace_normalization=True, reorder=True, im_transforms=transforms.Compose([]), augmentation=False)¶
Dataset for training a line recognition model.
All data is cached in memory.
- Parameters
split (Callable[[str], str]) –
suffix (str) –
normalization (Optional[str]) –
whitespace_normalization (bool) –
reorder (Union[bool, str]) –
im_transforms (Callable[[Any], torch.Tensor]) –
augmentation (bool) –
- add(self, *args, **kwargs)¶
Adds a line-image-text pair to the dataset.
- Parameters
image (str) – Input image path
- Return type
None
- encode(self, codec=None)¶
Adds a codec to the dataset and encodes all text lines.
Has to be run before sampling from the dataset.
- Parameters
codec (Optional[kraken.lib.codec.PytorchCodec]) –
- Return type
None
- no_encode(self)¶
Creates an unencoded dataset.
- Return type
None
- parse(self, image, *args, **kwargs)¶
Parses a sample for this dataset.
This is mostly used to parallelize populating the dataset.
- Parameters
image (str) – Input image path
- Return type
Dict
Helpers¶
- kraken.lib.dataset.compute_error(model, batch)¶
Computes error report from a model and a list of line image-text pairs.
- Parameters
model (kraken.lib.models.TorchSeqRecognizer) – Model used for recognition
validation_set – List of tuples (image, text) for validation
batch (Dict[str, torch.Tensor]) –
- Returns
A tuple with total number of characters and edit distance across the whole validation set.
- Return type
Tuple[int, int]
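The two returned numbers are what a character accuracy or error rate is computed from. The sketch below shows the underlying arithmetic with a textbook Levenshtein distance over made-up prediction/ground-truth pairs; it does not call kraken and the helper names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]

# Made-up (prediction, ground truth) pairs.
pairs = [('kraken', 'kraken'), ('segmnt', 'segment'), ('lime', 'line')]
chars = sum(len(gt) for _, gt in pairs)
errors = sum(edit_distance(pred, gt) for pred, gt in pairs)
accuracy = (chars - errors) / chars
```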
- kraken.lib.dataset.preparse_xml_data(filenames, format_type='xml', repolygonize=False)¶
Loads training data from a set of xml files.
Extracts line information from Page/ALTO xml files for training of recognition models.
- Parameters
filenames (Sequence[Union[str, pathlib.Path]]) – List of XML files.
format_type (str) – Either page, alto or xml for autodetermination.
repolygonize (bool) – (Re-)calculates polygon information using the kraken algorithm.
- Returns
A list of dicts {‘text’: text, ‘baseline’: [[x0, y0], …], ‘boundary’: [[x0, y0], …], ‘image’: PIL.Image}.
kraken.lib.segmentation module¶
- kraken.lib.segmentation.reading_order(lines, text_direction='lr')¶
Given the list of lines (a list of 2D slices), computes the partial reading order. The output is a binary 2D array such that order[i,j] is true if line i comes before line j in reading order.
- Parameters
lines (Sequence[Tuple[slice, slice]]) –
text_direction (str) –
- Return type
numpy.ndarray
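Since the function returns a pairwise "comes before" matrix rather than a sequence, a linear order has to be derived from it, for example with a topological sort. The sketch below does this on a nested list standing in for the boolean array; it is one possible way to consume such a matrix and is not taken from kraken.

```python
def linearize(order):
    """Topologically sort indices given order[i][j] == True if i precedes j."""
    n = len(order)
    # Number of not-yet-emitted lines that must come before each line.
    pending = [sum(bool(order[i][j]) for i in range(n)) for j in range(n)]
    ready = [j for j in range(n) if pending[j] == 0]
    result = []
    while ready:
        i = ready.pop(0)
        result.append(i)
        for j in range(n):
            if order[i][j]:
                pending[j] -= 1
                if pending[j] == 0:
                    ready.append(j)
    return result

# Line 2 precedes line 0, which precedes line 1 (made-up matrix).
matrix = [[False, True, False],
          [False, False, False],
          [True, False, False]]
sequence = linearize(matrix)
```

Lines that take part in a cycle never reach a pending count of zero and are left out of the result, so a caller would need to handle that case separately.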
- kraken.lib.segmentation.polygonal_reading_order(lines, text_direction='lr', regions=None)¶
Given a list of baselines and regions, calculates the correct reading order and applies it to the input.
- Parameters
lines (Sequence) – List of tuples containing the baseline and its polygonization.
regions (Sequence) – List of region polygons.
text_direction (str) – Set principal text direction for column ordering. Can be ‘lr’ or ‘rl’
- Returns
A reordered input.
- Return type
Sequence[Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]]
- kraken.lib.segmentation.denoising_hysteresis_thresh(im, low, high, sigma)¶
- kraken.lib.segmentation.vectorize_lines(im, threshold=0.17, min_length=5)¶
Vectorizes lines from a binarized array.
- Parameters
im (np.ndarray) – Array of shape (3, H, W) with the first dimension being probabilities for (start_separators, end_separators, baseline).
threshold (float) – Threshold for baseline blob detection.
min_length (int) – Minimal length of output baselines.
- Returns
[[x0, y0, … xn, yn], [xm, ym, …, xk, yk], … ] A list of lists containing the points of all baseline polylines.
- kraken.lib.segmentation.calculate_polygonal_environment(im=None, baselines=None, suppl_obj=None, im_feats=None, scale=None, topline=False)¶
Given a list of baselines and an input image, calculates a polygonal environment around each baseline.
- Parameters
im (PIL.Image) – grayscale input image (mode ‘L’)
baselines (sequence) – List of lists containing a single baseline per entry.
suppl_obj (sequence) – List of lists containing additional polylines that should be considered hard boundaries for polygonization purposes. Can be used to prevent polygonization into non-text areas such as illustrations or to compute the polygonization of a subset of the lines in an image.
im_feats (numpy.array) – An optional precomputed seamcarve energy map. Overrides data in im. The default map is gaussian_filter(sobel(im), 2).
scale (tuple) – A 2-tuple (h, w) containing optional scale factors of the input. Values of 0 are used for aspect-preserving scaling. None skips input scaling.
topline (bool) – Switch to change default baseline location for offset calculation purposes. If set to False, baselines are assumed to be on the bottom of the text line and will be offset upwards, if set to True, baselines are on the top and will be offset downwards. If set to None, no offset will be applied.
- Returns
List of lists of coordinates. If no polygonization could be computed for a baseline, None is returned instead.
- kraken.lib.segmentation.scale_polygonal_lines(lines, scale)¶
Scales baselines/polygon coordinates by a certain factor.
- Parameters
lines (Sequence) – List of tuples containing the baseline and its polygonization.
scale (float or tuple of floats) – Scaling factor
- Return type
Sequence[Tuple[List, List]]
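What such scaling amounts to can be sketched without the library. The helper below is a hypothetical stand-in; in particular, treating a tuple as (x factor, y factor) and rounding to integers are assumptions of this sketch, not documented behaviour.

```python
def scale_lines(lines, scale):
    """Scale (baseline, polygon) pairs by a float or an (x, y) pair of floats."""
    sx, sy = scale if isinstance(scale, tuple) else (scale, scale)

    def scale_points(points):
        return [(int(round(x * sx)), int(round(y * sy))) for x, y in points]

    return [(scale_points(bl), scale_points(pl)) for bl, pl in lines]

lines = [([(10, 20), (110, 20)], [(10, 0), (110, 0), (110, 30), (10, 30)])]
scaled = scale_lines(lines, 0.5)
```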
- kraken.lib.segmentation.scale_regions(regions, scale)¶
Scales baselines/polygon coordinates by a certain factor.
- Parameters
lines (Sequence) – List of tuples containing the baseline and its polygonization.
scale (float or tuple of floats) – Scaling factor
regions (Sequence[Tuple[List[int], List[int]]]) –
- Return type
Sequence[Tuple[List, List]]
- kraken.lib.segmentation.compute_polygon_section(baseline, boundary, dist1, dist2)¶
Given a baseline, a polygonal boundary, and two points on the baseline, returns the rectangle formed by the orthogonal cuts on that baseline segment. The resulting polygon is not guaranteed to have a non-zero area.
The distance can be larger than the actual length of the baseline if the baseline endpoints are inside the bounding polygon. In that case the baseline will be extrapolated to the polygon edge.
- Parameters
baseline (list) – A polyline ((x1, y1), …, (xn, yn))
boundary (list) – A bounding polygon around the baseline (same format as baseline).
dist1 (int) – Absolute distance along the baseline of the first point.
dist2 (int) – Absolute distance along the baseline of the second point.
- Returns
A sequence of polygon points.
- Return type
List[Tuple[int, int]]
- kraken.lib.segmentation.extract_polygons(im, bounds)¶
Yields the subimages of image im defined in the list of bounding polygons with baselines preserving order.
- Parameters
im (PIL.Image.Image) – Input image
bounds (Dict[str, Any]) –
A dict in baseline format:
{‘type’: ‘baselines’, ‘lines’: [{‘baseline’: [[x_0, y_0], … [x_n, y_n]], ‘boundary’: [[x_0, y_0], … [x_n, y_n]]}, …]}
or in bounding box format:
{‘boxes’: [[x_0, y_0, x_1, y_1], …], ‘text_direction’: ‘horizontal-lr’}
- Yields
The extracted subimage
- Return type
PIL.Image.Image
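The two accepted shapes of bounds can be told apart by their keys. The sketch below builds one dictionary of each kind and walks over the line regions in order. iter_line_regions is a hypothetical helper written for illustration; it assumes the 'type' key is what distinguishes the baseline format, and it does not crop any image.

```python
def iter_line_regions(bounds):
    """Yield ('polygon', points) or ('box', coords) for each line, in order."""
    # Assumption: baseline-format input is marked by type == 'baselines'.
    if bounds.get('type') == 'baselines':
        for line in bounds['lines']:
            yield ('polygon', line['boundary'])
    else:
        for box in bounds['boxes']:
            yield ('box', box)


baseline_bounds = {
    'type': 'baselines',
    'lines': [
        {'baseline': [[10, 30], [200, 32]],
         'boundary': [[10, 5], [200, 7], [200, 40], [10, 38]]},
    ],
}
box_bounds = {
    'boxes': [[10, 5, 200, 40], [10, 50, 200, 85]],
    'text_direction': 'horizontal-lr',
}

polygons = list(iter_line_regions(baseline_bounds))
boxes = list(iter_line_regions(box_bounds))
```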
kraken.lib.ctc_decoder¶
- kraken.lib.ctc_decoder.beam_decoder(outputs, beam_size=3)¶
Translates back the network output to a label sequence using same-prefix-merge beam search decoding as described in [0].
[0] Hannun, Awni Y., et al. “First-pass large vocabulary continuous speech recognition using bi-directional recurrent DNNs.” arXiv preprint arXiv:1408.2873 (2014).
- Parameters
outputs (numpy.ndarray) – (C, W) shaped softmax output tensor
beam_size (int) – Size of the beam
- Returns
A list with tuples (class, start, end, prob). prob is the maximum value of the softmax layer in the region.
- Return type
List[Tuple[int, int, int, float]]
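The prefix-merging idea behind this decoder can be sketched in plain Python. This is a simplified illustration and not kraken's implementation: beam_decode_sketch is a hypothetical name, the input is a nested list indexed as outputs[class][time] with class 0 assumed to be the blank, and only the label sequence is returned (no start, end, or probability).

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")


def logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def beam_decode_sketch(outputs, beam_size=3):
    num_classes, width = len(outputs), len(outputs[0])
    # Each prefix keeps two log scores: paths ending in blank / in non-blank.
    beam = [((), (0.0, NEG_INF))]
    for t in range(width):
        next_beam = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_b, p_nb) in beam:
            total = logaddexp(p_b, p_nb)
            for c in range(num_classes):
                if outputs[c][t] <= 0:
                    continue  # skip impossible symbols, avoids log(0)
                lp = math.log(outputs[c][t])
                if c == 0:
                    # Blank keeps the prefix unchanged.
                    n_b, n_nb = next_beam[prefix]
                    next_beam[prefix] = (logaddexp(n_b, total + lp), n_nb)
                    continue
                last = prefix[-1] if prefix else None
                new_prefix = prefix + (c,)
                n_b, n_nb = next_beam[new_prefix]
                if c != last:
                    n_nb = logaddexp(n_nb, total + lp)
                else:
                    # A repeated label only extends paths that ended in blank.
                    n_nb = logaddexp(n_nb, p_b + lp)
                next_beam[new_prefix] = (n_b, n_nb)
                if c == last:
                    # Same label without a blank merges into the old prefix.
                    n_b, n_nb = next_beam[prefix]
                    next_beam[prefix] = (n_b, logaddexp(n_nb, p_nb + lp))
        beam = sorted(next_beam.items(),
                      key=lambda kv: logaddexp(*kv[1]),
                      reverse=True)[:beam_size]
    return list(beam[0][0])


outputs = [
    [0.05, 0.90, 0.05, 0.05],  # class 0 (blank)
    [0.90, 0.05, 0.90, 0.05],  # class 1
    [0.05, 0.05, 0.05, 0.90],  # class 2
]
labels = beam_decode_sketch(outputs)
```

A larger beam_size keeps more candidate prefixes alive per time step at the cost of more computation.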
- kraken.lib.ctc_decoder.greedy_decoder(outputs)¶
Translates back the network output to a label sequence using greedy/best path decoding as described in [0].
[0] Graves, Alex, et al. “Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks.” Proceedings of the 23rd international conference on Machine learning. ACM, 2006.
- Parameters
outputs (numpy.ndarray) – (C, W) shaped softmax output tensor
- Returns
A list with tuples (class, start, end, max). max is the maximum value of the softmax layer in the region.
- Return type
List[Tuple[int, int, int, float]]
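Best path decoding takes the most probable class at each time step, collapses consecutive repeats, and drops blanks. The following is a minimal pure-Python sketch of that idea, not kraken's implementation: greedy_decode_sketch is a hypothetical name, the input is a nested list indexed as outputs[class][time], class 0 is assumed to be the blank, and end is treated as inclusive.

```python
def greedy_decode_sketch(outputs):
    """Return (class, start, end, max) tuples via best path decoding."""
    num_classes, width = len(outputs), len(outputs[0])
    # Most probable class at every time step.
    best = [max(range(num_classes), key=lambda c: outputs[c][t])
            for t in range(width)]
    result = []
    t = 0
    while t < width:
        label, start = best[t], t
        # Extend the run while the same class stays on top.
        while t + 1 < width and best[t + 1] == label:
            t += 1
        if label != 0:  # blanks are dropped
            peak = max(outputs[label][start:t + 1])
            result.append((label, start, t, peak))
        t += 1
    return result


outputs = [
    [0.1, 0.1, 0.8, 0.1, 0.1],  # class 0 (blank)
    [0.8, 0.7, 0.1, 0.1, 0.1],  # class 1
    [0.1, 0.2, 0.1, 0.8, 0.8],  # class 2
]
decoded = greedy_decode_sketch(outputs)
```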
- kraken.lib.ctc_decoder.blank_threshold_decoder(outputs, threshold=0.5)¶
Translates back the network output to a label sequence in the same way as the original ocropy/clstm.
Thresholds on class 0, then assigns the maximum (non-zero) class to each region.
- Parameters
outputs (numpy.ndarray) – (C, W) shaped softmax output tensor
threshold (float) – Threshold for 0 class when determining possible label locations.
- Returns
A list with tuples (class, start, end, max). max is the maximum value of the softmax layer in the region.
- Return type
List[Tuple[int, int, int, float]]
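The thresholding step can be sketched as follows. This is an illustration under stated assumptions rather than kraken's code: blank_threshold_decode_sketch is a hypothetical name, the input is a nested list indexed as outputs[class][time], a region is a maximal run of time steps whose class 0 value is below the threshold, and end is treated as inclusive.

```python
def blank_threshold_decode_sketch(outputs, threshold=0.5):
    """Return (class, start, end, max) tuples for each non-blank region."""
    num_classes, width = len(outputs), len(outputs[0])
    result = []
    start = None
    # Iterate one step past the end so a trailing region is closed too.
    for t in range(width + 1):
        in_region = t < width and outputs[0][t] < threshold
        if in_region and start is None:
            start = t
        elif not in_region and start is not None:
            # Highest-valued non-blank class anywhere inside the region.
            value, label = max((outputs[c][i], c)
                               for c in range(1, num_classes)
                               for i in range(start, t))
            result.append((label, start, t - 1, value))
            start = None
    return result


outputs = [
    [0.1, 0.1, 0.8, 0.1, 0.1],  # class 0 (blank)
    [0.8, 0.7, 0.1, 0.1, 0.1],  # class 1
    [0.1, 0.2, 0.1, 0.8, 0.8],  # class 2
]
decoded = blank_threshold_decode_sketch(outputs, threshold=0.5)
```

Unlike best path decoding, two identical adjacent labels are only separated here if the blank value rises above the threshold between them.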
kraken.lib.exceptions¶
- class kraken.lib.exceptions.KrakenCodecException(message=None)¶
- class kraken.lib.exceptions.KrakenStopTrainingException(message=None)¶
- class kraken.lib.exceptions.KrakenEncodeException(message=None)¶
- class kraken.lib.exceptions.KrakenRecordException(message=None)¶
- class kraken.lib.exceptions.KrakenInvalidModelException(message=None)¶
- class kraken.lib.exceptions.KrakenInputException(message=None)¶
- class kraken.lib.exceptions.KrakenRepoException(message=None)¶
Legacy modules¶
These modules are retained for compatibility reasons or highly specialized use cases. In most cases their use is not necessary, and they are not developed further for interoperability with new functionality; for example, the transcription and line generation modules do not work with the baseline segmenter.
kraken.binarization module¶
- kraken.binarization.nlbin(im, threshold=0.5, zoom=0.5, escale=1.0, border=0.1, perc=80, range=20, low=5, high=90)¶
Performs binarization using non-linear processing.
- Parameters
im (PIL.Image.Image) – Input image
threshold (float) –
zoom (float) – Zoom for background page estimation
escale (float) – Scale for estimating a mask over the text region
border (float) – Ignore this much of the border
perc (int) – Percentage for filters
range (int) – Range for filters
low (int) – Percentile for black estimation
high (int) – Percentile for white estimation
- Returns
PIL.Image.Image containing the binarized image
- Raises
KrakenInputException – When trying to binarize an empty image.
- Return type
PIL.Image.Image
kraken.transcribe module¶
- class kraken.transcribe.TranscriptionInterface(font=None, font_style=None)¶
- add_page(self, im, segmentation=None, records=None)¶
Adds an image to the transcription interface, optionally filling in information from a list of ocr_record objects.
- Parameters
im (PIL.Image) – Input image
segmentation (dict) – Output of the segment method.
records (list) – A list of ocr_record objects.
- write(self, fd)¶
Writes the HTML file to a file descriptor.
- Parameters
fd (File) – File object opened in binary write mode (mode='wb') to write to.