Changelog¶
7.0.0b3: 7.0 3rd beta release¶
Released on 2026-02-16 - GitHub - PyPI
What's Changed
- Rotates images according to EXIF metadata
- Ports the build system to pyproject.toml/hatchling
- Enables the model loading version compatibility check
- Enables segmentation with region-only models
- Fixes an incorrect metric tracking direction in reading order training
7.0.0b2: 7.0 beta release¶
Released on 2026-02-15 - GitHub - PyPI
kraken 7.0 introduces major changes to training, inference, model handling, and extensibility.
If you are upgrading from 6.0.x as an average user, start with Breaking Changes and Command Line Behavior.
Installing the Beta
Install the latest available 7.0 pre-release from PyPI:
$ pip install --upgrade --pre kraken

Install this specific beta explicitly:

$ pip install --upgrade "kraken==7.0.0b2"

Breaking Changes
- Python 3.9 support was dropped. kraken now supports Python 3.10 through 3.13.
- Device and precision options are now global on both the kraken and ketos commands.
- Training and evaluation manifest option names changed from --training-files/--evaluation-files to --training-data/--evaluation-data.
- ketos train, ketos segtrain, ketos rotrain, and ketos pretrain now produce checkpoints and convert the best checkpoint to a weights file after training.
- Segmentation training class filtering/merging CLI options were removed. Class mapping is now defined in YAML experiment files.
- ketos segtest metrics are now computed against a configurable class mapping, and baseline detection metrics replace the older, less informative pixel accuracy/IoU-only view.
- ketos compile fixed splits were removed due to a significant performance penalty. Use separate dataset files per split instead.
- The API for both training and inference has been reworked extensively.
- safetensors is now the default output format for trained weights.
- Neural reading order models are only executed when using the new task API.
- Recognition and segmentation inference accelerators now default to auto, selecting the highest-performance available device.
In practice: most existing workflows keep working after small updates, but training artifacts and API entry points changed enough that scripted pipelines and API consumers will need adaptation.
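For example, the manifest option rename in a training invocation (manifest names are placeholders):

# 6.0.x
$ ketos train -f xml --training-files train.lst --evaluation-files val.lst
# 7.0
$ ketos train -f xml --training-data train.lst --evaluation-data val.lst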
Bug Fixes
- Fixed a breaking bug in reading order models that prevented trained model weights from loading.
Features and Improvements
- A plugin system now allows easy extension of kraken functionality with new segmentation, recognition, and reading order implementations.
- Persistent configuration through experiment YAML files has been added to ketos.
- The new recognition API supports batching plus parallelized line extraction/processing, enabling effective GPU inference. Speedups of around 80% were observed on CPU, with even larger gains with GPU acceleration.
- Character cuts on BaselineOCRRecord are now computed at initialization using a more efficient algorithm. This substantially reduces serialization overhead in the default --subline-segmentation mode.
- Baseline detection metrics inspired by the Transkribus Evaluation Scheme are now computed during segmentation training. Unlike older pixel-based metrics, these scores correlate more directly with actual line detection quality.
- The XML parser has been reworked for better robustness against invalid input. When PageXML files contain invalid image dimensions, kraken now attempts to read dimensions from the referenced image file. Reading-order parsing was also fully reimplemented to handle partial explicit orders and multi-level ordering more gracefully.
Plugins
kraken can now use external implementations of layout analysis, text recognition, and reading order determination through Python entry points.
Plugins are distributed as regular Python packages. After installation, kraken discovers them automatically through entry points. Plugin model files are then used exactly like native kraken model files: pass them to --model on the CLI or load them via task classes in Python.
Example workflow with a D-FINE layout analysis plugin model:
# install plugin package
$ pip install git+https://github.com/mittagessen/dfine_kraken.git
# run layout analysis with a plugin model file
$ kraken -i page.tif page.json segment --baseline --model dfine_layout.safetensors

The same model can be loaded programmatically with SegmentationTaskModel.load_model('dfine_layout.safetensors').
Command Line Behavior
Inference
Device and precision are now global options on kraken.
Set them before subcommands:
# CPU inference in full precision
$ kraken -i page.tif page.txt --device cpu --precision 32-true \
segment -bl ocr -m model.safetensors
# GPU inference with mixed bfloat16 precision
$ kraken -i page.tif page.txt --device cuda:0 --precision bf16-mixed \
segment -bl ocr -m model.safetensors

Recognition now exposes two throughput controls:
- -B/--batch-size: number of extracted line images sent per recognition forward pass.
- --num-line-workers: number of CPU worker processes used to extract/preprocess line images. Use 0 to keep extraction in-process.
# conservative settings for small GPUs or CPU-only runs
$ kraken -i page.tif page.txt segment -bl ocr -m model.safetensors \
-B 8 --num-line-workers 2
# higher-throughput GPU settings
$ kraken -i page.tif page.txt --device cuda:0 --precision bf16-mixed \
segment -bl ocr -m model.safetensors -B 64 --num-line-workers 8

Training
Experiment Files
Managing non-trivial training configurations from CLI flags alone was difficult, especially when heavily modifying segmentation class taxonomies. To address this, ketos now supports YAML experiment files.
Pass an experiment file with --config before the command name:
$ ketos --config experiments.yml segtrain

YAML keys correspond to the internal parameter names used by the CLI.
Minimal segmentation training experiment file:
precision: 32-true
device: auto
num_workers: 16
num_threads: 1
segtrain:
  training_data:
    - seg_train.lst
  evaluation_data:
    - seg_val.lst
  format_type: xml
  checkpoint_path: seg_checkpoints
  weights_format: safetensors
  line_class_mapping:
    - ['*', 3]
    - ['DefaultLine', 3]

Single experiment file containing multiple commands:
precision: 32-true
device: auto
num_workers: 16
num_threads: 1
train:
  training_data:
    - rec_train.lst
  evaluation_data:
    - rec_val.lst
  format_type: xml
  checkpoint_path: rec_checkpoints
  weights_format: safetensors
segtrain:
  training_data:
    - seg_train.lst
  evaluation_data:
    - seg_val.lst
  format_type: xml
  checkpoint_path: seg_checkpoints
  weights_format: safetensors

Configurations for multiple commands can be saved in the same experiment file.
Recommendation: move non-trivial setups (class mappings, optimizer/scheduler settings, hardware defaults) into YAML so runs are reproducible and easier to review.
Training Outputs, Checkpoints, and Weights
For ketos train, ketos segtrain, and ketos rotrain, training now produces Lightning checkpoints (.ckpt) as the primary artifact instead of writing CoreML weights directly during training.
Checkpoint files include full training state (model weights, optimizer state, scheduler state, epoch/step counters, and serialized training config), enabling exact continuation of interrupted runs.
There are now two distinct continuation modes:
- --resume restores and continues from the checkpoint's exact previous training state. The checkpoint state is authoritative, even if command-line flags or config files specify different values.
- --load keeps the previous fine-tune/start-new-run behavior. It loads weights only and starts a fresh run using current CLI/config hyperparameters.
Use --resume when you want to continue the same run.
Use --load when you want to start a new run from existing weights.
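A sketch of both modes, assuming each option takes the path of the respective artifact (file names are placeholders):

# continue an interrupted run with its exact previous state
$ ketos train --resume checkpoint_05-0.9213.ckpt
# start a fresh fine-tuning run from existing weights with current hyperparameters
$ ketos train --load model_best.safetensors -f xml --training-data train.lst --evaluation-data val.lst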
In addition to regular checkpoints, kraken now writes an emergency abort checkpoint by default (checkpoint_abort.ckpt) when a training run exits via exception (for example, a crash or a forceful abort). This gives you a recovery point even when a run terminates unexpectedly.
Because checkpoints contain much more than deployable model weights and may execute arbitrary Python code on load, distribute converted weights files rather than raw checkpoints. Conversion strips training-only state and produces a distribution-safe weights artifact.
At the end of training, kraken automatically converts the best checkpoint into a weights file. You can also convert manually with ketos convert.
The default weights format is now safetensors. Unlike legacy CoreML weights, safetensors supports serialization of arbitrary model types, while CoreML serialization is limited to the model types implemented in kraken itself.
Use --weights-format coreml only when you explicitly need legacy compatibility.
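A hedged example of requesting CoreML output at training time (manifest names are placeholders):

$ ketos train -f xml --training-data train.lst --evaluation-data val.lst --weights-format coreml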
Testing
Segmentation test output now includes metrics computed on vectorized baselines that correlate with segmentation quality, making model selection for line detection much easier. segtest behavior also changed with the checkpoint/weights distinction. In previous releases, test data often had to mirror post-merge/post-filter training mappings, which made evaluation cumbersome without rewriting source labels.
In short: you can now evaluate more datasets directly, with less taxonomy rewriting.
Example segtest invocation:
$ ketos --device cpu segtest -m best_0.9471.safetensors -e test_manifest.lst -f xml

Example output excerpt:
Category Class Name Pixel Accuracy IOU Object Count
aux _start_sep 1.000 1.000 N/A
aux _end_sep 1.000 1.000 N/A
regions Text_Region 0.992 0.964 184
regions Foot_Notes 0.973 0.887 36
Class Precision Recall F1
Overall 0.947 0.933 0.940
DefaultLine 0.959 0.946 0.952
Marginalia 0.891 0.874 0.882
Class mappings are now stored in two forms in checkpoints and new weights files:
- A full mapping with all transformations (merges/filtering) from training taxonomy to model outputs.
- A canonical one-to-one mapping between label indices and class strings.
By default, evaluation uses the full mapping. Canonical mapping is used when explicitly requested and as a fallback for pre-7.0 model files. Fully custom mappings can also be defined in an experiment file.
Class mapping modes in ketos segtest:
# Use full (many-to-one) training mapping from checkpoint metadata
$ ketos segtest -m model.ckpt -e test.lst --test-class-mapping-mode full
# Use canonical one-to-one model output mapping
$ ketos segtest -m model.safetensors -e test.lst --test-class-mapping-mode canonical
# Provide explicit mapping for the test set taxonomy
$ ketos --config segtest_custom.yml segtest -m model.safetensors -e test.lst \
  --test-class-mapping-mode custom

# segtest_custom.yml
segtest:
  line_class_mapping:
    - ['DefaultLine', 3]
    - ['Running_Title', 3]
    - ['Marginal_Note', 4]
  region_class_mapping:
    - ['Text_Region', 5]
    - ['Foot_Notes', 6]

For easier debugging, ketos segtest now prints an explicit mapping between test-set classes and model classes, including clear indicators for merges, missing labels, and conflicts.
Example class taxonomy diagnostics table:
Class Mapping Diagnostics (model=full, dataset=effective)
Category Class Name Model Idx Dataset Idx Observed Effective Status
baselines DefaultLine 3 3 812 812 ok
baselines Running_Title 3 3 57 57 ok
baselines Rubrication 4 - 14 0 ignored by dataset mapping
regions Text_Region 5 5 184 184 ok
regions Illustration - 7 22 22 missing in model mapping
API
Configuration Classes
In previous versions of kraken, training and inference hyperparameters were defined in dictionaries in the default_specs module. This was error-prone and resulted in verbose code in the command line drivers.
If you maintain Python training/inference scripts, migrate to typed config classes for better defaults, clearer parameter names, and safer checkpoint serialization.
Before (6.0.x) using default_specs dictionaries:
from kraken.lib.default_specs import RECOGNITION_HYPER_PARAMS
from kraken.lib.train import RecognitionModel
hyper_params = RECOGNITION_HYPER_PARAMS.copy()
hyper_params.update({'batch_size': 8, 'lrate': 1e-3})
model = RecognitionModel(hyper_params=hyper_params, training_data=['train.lst'])

After (7.0) using typed configuration classes:
from kraken.configs import (RecognitionInferenceConfig,
VGSLRecognitionTrainingConfig,
VGSLRecognitionTrainingDataConfig)
infer_cfg = RecognitionInferenceConfig(batch_size=8,
num_line_workers=4,
precision='bf16-mixed')
train_cfg = VGSLRecognitionTrainingConfig(lrate=1e-3,
quit='early',
epochs=24)
data_cfg = VGSLRecognitionTrainingDataConfig(training_data=['train.lst'],
evaluation_data=['val.lst'],
format_type='xml')

Task-based API for Inference
blla.segment(), align.forced_align(), and rpred.rpred()/rpred.mm_rpred() have been replaced by implementation-agnostic task classes that provide better performance and flexibility. The largest gains are in text recognition, where CPU inference improves by roughly 80% through parallelization. Batching additionally enables efficient GPU utilization.
If you call legacy APIs directly, plan a migration to kraken.tasks soon. Legacy interfaces remain available for now but are deprecated.
To migrate an existing segmentation workflow, replace:
from PIL import Image
from kraken.blla import segment
from kraken.lib.vgsl import TorchVGSLModel
model = TorchVGSLModel.load_model('/path/to/segmentation/model.coreml')
im = Image.open('sample.jpg')
seg = segment(im, model=model)

with:
from PIL import Image
from kraken.tasks import SegmentationTaskModel
from kraken.configs import SegmentationInferenceConfig
segmenter = SegmentationTaskModel.load_model('/path/to/segmentation/model.safetensors')
im = Image.open('sample.jpg')
seg = segmenter.predict(im=im, config=SegmentationInferenceConfig())

For recognition, before:
from PIL import Image
from kraken.rpred import rpred
from kraken.lib.models import load_any
net = load_any('/path/to/recognition/model.mlmodel')
for record in rpred(net, im, segmentation=seg):
    print(record)

After:
from kraken.tasks import RecognitionTaskModel
from kraken.configs import RecognitionInferenceConfig
recognizer = RecognitionTaskModel.load_model('/path/to/recognition/model.safetensors')
for record in recognizer.predict(im=im, segmentation=seg, config=RecognitionInferenceConfig(batch_size=8, num_line_workers=4)):
    print(record)

Recognition now supports batching (batch_size in RecognitionInferenceConfig) and parallel line extraction (num_line_workers), making GPU acceleration practical.
CUDA example with explicit accelerator/device settings:
from PIL import Image
from kraken.tasks import RecognitionTaskModel
from kraken.configs import RecognitionInferenceConfig
recognizer = RecognitionTaskModel.load_model('/path/to/recognition/model.safetensors')
config = RecognitionInferenceConfig(accelerator='gpu',
device=[0],
precision='bf16-mixed',
batch_size=64,
num_line_workers=8)
for record in recognizer.predict(im=Image.open('page.tif'), segmentation=seg, config=config):
    print(record.prediction)

The new recognition API does not support tag-based multi-model recognition (rpred.mm_rpred()), which was dropped to simplify batched inference.
For forced alignment, before:
from PIL import Image
from kraken.containers import Segmentation, BaselineLine
from kraken.align import forced_align
from kraken.lib.models import load_any
model = load_any('model.mlmodel')
# Create a dummy segmentation with a line and a transcription
line = BaselineLine(baseline=[(0,0), (100,0)], boundary=[(0,-10), (100,-10), (100,10), (0,10)], text='Hello World')
segmentation = Segmentation(imagename='image.png', lines=[line])
aligned_segmentation = forced_align(segmentation, model)
record = aligned_segmentation.lines[0]
print(record.prediction)
print(record.cuts)

After:
from PIL import Image
from kraken.tasks import ForcedAlignmentTaskModel
from kraken.containers import Segmentation, BaselineLine
from kraken.configs import RecognitionInferenceConfig
# Assume `model.mlmodel` is a recognition model
model = ForcedAlignmentTaskModel.load_model('model.mlmodel')
im = Image.open('image.png')
# Create a dummy segmentation with a line and a transcription
line = BaselineLine(baseline=[(0,0), (100,0)], boundary=[(0,-10), (100,-10), (100,10), (0,10)], text='Hello World')
segmentation = Segmentation(lines=[line])
config = RecognitionInferenceConfig()
aligned_segmentation = model.predict(im, segmentation, config)
record = aligned_segmentation.lines[0]
print(record.prediction)
print(record.cuts)

The old interfaces remain available but are deprecated and will be removed in kraken 8.
Training Refactor
The training module has been moved from kraken.lib.train to kraken.train (with reading order and pretraining modules in kraken.lib.ro/kraken.lib.pretrain). Training now uses explicit configuration objects and consistently uses LightningDataModule-derived classes.
If you run training programmatically, update imports and constructors and switch hyperparameter dicts to config objects.
Before (6.0.x) style instantiation:
from kraken.lib.train import RecognitionModel, SegmentationModel
from kraken.lib.pretrain.model import RecognitionPretrainModel
from kraken.lib.ro.model import RODataModule, ROModel
rec = RecognitionModel(hyper_params={'batch_size': 8},
training_data=['train.lst'],
evaluation_data=['val.lst'])
seg = SegmentationModel(hyper_params={'epochs': 50},
training_data=['seg_train.lst'],
evaluation_data=['seg_val.lst'])
pre = RecognitionPretrainModel(hyper_params={'mask_prob': 0.5})
ro_dm = RODataModule(training_data=['ro_train.lst'], evaluation_data=['ro_val.lst'])
ro = ROModel(feature_dim=128, class_mapping={'default': 1}, hyper_params={'epochs': 3000})

After (7.0) style instantiation:
from kraken.train import (KrakenTrainer,
VGSLRecognitionDataModule, VGSLRecognitionModel,
BLLASegmentationDataModule, BLLASegmentationModel)
from kraken.lib.pretrain import PretrainDataModule, RecognitionPretrainModel
from kraken.lib.ro import RODataModule, ROModel
from kraken.configs import (VGSLRecognitionTrainingConfig, VGSLRecognitionTrainingDataConfig,
BLLASegmentationTrainingConfig, BLLASegmentationTrainingDataConfig,
VGSLPreTrainingConfig, VGSLPreTrainingDataConfig,
ROTrainingConfig, ROTrainingDataConfig)
rec_dm = VGSLRecognitionDataModule(VGSLRecognitionTrainingDataConfig(training_data=['train.lst'], evaluation_data=['val.lst'], format_type='xml'))
rec_model = VGSLRecognitionModel(VGSLRecognitionTrainingConfig(epochs=24, quit='early'))
seg_dm = BLLASegmentationDataModule(BLLASegmentationTrainingDataConfig(training_data=['seg_train.lst'], evaluation_data=['seg_val.lst'], format_type='xml'))
seg_model = BLLASegmentationModel(BLLASegmentationTrainingConfig(epochs=50, quit='fixed'))
pre_dm = PretrainDataModule(VGSLPreTrainingDataConfig(training_data=['pretrain_train.lst'], evaluation_data=['pretrain_val.lst'], format_type='path'))
pre_model = RecognitionPretrainModel(VGSLPreTrainingConfig(mask_prob=0.5))
ro_dm = RODataModule(ROTrainingDataConfig(training_data=['ro_train.lst'], evaluation_data=['ro_val.lst'], format_type='xml', level='baselines'))
ro_model = ROModel(ROTrainingConfig(epochs=3000, quit='early'))

The KrakenTrainer class works as before.
In addition, separate test routines are now integrated into Lightning modules, allowing straightforward programmatic execution of the test loop for segmentation and recognition.
KrakenTrainer.test() returns typed metric containers:
- Recognition (RecognitionTestMetrics): character_counts, num_errors, cer, wer, case_insensitive_cer, confusions, scripts, insertions, deletes, substitutions
- Segmentation (SegmentationTestMetrics): class_pixel_accuracy, mean_accuracy, class_iu, mean_iu, freq_iu, region_iu, bl_precision, bl_recall, bl_f1, bl_detection_per_class

Example: programmatic test loop execution with KrakenTrainer.test():
from kraken.train import (KrakenTrainer,
VGSLRecognitionDataModule, VGSLRecognitionModel,
BLLASegmentationDataModule, BLLASegmentationModel)
from kraken.configs import (VGSLRecognitionTrainingConfig, VGSLRecognitionTrainingDataConfig,
BLLASegmentationTrainingConfig, BLLASegmentationTestDataConfig)
trainer = KrakenTrainer(accelerator='cpu', devices=1, precision='32-true')
rec_model = VGSLRecognitionModel.load_from_weights('rec_best.safetensors',
VGSLRecognitionTrainingConfig())
rec_dm = VGSLRecognitionDataModule(VGSLRecognitionTrainingDataConfig(test_data=['rec_test.lst'], format_type='xml'))
rec_metrics = trainer.test(rec_model, rec_dm)
seg_model = BLLASegmentationModel.load_from_weights('seg_best.safetensors',
BLLASegmentationTrainingConfig())
seg_dm = BLLASegmentationDataModule(BLLASegmentationTestDataConfig(test_data=['seg_test.lst'],
format_type='xml',
test_class_mapping_mode='canonical'))
seg_metrics = trainer.test(seg_model, seg_dm)

Plugin System and Model Base Classes
kraken now supports alternative segmentation and recognition implementations through a plugin system based on Python entry points. To be compatible, plugins must implement the interfaces defined by the abstract kraken.models.BaseModel class. kraken.models.SegmentationBaseModel and kraken.models.RecognitionBaseModel provide task-specific base interfaces.
This is primarily relevant if you are extending kraken with custom model types or distributing third-party integrations.
Rough implementation skeletons:
from torch import nn
from kraken.models import BaseModel, SegmentationBaseModel, RecognitionBaseModel

class MySegmentationModel(nn.Module, SegmentationBaseModel):
    _kraken_min_version = '7.0.0'
    model_type = ['segmentation']

    def prepare_for_inference(self, config): self.eval()
    def predict(self, im): ...

class MyRecognitionModel(nn.Module, RecognitionBaseModel):
    _kraken_min_version = '7.0.0'
    model_type = ['recognition']

    def prepare_for_inference(self, config): self.eval()
    def predict(self, im, segmentation): ...

To be discoverable by kraken, these classes must be registered as entry points in your setup.cfg or similar under the kraken.models group with their class name:
Example from kraken's own setup.cfg:
[entry_points]
kraken.models =
    TorchVGSLModel = kraken.lib.vgsl:TorchVGSLModel
    Wav2Vec2Mask = kraken.lib.pretrain:Wav2Vec2Mask
    ROMLP = kraken.lib.ro:ROMLP

An example plugin, dfine_kraken, incorporates the D-FINE object detector for layout analysis.
Model Handling
kraken replaced type-specific model loaders with a modular serialization/deserialization architecture. Models can also be loaded directly via task APIs. The default serialization format is now safetensors, which supports arbitrary model types. The new API in kraken.models can read (kraken.models.load_models) and write model collections (kraken.models.write_safetensors). Model files are designed to contain multiple models (for example, layout + reading order), so these routines accept and return lists of models. You can mix "native" kraken implementations and plugin implementations in the same model file, such as a BLLA line segmentation and D-FINE region segmentation model. CoreML support remains, but only for legacy models from kraken 6 and earlier.
For most users: prefer safetensors, treat checkpoints as training artifacts, and distribute converted weights files.
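A sketch of bundling models from separate files into a single weights file; the write_safetensors argument order is an assumption, and file names are placeholders:

from kraken.models import load_models, write_safetensors

# load_models returns a list, so collections can simply be concatenated
models = load_models('seg.safetensors') + load_models('ro.safetensors')
# write the combined collection into one model file
write_safetensors(models, 'bundle.safetensors')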
Before (6.0.x) model loading:
# recognition
from kraken.lib.models import load_any
rec_model = load_any('recognition_model.mlmodel')
# segmentation
from kraken.lib.vgsl import TorchVGSLModel
seg_model = TorchVGSLModel.load_model('segmentation_model.mlmodel')

After (7.0) unified loading:
from kraken.models import load_models
from kraken.tasks import RecognitionTaskModel, SegmentationTaskModel
# load by task type
rec_models = load_models('model_bundle.safetensors', tasks=['recognition'])
seg_and_ro_models = load_models('model_bundle.safetensors', tasks=['segmentation', 'reading_order'])
# use via task API
recognizer = RecognitionTaskModel(rec_models)
segmenter = SegmentationTaskModel(seg_and_ro_models)

The new model stack explicitly distinguishes checkpoints from weights files. After training, checkpoints should be converted to weights. The universal conversion routine kraken.models.convert_models relies on additional entry points: a checkpoint LightningModule (or compatible class exposing load_from_checkpoint) and any configuration classes serialized into model weights. During conversion, checkpoints are loaded in weights_only mode. To support safe deserialization, kraken adds all classes registered under kraken.configs to PyTorch safe globals.
Minimal plugin registration in setup.cfg for checkpoint conversion:
[entry_points]
kraken.lightning_modules =
    MyVGSLLightningModule = mypkg.training:MyVGSLLightningModule
kraken.configs =
    MyTrainingConfig = mypkg.configs:MyTrainingConfig
kraken.models =
    MyModel = mypkg.models:MyModel

Checkpoint/weights conversion examples:
# CLI
$ ketos convert -i checkpoint_09-0.9431.ckpt -o model_best.safetensors

from kraken.models import convert_models, load_models
from kraken.models.convert import load_from_checkpoint
# checkpoint to weights
convert_models(['checkpoint_09-0.9431.ckpt'], 'model_best.safetensors')
# load lightning module from checkpoint (weights_only mode)
module = load_from_checkpoint('checkpoint_09-0.9431.ckpt')
net = module.net
# load converted weights
models = load_models('model_best.safetensors')

6.0.4: Hotfix release for blla.segment()¶
Released on 2026-02-13 - GitHub - PyPI
Corrects a regression where blla.segment() would not load a default model when none was explicitly defined on the CLI.
6.0.3¶
Released on 2025-12-13 - GitHub - PyPI
Bug Fixes
- Fixes a regression in tag-based recognition.
- Pin rich to below 14.1 and relax pytorch pin to 2.9.x.
- Removes the --device option from ketos rotrain; the value from the base command is used instead.
- Fixes small typos in documentation (Stefan Weil) #741
6.0.2 hotfix release¶
Released on 2025-12-11 - GitHub - PyPI
Another hotfix release. blla.segment() would access incorrect fields of the new tags data structure.
6.0.1 hotfix release¶
Released on 2025-12-11 - GitHub - PyPI
This is a hotfix release pinning click to below 8.3 as flag option parsing is inconsistent in later releases.
6.0.0¶
Released on 2025-09-03 - GitHub - PyPI
The 6.0 release does not introduce any major new features but changes the behavior of multiple components and introduces non-backward-compatible API changes, necessitating a major release.
Backward-incompatible changes
Ketos subcommand options that were shared by many commands, namely --device, --workers, --precision, and --threads, have been moved to the main command.
For ketos compile:
ketos compile --workers 16 .... # OLD
ketos --workers 16 compile ... # NEW
For ketos train/segtrain/rotrain/test/segtest/pretrain:
ketos train -d cuda:0 --workers 16 --threads 23 --precision bf16-true # OLD
ketos -d cuda:0 --workers 16 --threads 23 --precision bf16-true train # NEW
Tag parsing has changed, which affects not only the internal data structures of the container classes but also the user-facing command line interface. The mapping of line tags to recognition models in the -m argument of kraken ocr now always uses the resolved type of the line. For ALTO files the resolved type is determined by any tag reference pointing to a tag element with either a TYPE attribute with the value type or no TYPE attribute at all. For PageXML files it is determined by the custom string structure {type: $value;}.
These changes are in preparation for the eventual removal of per-tag recognition, as it prevents optimizing recognition throughput with batching.
New features
The model repository has seen a major upgrade with a new metadata schema called HTRMoPo that allows uploading more model types (segmentation, recognition, reading order, ...) and includes support for informative huggingface-style model cards. The new implementation also caches the model repository state for faster querying, has support for versioned models, and allows filtering of output based on various metadata fields. Interaction with the repository using the command line drivers is documented here.
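Typical repository interactions from the command line, as a sketch (the DOI is a placeholder, not a real record):

# list models in the repository
$ kraken list
# show the metadata card of a specific record
$ kraken show 10.5281/zenodo.0000000
# download a model
$ kraken get 10.5281/zenodo.0000000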
The API and command line driver for reading order model training (ketos rotrain) now supports the same filtering and merging options as the segmentation training tools which makes it easier to train RO models when the corresponding segmentation model has been trained using these options.
Testing recognition models with ketos test now also computes a case-insensitive character error rate. (Thanks Weslley Oliveira!).
Per-step and average epoch training loss is now printed on the progress bars of all training tools (ketos pretrain, ketos rotrain, ketos segtrain, ketos train).
contrib/repolygonize.py now allows setting the scale of the polygonization input with the --scale option. (Thanks Weslley Oliveira!)
contrib/set_seg_options.py can set the segmentation model option for line location to centerline as well.
A new contrib/add_neural_ro.py script can be used to add a new reading order generated by a neural reading order model to an existing XML facsimile.
A softmax temperature option has been added to smooth the character confidence distribution of text recognition output. The option is available as an argument to TorchSeqRecognizer and as the --temperature setting on the kraken ocr subcommand.
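For example (the temperature value is illustrative):

$ kraken -i page.tif page.txt segment -bl ocr -m model.mlmodel --temperature 1.5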
Removed features
The synthetic line generation tools were removed as they were only useful for training legacy line-strip recognition models. The recommended alternative that is compatible with baseline-style models is the new pangoline tool. A short description of how to prepare kraken training data with it is available here in the docs.
Likewise, the legacy HTML file-based transcription environment was removed as it never supported transcription of baseline segmentation data. eScriptorium is the suggested replacement.
Installation through anaconda is gone. Because coremltools is not maintained on conda-forge, a pure conda installation without side-loading packages through pip has not been possible for a long while.
Misc. Changes
All valid floating point precision values known to pytorch lightning can now be used with the --precision option of ketos.
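For instance (manifest names are placeholders):

$ ketos --precision bf16-mixed train -f xml -t train.lst -e val.lst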
scripts.json has been updated to include the new scripts encoded by Unicode 16.
The reading order training code has been refactored.
Region filtering now supports types containing $.
contrib/extract_lines.py now always writes output as RGB images.
The pytorch pin has been relaxed to accept versions between 2.4.0 and 2.7.x.
API changes
The XML parsing, container classes, and tagging have been revamped, introducing a number of changes.
Tags
Tags on the container classes (Region, BaselineLine, BboxLine) were previously a simple dictionary of string keys and values, which was less expressive than the Transkribus-style custom strings mapping an identifier to one or more dictionaries, e.g. language {id: eng; name: English} language {id: heb; name: Hebrew}. With the current release all tags are in dict-of-list-of-dicts format, no matter their source (PageXML or ALTO files); the example above becomes {'language': [{'id': 'eng', 'name': 'English'}, {'id': 'heb', 'name': 'Hebrew'}]}. Tags parsed from ALTO's tag reference system, which only allows serialization of key-value pairs, are expanded by introducing a dummy key 'type' in the value dicts, i.e.
<Tags>
  <OtherTag ID="foo" LABEL="heb" TYPE="language"/>
  ...
</Tags>
...
<TextLine ... TAGREFS="foo">...
will produce a tags property on the parsed line with the value {'language': [{'type': 'heb'}]}. When multiple tags with the same TYPE are referenced, the value dicts are aggregated into a list (PageXML custom strings are treated analogously):
<Tags>
  <OtherTag ID="foo" LABEL="heb" TYPE="language"/>
  <OtherTag ID="bar" LABEL="eng" TYPE="language"/>
  ...
</Tags>
...
<TextLine ... TAGREFS="foo bar">...
will be parsed as {'language': [{'type': 'heb'}, {'type': 'eng'}]}. The TYPE attribute is not obligatory in ALTO files; if it is missing, the TYPE is treated as having the value type.
Baseline and Bbox XML parsing
The XMLPage class is now able to parse input facsimile files as containing either bounding boxes or baselines by changing the value of the linetype argument:
> from kraken.lib.xml import XMLPage
> doc = XMLPage('alto.xml', linetype='baselines').to_container()
> print(doc.type)
baselines
> doc.lines[0]
BaselineLine(id='eSc_line_192895', baseline=[(848, 682), (934, 678), (1027, 689), (1214, 696), (2731, 700)], boundary=[(844, 678), (851, 635), (1038, 649), (1053, 635), (1110, 635), (1182, 664), (1311, 656), (1351, 635), (1365, 649), (1469, 635), (1505, 664), (1552, 646), (1570, 660), (1599, 635), (1685, 667), (1746, 653), (1786, 664), (1822, 639), (1947, 667), (2199, 667), (2289, 639), (2346, 667), (2386, 649), (2422, 667), (2497, 667), (2526, 642), (2619, 664), (2637, 649), (2670, 667), (2716, 656), (2727, 696), (2716, 761), (2673, 761), (2645, 735), (2555, 739), (2537, 753), (2508, 743), (2490, 761), (2458, 735), (2393, 757), (2364, 739), (2267, 761), (2163, 743), (2080, 761), (2005, 739), (1969, 761), (1929, 739), (1865, 757), (1807, 739), (1764, 761), (1732, 739), (1602, 761), (1530, 743), (1509, 753), (1484, 735), (1459, 757), (1405, 743), (1351, 757), (1304, 735), (1283, 757), (1232, 757), (1193, 732), (1168, 757), (1124, 757), (1067, 732), (1045, 746), (999, 732), (848, 732)], text="בשאול וגו' ˙ אם יחבאו בראש הכרמל וגו' אם ילכו בשבי וגו' אין חשך ואין [צל']", base_dir='L', type='baselines', imagename=None, tags=None, split=None, regions=['eSc_textblock_10523'], language=['iai'])
> doc = XMLPage('alto.xml', linetype='bbox').to_container()
> print(doc.type)
bbox
> doc.lines[0]
BBoxLine(id='eSc_line_192895', bbox=(844, 635, 2727, 761), text="בשאול וגו' ˙ אם יחבאו בראש הכרמל וגו' אם ילכו בשבי וגו' אין חשך ואין [צל']", base_dir='L', type='bbox', imagename=None, tags=None, split=None, regions=['eSc_textblock_10523'], text_direction='horizontal-lr', language=['iai'])
This simplifies using text recognition models trained on bounding box data with input data in XML format. Instead of manually creating the appropriate Segmentation object it is now possible to just run the parser with linetype set and hand the container to rpred.rpred().
When the source files are PageXML, the bounding boxes around lines are computed from the maximum extent of the line bounding polygon. For ALTO files the bounding boxes are taken from the HPOS, VPOS, HEIGHT, and WIDTH attributes, which means that no bounding polygons need to be defined in a Shape element.
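A sketch of that flow (file names are placeholders; rpred usage follows the 5.x API shown elsewhere in these notes):

from PIL import Image
from kraken.lib.xml import XMLPage
from kraken.lib.models import load_any
from kraken.rpred import rpred

# parse an ALTO file as bounding box data and recognize it with a box model
net = load_any('bbox_model.mlmodel')
doc = XMLPage('alto.xml', linetype='bbox').to_container()
im = Image.open('page.tif')
for record in rpred(net, im, doc):
    print(record)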
Language parsing
In addition, the parser now extracts language information from source files: the Region/BBoxLine/BaselineLine classes have a new language property containing a list of language identifiers, and the standard output format templates serialize the field correctly. For PageXML files these identifiers are validated against the ISO 639-3 standard; for ALTO files the values are gathered as-is. Inheritance from the page and region level is handled correctly, but the notion of primaryLanguage and secondaryLanguage attributes is lost during parsing as they are merged with any language identifiers in the custom string. For ALTO files language information is taken from the LANG attribute and any references to tags with a type of language. The current uses of this system are limited but prepare for the integration of the new party recognizer.
Hyperparameter register
lib/register.py is a new module that contains valid values for hyperparameters like optimizers, schedulers, precision, and stoppers.
Bugfixes
- 0053402: Correct return value for image load error in extract line & line path (rlskoeser) #665
- d356587: Add a test for image error handling (rlskoeser) #665
- bbf4336: Fix Augmentation Issues (Weslley Oliveira) #673
- b435c77: Bug fix for class determination in RO dataset
- 8a13475: Fix a situation where unicodedata.category is not covering up enough (Thibault Clérice) #692
- 9a218ce: Prefix uuids with _ to make them valid xml:ids
Among many others.
5.2.9 - Bugfix release¶
Released on 2024-08-27 - GitHub - PyPI
What's Changed
- Pins python-bidi to a version that supports our internal data structure mangling
- Fixes a small regression in pretraining
- Various PageXML serialization improvements
- ketos now prints a helpful message when trying to use a binary file with the -t/-e options expecting manifest files
- Fixes serialization of dummy boxes by @PonteIneptique in #612
- Update alto to not produce Polygon tag on default blocks by @PonteIneptique in #620
- corrected mask of patch by @saiprabhath2002 in #617
New Contributors
- @saiprabhath2002 made their first contribution in #617
Full Changelog: 5.2.5...5.2.9
5.2.5 Bugfix release¶
Released on 2024-05-23 - GitHub - PyPI
- Fixes XML serialization of segmentation results (#597)
- Removes regression in polygonization code introduced with performance enhancements (#605)
- extract_polygons() now raises an exception when processing baselines < 5px in length (#606)
- Various small improvements to contrib/segmentation_overlay.py
- ketos compile progress bar now displays elapsed/remaining time (#504)
5.2.4: Hotfix release¶
Released on 2024-05-09 - GitHub - PyPI
- Fixes a regression in container-based binary dataset building
- Fixes spurious updates of validation metrics after sanity checking
5.2.3: Hotfix for segmentation training¶
Released on 2024-05-09 - GitHub - PyPI
What's Changed
- Hotfix for segmentation training
5.2.2: Hotfix for no_segmentation mode recognition¶
Released on 2024-04-30 - GitHub - PyPI
Hotfix release fixing a regression in no_segmentation recognition.
5.2.1 hotfix release¶
Released on 2024-04-22 - GitHub - PyPI
This release contains two small fixes for a regression related to bumping lightning up to 2.2 and a crash in Segmentation instantiation occurring when the first region type does not contain a region/dict.
5.2: 5.0 release with minor bugfixes¶
Released on 2024-04-21 - GitHub - PyPI
Kraken 5.x is a major release introducing trainable reading order, a cleaner API, and changes resulting in a ~50% performance improvement of recognition inference, in addition to a large number of smaller bug fixes and stability improvements.
What's Changed
- Trainable reading order based on a neural order relation operator adapted from this method (#492)
- Updates to the ALTO/PageXML templates and the serializer which correct serialization of region and line taxonomies, use UUIDs, and reuse identifiers from input XML files in output.
- Requirements are now mostly pinned to avoid pytorch/lightning accuracy and speed regressions that popped up semi-regularly with more free package versions.
- Threadpool limits are now set in all CLI drivers to prevent slowdown from unreasonably large numbers of threads in libraries like OpenCV. As a result the --threads option of all commands has been split into --workers and --threads.
- kraken.repo methods have been adapted to the new Zenodo API. They also correctly handle versioned records now.
- A small fix enabling recognition inference with AMP.
- Support for --fixed-splits in ketos test (@PonteIneptique)
- Performance increase for polygon extraction by @Evarin in #555
- Speed up legacy polygon extraction by @anutkk in #586
- New container classes in kraken.containers replace the previous dicts produced and expected by segment/rpred/serialize.
- kraken.serialize.serialize_segmentation() has been removed as part of the container class rework.
- train/rotrain/segtrain/pretrain cosine annealing scheduling now allows setting the final learning rate with --cos-min-lr.
- Lots of PEP8/whitespace/spelling mistake fixes from @stweil
New features
Reading order training
Reading order can now be learned with ketos rotrain and reading order models can be added to segmentation model files. The training process is documented here.
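A hypothetical invocation (the option names beyond the command itself are assumptions; see the linked documentation for the actual interface):

$ ketos rotrain -f xml -t ro_train.lst -e ro_val.lst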
Upgrade guide
Command line
Polygon extractor
The polygon extractor is responsible for taking a page image, baselines, and their bounding polygons, and dewarping and masking out the lines.
The new polygon extractor reduces line extraction time by a factor of ~30, roughly halving inference time and significantly speeding up training from XML files and dataset compilation. Note that polygon extraction does not concern data in the legacy bounding box format, nor does it touch the segmentation process itself: it is only a preprocessing step in the recognizer operating on an already existing segmentation.
Not all improvements in the polygon extractor are backward compatible, causing models trained with data extracted with the old implementation to suffer from a slight reduction in accuracy (usually <0.25 percentage points). Therefore models now contain a flag in their metadata indicating which implementation has been used to train them. This flag can be overridden, e.g.:
$ kraken --no-legacy-polygons -i ... ... ocr ...
to enable all speedups for a slight increase in character error rate.
For training, the new extractor is enabled by default, i.e. models trained with kraken 5.x will perform slightly worse on earlier kraken versions but will still work. It is possible to force the use of only backwards-compatible speedups:
$ ketos compile --legacy-polygons ...
$ ketos train --legacy-polygons ....
$ ketos pretrain --legacy-polygons ...
Threads and Multiprocessing
The command line tools now handle multiprocessing and thread pools more completely and configurably. --workers has been split into --threads and --workers, the former option limiting the size of thread pools (as much as possible) for intra-op parallelization, the latter setting the number of worker processes, usually for the purpose of data loading in training and dataset compilation.
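For example (in 5.2 these options still live on the subcommands; remaining arguments elided):

# 8 data loading worker processes, intra-op thread pools capped at 2 threads
$ ketos compile --workers 8 --threads 2 -f xml ...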
API changes
While 5.x preserves the general OCR functional blocks, the existing dictionary-based data structures have been replaced with container classes and the XML parser has been reworked.
Container classes
For straightforward processing little has changed. Most keys of the dictionaries have been converted into attributes of their respective classes.
The segmentation methods now return a Segmentation object containing Region and BaselineLine/BBoxLine objects:
>>> pageseg.segment(im)
{'text_direction': 'horizontal-lr',
'boxes': [(x1, y1, x2, y2),...],
'script_detection': False
}
>>> blla.segment(im)
{'text_direction': '$dir',
'type': 'baseline',
'lines': [{'baseline': [[x0, y0], [x1, y1], ..., [x_n, y_n]], 'boundary': [[x0, y0], [x1, y1], ..., [x_m, y_m]]}, ...
{'baseline': [[x0, ...]], 'boundary': [[x0, ...]]}]
'regions': [{'region': [[x0, y0], [x1, y1], ..., [x_n, y_n]], 'type': 'image'}, ...
{'region': [[x0, ...]], 'type': 'text'}]
}
becomes:
>>> pageseg.segment(im)
Segmentation(type='bbox',
imagename=None,
text_direction='horizontal-lr',
script_detection=False,
lines=[BBoxLine(id='f1d5b1e2-030c-41d5-b299-8a114eb0996e',
bbox=[34, 198, 279, 251],
text=None,
base_dir=None,
type='bbox',
imagename=None,
tags=None,
split=None,
regions=None,
text_direction='horizontal-lr'),
BBoxLine(...],
line_orders=[])
>>> blla.segment(im)
Segmentation(type='baseline',
imagename=im,
text_direction='horizontal-lr',
script_detection=False,
lines=[BaselineLine(id='50ab1a29-c3b6-4659-9713-ff246b21d2dc',
baseline=[[183, 284], [272, 282]],
boundary=[[183, 284], ... ,[183, 284]],
text=None,
base_dir=None,
type='baselines',
tags={'type': 'default'},
split=None,
regions=['e28ccb6b-2874-4be0-8e0d-38948f0fdf09']), ...],
regions={'text': [Region(id='e28ccb6b-2874-4be0-8e0d-38948f0fdf09',
boundary=[[123, 218], ..., [123, 218]],
tags={'type': 'text'}), ...],
'foo': [Region(...), ...]},
line_orders=[])
The recognizer now yields
BaselineOCRRecords/BBoxOCRRecords
which both inherit from the BaselineLine/BBoxLine classes:
>>> pred_it = rpred(network=model,
                    im=im,
                    segmentation=baseline_seg)
>>> record = next(pred_it)
>>> record
BaselineOCRRecord pred: 'predicted text' baseline: ...
>>> record.type
'baselines'
>>> record.line
BaselineLine(...)
>>> record.prediction
'predicted text'
One complication is the new serialization function which now accepts a
Segmentation object instead of a list of ocr_records and ancillary metadata:
>>> records = list(x for x in rpred(...))
>>> serialize(records,
image_name=im.filename,
image_size=im.size,
writing_mode='horizontal-tb',
scripts=['Latn', 'Hebr'],
regions=[{...}],
template='alto',
template_source='native',
processing_steps=proc_steps)
becomes:
>>> import dataclasses
>>> baseline_seg
Segmentation(...)
>>> records = list(x for x in rpred(..., segmentation=baseline_seg))
>>> results = dataclasses.replace(baseline_seg, lines=records)
>>> serialize(results,
image_size=im.size,
writing_mode='horizontal-tb',
scripts=['Latn', 'Hebr'],
template='alto',
template_source='native',
processing_steps=proc_steps)
This requires the construction of a new Segmentation object that contains the
records produced by the text predictor. The most straightforward way to create
this new Segmentation is through the dataclasses.replace function as our
container classes are immutable.
Lastly, serialize_segmentation has been removed. The serialize function now
accepts Segmentation objects which do not contain text predictions:
>>> serialize_segmentation(segresult={'text_direction': '$dir',
'type': 'baseline',
'lines': [{'baseline': [[x0, y0], [x1, y1], ..., [x_n, y_n]], 'boundary': [[x0, y0], [x1, y1], ..., [x_m, y_m]]}, ...
{'baseline': [[x0, ...]], 'boundary': [[x0, ...]]}]
'regions': [{'region': [[x0, y0], [x1, y1], ..., [x_n, y_n]], 'type': 'image'}, ...
{'region': [[x0, ...]], 'type': 'text'}]
},
image_name=im.filename,
image_size=im.size,
template='alto',
template_source='native',
processing_steps=proc_steps)
is replaced by:
>>> baseline_seg
Segmentation(...)
>>> serialize(baseline_seg,
image_size=im.size,
writing_mode='horizontal-tb',
scripts=['Latn', 'Hebr'],
template='alto',
template_source='native',
processing_steps=proc_steps)
XML parsing
The kraken.lib.xml.parse_{xml,alto,page} methods have been replaced by a single kraken.lib.xml.XMLPage class.
>>> parse_xml('xyz.xml')
{'image': impath,
'lines': [{'boundary': [[x0, y0], ...],
'baseline': [[x0, y0], ...],
'text': 'apdjfqpf',
'tags': {'type': 'default', ...}},
...
{...}],
'regions': {'region_type_0': [[[x0, y0], ...], ...], ...}}
becomes
>>> XMLPage('xyz.xml')
XMLPage xyz.xml (format: alto, image: impath)
As the parser is now aware of reading order, the XMLPage.lines attribute is an
unordered dict of BaselineLine/BBoxLine container classes. As ALTO/PageXML
files can generally contain multiple different reading orders, the
XMLPage.get_sorted_lines()/XMLPage.get_sorted_regions() methods provide an
ordered view of lines or regions. The default orders
line_implicit/region_implicit correspond to the order produced by the
previous parsers, i.e. the order formed by the sequence of elements in the XML
tree.
XMLPage objects can be converted into a Segmentation container using the
XMLPage.to_container() method:
>>> XMLPage('xyz.xml').to_container()
Segmentation(...)
Full Changelog: 4.3.13...5.2
4.3.10¶
Released on 2023-04-18 - GitHub - PyPI
This is mostly a bugfix release but also includes a couple of minor improvements and changes.
Changes
- Deterministic mode is now set to 'warn', preventing crashes in deterministic recognition training (CTC loss does not have a deterministic implementation).
- contrib/extract_lines.py now works with binary datasets
- Word error rate has been added as a validation metric in recognition training
- The fine-tuning options (--resize) add/both have been renamed to union/new. (Thibault Clérice) #488
- Tensorboard logging now also logs a couple of training images
4.3.5¶
Released on 2023-02-22 - GitHub - PyPI
This is just another hotfix release.
Changes
4.3.4¶
Released on 2023-02-20 - GitHub - PyPI
This is a hotfix release to 4.3.0 correcting a regression in the CLI, fixing pretrain validation losses, and the conda environment files.
Commits
- ac5fab6: Invalid type in click option definition for loggers (Benjamin Kiessling)
- 0cb9e0e: fix validation loss computation in pretrain (Benjamin Kiessling)
- 7d5069b: Remove former development raise in segmentation (Thibault Clérice) #441
- 0e3d10f: Install coremltools from pip for conda environments (Benjamin Kiessling)
4.3.0¶
Released on 2023-02-13 - GitHub - PyPI
What's Changed
- Pretraining has been reimplemented to be more faithful to the original publication for more stable memory consumption and easier hyperparameter selection
- Learning rate warmup and backbone freezing in recognition training with --warmup and --freeze-backbone (mostly to enable fine-tuning pretrained models)
- Enable ketos compile to create precompiled datasets with lines without a corresponding transcription with the --keep-empty-lines switch (mostly for pretraining models).
- --failed-sample-threshold in training modules, aborting training after a certain number of samples failed to load
- Tensorboard logging with --logger/--log-dir options
- Change codec construction during training when training and validation dataset alphabets don't match. Prior code points that only exist in the validation set would be copied to the model codec. Now the model codec only contains trained code points.
- Replace ocr_record with new smart classes BaselineOCRRecord and BBoxOCRRecord. These keep track of reading/display order, compute bounding polygons from the whole line bounding polygon, and average confidences when slicing.
- ALTO parsing now deals with any reasonable PointsType (see altoxml/schema#49)
- The fallback line orientation heuristic now takes into account the principal text orientation defined with --text-direction instead of assuming horizontal lines (--text-direction horizontal-lr/-rl).
- Baseline segmentation now supports padding of input images with --pad.
- CLI now allows serialization with custom jinja2 templates through the --template option.
- Switch validation metrics computation to torchmetrics.
- Various bugfixes, mostly to deal with shapely shenanigans.
Thanks
- @sixtyfive, @anutkk, @stweil, @colibrisson, @PonteIneptique for their contributions to this release.
Full Changelog: 4.2.0...4.3.0
4.1.2¶
Released on 2022-06-07 - GitHub - PyPI
Commits
- 3e10158: set border value in erosion in seamcarve (Benjamin Kiessling)
3.0.6¶
Released on 2021-11-08 - GitHub - PyPI
This is mainly a bugfix release containing small improvements such as additional tests, typing, spelling corrections, additional contrib scripts, and fixes for rarely used functionality.
Bugfixes
- Orthography and missing help messages in the CLI drivers
- Documentation for batch input specifications
- Fix a regression in early stopping when training on GPU
- Fix a regression in polygonization in the presence of regions
- Do not duplicate regions during serialization
- Add dummy String beneath TextLine w/o text in ALTO to avoid standard-violating empty TextLines
- The codec loading functionality of ketos train and KrakenTrainer actually loads a given codec now.
ketos trainandKrakenTraineractually loads a given codec now. - Fall back to simple scaling when centerline dewarping fails
- Drop (duplicate) short option form -p for --pad in all ketos commands
Features
- The forced alignment script contrib/forced_alignment_overlay.py now preserves the input file and only replaces the character cuts.
- Add reading order tests
- Explicit model sanity checks in blla.segment()
- Add baseline offset options to repolygonization script
- Make codec self-synchronizing
- Add TextEquiv for Word and TextLine in PAGE XML output
- Raise PIL image size limit to 20k*20k image dimensions
