Changelog¶
7.0.0b3: 7.0 3rd beta release¶
Released on 2026-02-16 - GitHub - PyPI
What's Changed
- Rotates images according to EXIF metadata
- Ports the build system to pyproject.toml/hatchling
- Enables the model loading version compatibility check
- Enables segmentation with region-only models
- Fixes an incorrect metric tracking direction in reading order training
7.0.0b2: 7.0 beta release¶
Released on 2026-02-15 - GitHub - PyPI
kraken 7.0 introduces major changes to training, inference, model handling, and extensibility.
If you are upgrading from 6.0.x as an average user, start with Breaking Changes and Command Line Behavior.
Installing the Beta
Install the latest available 7.0 pre-release from PyPI:
$ pip install --upgrade --pre kraken

Install this specific beta explicitly:

$ pip install --upgrade "kraken==7.0.0b2"

Breaking Changes
- Python 3.9 support was dropped. kraken now supports Python 3.10 through 3.13.
- Device and precision options are now global on both the kraken and ketos commands.
- Training and evaluation manifest option names changed from --training-files/--evaluation-files to --training-data/--evaluation-data.
- ketos train, ketos segtrain, ketos rotrain, and ketos pretrain now produce checkpoints and convert the best checkpoint to a weights file after training.
- Segmentation training class filtering/merging CLI options were removed. Class mapping is now defined in YAML experiment files.
- ketos segtest metrics are now computed against a configurable class mapping, and baseline detection metrics replace the older, less informative pixel accuracy/IoU-only view.
- ketos compile fixed splits were removed due to a significant performance penalty. Use separate dataset files per split instead.
- The API for both training and inference has been reworked extensively.
- safetensors is now the default output format for trained weights.
- Neural reading order models are only executed when using the new task API.
- Recognition and segmentation inference accelerators now default to auto, selecting the highest-performance available device.
In practice: most existing workflows keep working after small updates, but training artifacts and API entry points changed enough that scripted pipelines and API consumers will need adaptation.
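For example, the manifest option rename in a training invocation (manifest names are placeholders):

# 6.0.x
$ ketos train -f xml --training-files train.lst --evaluation-files val.lst
# 7.0
$ ketos train -f xml --training-data train.lst --evaluation-data val.lst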
Bug Fixes
- Fixed a breaking bug in reading order models that prevented trained model weights from loading.
Features and Improvements
- A plugin system now allows easy extension of kraken functionality with new segmentation, recognition, and reading order implementations.
- Persistent configuration through experiment YAML files has been added to ketos.
- The new recognition API supports batching plus parallelized line extraction/processing, enabling effective GPU inference. Speedups of around 80% were observed on CPU, with even larger gains with GPU acceleration.
- Character cuts on BaselineOCRRecord are now computed at initialization using a more efficient algorithm. This substantially reduces serialization overhead in the default --subline-segmentation mode.
- Baseline detection metrics inspired by the Transkribus Evaluation Scheme are now computed during segmentation training. Unlike older pixel-based metrics, these scores correlate more directly with actual line detection quality.
- The XML parser has been reworked for better robustness against invalid input. When PageXML files contain invalid image dimensions, kraken now attempts to read dimensions from the referenced image file. Reading-order parsing was also fully reimplemented to handle partial explicit orders and multi-level ordering more gracefully.
Plugins
kraken can now use external implementations of layout analysis, text recognition, and reading order determination through Python entry points.
Plugins are distributed as regular Python packages. After installation, kraken discovers them automatically through entry points. Plugin model files are then used exactly like native kraken model files: pass them to --model on the CLI or load them via task classes in Python.
Example workflow with a D-FINE layout analysis plugin model:
# install plugin package
$ pip install git+https://github.com/mittagessen/dfine_kraken.git
# run layout analysis with a plugin model file
$ kraken -i page.tif page.json segment --baseline --model dfine_layout.safetensors

The same model can be loaded programmatically with SegmentationTaskModel.load_model('dfine_layout.safetensors').
Command Line Behavior
Inference
Device and precision are now global options on kraken.
Set them before subcommands:
# CPU inference in full precision
$ kraken -i page.tif page.txt --device cpu --precision 32-true \
segment -bl ocr -m model.safetensors
# GPU inference with mixed bfloat16 precision
$ kraken -i page.tif page.txt --device cuda:0 --precision bf16-mixed \
segment -bl ocr -m model.safetensors

Recognition now exposes two throughput controls:
- -B/--batch-size: number of extracted line images sent per recognition forward pass.
- --num-line-workers: number of CPU worker processes used to extract/preprocess line images. Use 0 to keep extraction in-process.
# conservative settings for small GPUs or CPU-only runs
$ kraken -i page.tif page.txt segment -bl ocr -m model.safetensors \
-B 8 --num-line-workers 2
# higher-throughput GPU settings
$ kraken -i page.tif page.txt --device cuda:0 --precision bf16-mixed \
segment -bl ocr -m model.safetensors -B 64 --num-line-workers 8

Training
Experiment Files
Managing non-trivial training configurations from CLI flags alone was difficult, especially when heavily modifying segmentation class taxonomies. To address this, ketos now supports YAML experiment files.
Pass an experiment file with --config before the command name:
$ ketos --config experiments.yml segtrain

YAML keys correspond to the internal parameter names used by the CLI.
Minimal segmentation training experiment file:
precision: 32-true
device: auto
num_workers: 16
num_threads: 1
segtrain:
  training_data:
    - seg_train.lst
  evaluation_data:
    - seg_val.lst
  format_type: xml
  checkpoint_path: seg_checkpoints
  weights_format: safetensors
  line_class_mapping:
    - ['*', 3]
    - ['DefaultLine', 3]

Single experiment file containing multiple commands:
precision: 32-true
device: auto
num_workers: 16
num_threads: 1
train:
  training_data:
    - rec_train.lst
  evaluation_data:
    - rec_val.lst
  format_type: xml
  checkpoint_path: rec_checkpoints
  weights_format: safetensors
segtrain:
  training_data:
    - seg_train.lst
  evaluation_data:
    - seg_val.lst
  format_type: xml
  checkpoint_path: seg_checkpoints
  weights_format: safetensors

Configurations for multiple commands can be saved in the same experiment file.
Recommendation: move non-trivial setups (class mappings, optimizer/scheduler settings, hardware defaults) into YAML so runs are reproducible and easier to review.
Training Outputs, Checkpoints, and Weights
For ketos train, ketos segtrain, and ketos rotrain, training now produces Lightning checkpoints (.ckpt) as the primary artifact instead of writing CoreML weights directly during training.
Checkpoint files include full training state (model weights, optimizer state, scheduler state, epoch/step counters, and serialized training config), enabling exact continuation of interrupted runs.
There are now two distinct continuation modes:
- --resume restores and continues from the checkpoint's exact previous training state. The checkpoint state is authoritative, even if command-line flags or config files specify different values.
- --load keeps the previous fine-tune/start-new-run behavior. It loads weights only and starts a fresh run using current CLI/config hyperparameters.
Use --resume when you want to continue the same run.
Use --load when you want to start a new run from existing weights.
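A sketch of both modes, assuming each option takes the path of the respective artifact (file names are placeholders):

# continue an interrupted run with its exact previous state
$ ketos train --resume checkpoint_05-0.9213.ckpt
# start a fresh fine-tuning run from existing weights with current hyperparameters
$ ketos train --load model_best.safetensors -f xml --training-data train.lst --evaluation-data val.lst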
In addition to regular checkpoints, kraken now writes an emergency abort checkpoint by default (checkpoint_abort.ckpt) when a training run exits via exception (for example, a crash or a forceful abort). This gives you a recovery point even when a run terminates unexpectedly.
Because checkpoints contain much more than deployable model weights and may execute arbitrary Python code on load, distribute converted weights files rather than raw checkpoints. Conversion strips training-only state and produces a distribution-safe weights artifact.
At the end of training, kraken automatically converts the best checkpoint into a weights file. You can also convert manually with ketos convert.
The default weights format is now safetensors. Unlike legacy CoreML weights, safetensors supports serialization of arbitrary model types, while CoreML serialization is limited to the model types implemented in kraken itself.
Use --weights-format coreml only when you explicitly need legacy compatibility.
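A hedged example of requesting CoreML output at training time (manifest names are placeholders):

$ ketos train -f xml --training-data train.lst --evaluation-data val.lst --weights-format coreml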
Testing
Segmentation test output now includes metrics computed on vectorized baselines that correlate with segmentation quality, making model selection for line detection much easier. segtest behavior also changed with the checkpoint/weights distinction. In previous releases, test data often had to mirror post-merge/post-filter training mappings, which made evaluation cumbersome without rewriting source labels.
In short: you can now evaluate more datasets directly, with less taxonomy rewriting.
Example segtest invocation:
$ ketos --device cpu segtest -m best_0.9471.safetensors -e test_manifest.lst -f xml

Example output excerpt:
Category Class Name Pixel Accuracy IOU Object Count
aux _start_sep 1.000 1.000 N/A
aux _end_sep 1.000 1.000 N/A
regions Text_Region 0.992 0.964 184
regions Foot_Notes 0.973 0.887 36
Class Precision Recall F1
Overall 0.947 0.933 0.940
DefaultLine 0.959 0.946 0.952
Marginalia 0.891 0.874 0.882
Class mappings are now stored in two forms in checkpoints and new weights files:
- A full mapping with all transformations (merges/filtering) from training taxonomy to model outputs.
- A canonical one-to-one mapping between label indices and class strings.
By default, evaluation uses the full mapping. Canonical mapping is used when explicitly requested and as a fallback for pre-7.0 model files. Fully custom mappings can also be defined in an experiment file.
Class mapping modes in ketos segtest:
# Use full (many-to-one) training mapping from checkpoint metadata
$ ketos segtest -m model.ckpt -e test.lst --test-class-mapping-mode full
# Use canonical one-to-one model output mapping
$ ketos segtest -m model.safetensors -e test.lst --test-class-mapping-mode canonical
# Provide explicit mapping for the test set taxonomy
$ ketos --config segtest_custom.yml segtest -m model.safetensors -e test.lst \
  --test-class-mapping-mode custom

# segtest_custom.yml
segtest:
  line_class_mapping:
    - ['DefaultLine', 3]
    - ['Running_Title', 3]
    - ['Marginal_Note', 4]
  region_class_mapping:
    - ['Text_Region', 5]
    - ['Foot_Notes', 6]

For easier debugging, ketos segtest now prints an explicit mapping between test-set classes and model classes, including clear indicators for merges, missing labels, and conflicts.
Example class taxonomy diagnostics table:
Class Mapping Diagnostics (model=full, dataset=effective)
Category Class Name Model Idx Dataset Idx Observed Effective Status
baselines DefaultLine 3 3 812 812 ok
baselines Running_Title 3 3 57 57 ok
baselines Rubrication 4 - 14 0 ignored by dataset mapping
regions Text_Region 5 5 184 184 ok
regions Illustration - 7 22 22 missing in model mapping
API
Configuration Classes
In previous versions of kraken, training and inference hyperparameters were defined in dictionaries in the default_specs module. This was error-prone and resulted in verbose code in the command line drivers.
If you maintain Python training/inference scripts, migrate to typed config classes for better defaults, clearer parameter names, and safer checkpoint serialization.
Before (6.0.x) using default_specs dictionaries:
from kraken.lib.default_specs import RECOGNITION_HYPER_PARAMS
from kraken.lib.train import RecognitionModel
hyper_params = RECOGNITION_HYPER_PARAMS.copy()
hyper_params.update({'batch_size': 8, 'lrate': 1e-3})
model = RecognitionModel(hyper_params=hyper_params, training_data=['train.lst'])

After (7.0) using typed configuration classes:
from kraken.configs import (RecognitionInferenceConfig,
VGSLRecognitionTrainingConfig,
VGSLRecognitionTrainingDataConfig)
infer_cfg = RecognitionInferenceConfig(batch_size=8,
num_line_workers=4,
precision='bf16-mixed')
train_cfg = VGSLRecognitionTrainingConfig(lrate=1e-3,
quit='early',
epochs=24)
data_cfg = VGSLRecognitionTrainingDataConfig(training_data=['train.lst'],
evaluation_data=['val.lst'],
format_type='xml')

Task-based API for Inference
blla.segment(), align.forced_align(), and rpred.rpred()/rpred.mm_rpred() have been replaced by implementation-agnostic task classes that provide better performance and flexibility. The largest gains are in text recognition, where CPU inference improves by roughly 80% through parallelization. Batching additionally enables efficient GPU utilization.
If you call legacy APIs directly, plan a migration to kraken.tasks soon. Legacy interfaces remain available for now but are deprecated.
To migrate an existing segmentation workflow, replace:
from PIL import Image
from kraken.blla import segment
from kraken.lib.vgsl import TorchVGSLModel
model = TorchVGSLModel.load_model('/path/to/segmentation/model.coreml')
im = Image.open('sample.jpg')
seg = segment(im, model=model)

with:
from PIL import Image
from kraken.tasks import SegmentationTaskModel
from kraken.configs import SegmentationInferenceConfig
segmenter = SegmentationTaskModel.load_model('/path/to/segmentation/model.safetensors')
im = Image.open('sample.jpg')
seg = segmenter.predict(im=im, config=SegmentationInferenceConfig())

For recognition, before:
from PIL import Image
from kraken.rpred import rpred
from kraken.lib.models import load_any
net = load_any('/path/to/recognition/model.mlmodel')
for record in rpred(net, im, segmentation=seg):
    print(record)

After:
from kraken.tasks import RecognitionTaskModel
from kraken.configs import RecognitionInferenceConfig
recognizer = RecognitionTaskModel.load_model('/path/to/recognition/model.safetensors')
for record in recognizer.predict(im=im, segmentation=seg, config=RecognitionInferenceConfig(batch_size=8, num_line_workers=4)):
    print(record)

Recognition now supports batching (batch_size in RecognitionInferenceConfig) and parallel line extraction (num_line_workers), making GPU acceleration practical.
CUDA example with explicit accelerator/device settings:
from PIL import Image
from kraken.tasks import RecognitionTaskModel
from kraken.configs import RecognitionInferenceConfig
recognizer = RecognitionTaskModel.load_model('/path/to/recognition/model.safetensors')
config = RecognitionInferenceConfig(accelerator='gpu',
device=[0],
precision='bf16-mixed',
batch_size=64,
num_line_workers=8)
for record in recognizer.predict(im=Image.open('page.tif'), segmentation=seg, config=config):
    print(record.prediction)

The new recognition API does not support tag-based multi-model recognition (rpred.mm_rpred()), which was dropped to simplify batched inference.
For forced alignment, before:
from PIL import Image
from kraken.containers import Segmentation, BaselineLine
from kraken.align import forced_align
from kraken.lib.models import load_any
model = load_any('model.mlmodel')
# Create a dummy segmentation with a line and a transcription
line = BaselineLine(baseline=[(0,0), (100,0)], boundary=[(0,-10), (100,-10), (100,10), (0,10)], text='Hello World')
segmentation = Segmentation(imagename='image.png', lines=[line])
aligned_segmentation = forced_align(segmentation, model)
record = aligned_segmentation.lines[0]
print(record.prediction)
print(record.cuts)

After:
from PIL import Image
from kraken.tasks import ForcedAlignmentTaskModel
from kraken.containers import Segmentation, BaselineLine
from kraken.configs import RecognitionInferenceConfig
# Assume `model.mlmodel` is a recognition model
model = ForcedAlignmentTaskModel.load_model('model.mlmodel')
im = Image.open('image.png')
# Create a dummy segmentation with a line and a transcription
line = BaselineLine(baseline=[(0,0), (100,0)], boundary=[(0,-10), (100,-10), (100,10), (0,10)], text='Hello World')
segmentation = Segmentation(lines=[line])
config = RecognitionInferenceConfig()
aligned_segmentation = model.predict(im, segmentation, config)
record = aligned_segmentation.lines[0]
print(record.prediction)
print(record.cuts)

The old interfaces remain available but are deprecated and will be removed in kraken 8.
Training Refactor
The training module has been moved from kraken.lib.train to kraken.train (with reading order and pretraining modules in kraken.lib.ro/kraken.lib.pretrain). Training now uses explicit configuration objects and consistently uses LightningDataModule-derived classes.
If you run training programmatically, update imports and constructors and switch hyperparameter dicts to config objects.
Before (6.0.x) style instantiation:
from kraken.lib.train import RecognitionModel, SegmentationModel
from kraken.lib.pretrain.model import RecognitionPretrainModel
from kraken.lib.ro.model import RODataModule, ROModel
rec = RecognitionModel(hyper_params={'batch_size': 8},
training_data=['train.lst'],
evaluation_data=['val.lst'])
seg = SegmentationModel(hyper_params={'epochs': 50},
training_data=['seg_train.lst'],
evaluation_data=['seg_val.lst'])
pre = RecognitionPretrainModel(hyper_params={'mask_prob': 0.5})
ro_dm = RODataModule(training_data=['ro_train.lst'], evaluation_data=['ro_val.lst'])
ro = ROModel(feature_dim=128, class_mapping={'default': 1}, hyper_params={'epochs': 3000})

After (7.0) style instantiation:
from kraken.train import (KrakenTrainer,
VGSLRecognitionDataModule, VGSLRecognitionModel,
BLLASegmentationDataModule, BLLASegmentationModel)
from kraken.lib.pretrain import PretrainDataModule, RecognitionPretrainModel
from kraken.lib.ro import RODataModule, ROModel
from kraken.configs import (VGSLRecognitionTrainingConfig, VGSLRecognitionTrainingDataConfig,
BLLASegmentationTrainingConfig, BLLASegmentationTrainingDataConfig,
VGSLPreTrainingConfig, VGSLPreTrainingDataConfig,
ROTrainingConfig, ROTrainingDataConfig)
rec_dm = VGSLRecognitionDataModule(VGSLRecognitionTrainingDataConfig(training_data=['train.lst'], evaluation_data=['val.lst'], format_type='xml'))
rec_model = VGSLRecognitionModel(VGSLRecognitionTrainingConfig(epochs=24, quit='early'))
seg_dm = BLLASegmentationDataModule(BLLASegmentationTrainingDataConfig(training_data=['seg_train.lst'], evaluation_data=['seg_val.lst'], format_type='xml'))
seg_model = BLLASegmentationModel(BLLASegmentationTrainingConfig(epochs=50, quit='fixed'))
pre_dm = PretrainDataModule(VGSLPreTrainingDataConfig(training_data=['pretrain_train.lst'], evaluation_data=['pretrain_val.lst'], format_type='path'))
pre_model = RecognitionPretrainModel(VGSLPreTrainingConfig(mask_prob=0.5))
ro_dm = RODataModule(ROTrainingDataConfig(training_data=['ro_train.lst'], evaluation_data=['ro_val.lst'], format_type='xml', level='baselines'))
ro_model = ROModel(ROTrainingConfig(epochs=3000, quit='early'))

The KrakenTrainer class works as before.
In addition, separate test routines are now integrated into Lightning modules, allowing straightforward programmatic execution of the test loop for segmentation and recognition.
KrakenTrainer.test() returns typed metric containers:
- Recognition (RecognitionTestMetrics): character_counts, num_errors, cer, wer, case_insensitive_cer, confusions, scripts, insertions, deletes, substitutions
- Segmentation (SegmentationTestMetrics): class_pixel_accuracy, mean_accuracy, class_iu, mean_iu, freq_iu, region_iu, bl_precision, bl_recall, bl_f1, bl_detection_per_class

Example: programmatic test loop execution with KrakenTrainer.test():
from kraken.train import (KrakenTrainer,
VGSLRecognitionDataModule, VGSLRecognitionModel,
BLLASegmentationDataModule, BLLASegmentationModel)
from kraken.configs import (VGSLRecognitionTrainingConfig, VGSLRecognitionTrainingDataConfig,
BLLASegmentationTrainingConfig, BLLASegmentationTestDataConfig)
trainer = KrakenTrainer(accelerator='cpu', devices=1, precision='32-true')
rec_model = VGSLRecognitionModel.load_from_weights('rec_best.safetensors',
VGSLRecognitionTrainingConfig())
rec_dm = VGSLRecognitionDataModule(VGSLRecognitionTrainingDataConfig(test_data=['rec_test.lst'], format_type='xml'))
rec_metrics = trainer.test(rec_model, rec_dm)
seg_model = BLLASegmentationModel.load_from_weights('seg_best.safetensors',
BLLASegmentationTrainingConfig())
seg_dm = BLLASegmentationDataModule(BLLASegmentationTestDataConfig(test_data=['seg_test.lst'],
format_type='xml',
test_class_mapping_mode='canonical'))
seg_metrics = trainer.test(seg_model, seg_dm)

Plugin System and Model Base Classes
kraken now supports alternative segmentation and recognition implementations through a plugin system based on Python entry points. To be compatible, plugins must implement the interfaces defined by the abstract kraken.models.BaseModel class. kraken.models.SegmentationBaseModel and kraken.models.RecognitionBaseModel provide task-specific base interfaces.
This is primarily relevant if you are extending kraken with custom model types or distributing third-party integrations.
Rough implementation skeletons:
from torch import nn
from kraken.models import BaseModel, SegmentationBaseModel, RecognitionBaseModel

class MySegmentationModel(nn.Module, SegmentationBaseModel):
    _kraken_min_version = '7.0.0'
    model_type = ['segmentation']

    def prepare_for_inference(self, config): self.eval()
    def predict(self, im): ...

class MyRecognitionModel(nn.Module, RecognitionBaseModel):
    _kraken_min_version = '7.0.0'
    model_type = ['recognition']

    def prepare_for_inference(self, config): self.eval()
    def predict(self, im, segmentation): ...

To be discoverable by kraken, these classes must be registered as entry points in your setup.cfg or similar under the kraken.models group with their class name:
Example from kraken's own setup.cfg:
[entry_points]
kraken.models =
    TorchVGSLModel = kraken.lib.vgsl:TorchVGSLModel
    Wav2Vec2Mask = kraken.lib.pretrain:Wav2Vec2Mask
    ROMLP = kraken.lib.ro:ROMLP

An example plugin, dfine_kraken, incorporates the D-FINE object detector for layout analysis.
Model Handling
kraken replaced type-specific model loaders with a modular serialization/deserialization architecture. Models can also be loaded directly via task APIs. The default serialization format is now safetensors, which supports arbitrary model types. The new API in kraken.models can read (kraken.models.load_models) and write model collections (kraken.models.write_safetensors). Model files are designed to contain multiple models (for example, layout + reading order), so these routines accept and return lists of models. You can mix "native" kraken implementations and plugin implementations in the same model file, such as a BLLA line segmentation and D-FINE region segmentation model. CoreML support remains, but only for legacy models from kraken 6 and earlier.
For most users: prefer safetensors, treat checkpoints as training artifacts, and distribute converted weights files.
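A sketch of bundling models from separate files into a single weights file; the write_safetensors argument order is an assumption, and file names are placeholders:

from kraken.models import load_models, write_safetensors

# load_models returns a list, so collections can simply be concatenated
models = load_models('seg.safetensors') + load_models('ro.safetensors')
# write the combined collection into one model file
write_safetensors(models, 'bundle.safetensors')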
Before (6.0.x) model loading:
# recognition
from kraken.lib.models import load_any
rec_model = load_any('recognition_model.mlmodel')
# segmentation
from kraken.lib.vgsl import TorchVGSLModel
seg_model = TorchVGSLModel.load_model('segmentation_model.mlmodel')

After (7.0) unified loading:
from kraken.models import load_models
from kraken.tasks import RecognitionTaskModel, SegmentationTaskModel
# load by task type
rec_models = load_models('model_bundle.safetensors', tasks=['recognition'])
seg_and_ro_models = load_models('model_bundle.safetensors', tasks=['segmentation', 'reading_order'])
# use via task API
recognizer = RecognitionTaskModel(rec_models)
segmenter = SegmentationTaskModel(seg_and_ro_models)

The new model stack explicitly distinguishes checkpoints from weights files. After training, checkpoints should be converted to weights. The universal conversion routine kraken.models.convert_models relies on additional entry points: a checkpoint LightningModule (or compatible class exposing load_from_checkpoint) and any configuration classes serialized into model weights. During conversion, checkpoints are loaded in weights_only mode. To support safe deserialization, kraken adds all classes registered under kraken.configs to PyTorch safe globals.
Minimal plugin registration in setup.cfg for checkpoint conversion:
[entry_points]
kraken.lightning_modules =
    MyVGSLLightningModule = mypkg.training:MyVGSLLightningModule
kraken.configs =
    MyTrainingConfig = mypkg.configs:MyTrainingConfig
kraken.models =
    MyModel = mypkg.models:MyModel

Checkpoint/weights conversion examples:
# CLI
$ ketos convert -i checkpoint_09-0.9431.ckpt -o model_best.safetensors

from kraken.models import convert_models, load_models
from kraken.models.convert import load_from_checkpoint
# checkpoint to weights
convert_models(['checkpoint_09-0.9431.ckpt'], 'model_best.safetensors')
# load lightning module from checkpoint (weights_only mode)
module = load_from_checkpoint('checkpoint_09-0.9431.ckpt')
net = module.net
# load converted weights
models = load_models('model_best.safetensors')

6.0.4: Hotfix release for blla.segment()¶
Released on 2026-02-13 - GitHub - PyPI
Corrects a regression where blla.segment() would not load a default model when none was explicitly defined on the CLI.
6.0.3¶
Released on 2025-12-13 - GitHub - PyPI
Bug Fixes
- Fixes a regression in tag-based recognition.
- Pin rich to below 14.1 and relax pytorch pin to 2.9.x.
- Removes the --device option from ketos rotrain; the value from the base command is used instead.
- Fixes small typos in documentation (Stefan Weil) #741
6.0.2 hotfix release¶
Released on 2025-12-11 - GitHub - PyPI
Another hotfix release. blla.segment() would access incorrect fields of the new tags data structure.
6.0.1 hotfix release¶
Released on 2025-12-11 - GitHub - PyPI
This is a hotfix release pinning click to below 8.3 as flag option parsing is inconsistent in later releases.
6.0.0¶
Released on 2025-09-03 - GitHub - PyPI
The 6.0 release does not introduce any major new features but changes the behavior of multiple components and introduces non-backward-compatible API changes, necessitating a major release.
Backward-incompatible changes
Ketos subcommand options that were shared by many commands, namely --device, --workers, --precision, and --threads, have been moved to the main command.
For ketos compile:
ketos compile --workers 16 .... # OLD
ketos --workers 16 compile ... # NEW
For ketos train/segtrain/rotrain/test/segtest/pretrain:
ketos train -d cuda:0 --workers 16 --threads 23 --precision bf16-true # OLD
ketos -d cuda:0 --workers 16 --threads 23 --precision bf16-true train # NEW
Tag parsing has changed, which affects not only the internal data structures of the container classes but also the user-facing command line interface. The mapping of line tags to recognition models in the -m argument of kraken ocr now always uses the resolved type of the line. For ALTO files the resolved type is determined by any tag reference pointing to a tag element with either a TYPE attribute with the value type or no TYPE attribute at all. For PageXML files it is determined by the custom string structure {type: $value;}.
These changes are in preparation for the eventual removal of per-tag recognition, as it prevents optimizing recognition throughput with batching.
New features
The model repository has seen a major upgrade with a new metadata schema called HTRMoPo that allows uploading more model types (segmentation, recognition, reading order, ...) and includes support for informative huggingface-style model cards. The new implementation also caches the model repository state for faster querying, has support for versioned models, and allows filtering of output based on various metadata fields. Interaction with the repository using the command line drivers is documented here.
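Typical repository interactions from the command line, as a sketch (the DOI is a placeholder, not a real record):

# list models in the repository
$ kraken list
# show the metadata card of a specific record
$ kraken show 10.5281/zenodo.0000000
# download a model
$ kraken get 10.5281/zenodo.0000000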
The API and command line driver for reading order model training (ketos rotrain) now supports the same filtering and merging options as the segmentation training tools which makes it easier to train RO models when the corresponding segmentation model has been trained using these options.
Testing recognition models with ketos test now also computes a case-insensitive character error rate. (Thanks Weslley Oliveira!).
Per-step and average epoch training loss is now printed on the progress bars of all training tools (ketos pretrain, ketos rotrain, ketos segtrain, ketos train).
contrib/repolygonize.py now allows setting the scale of the polygonization input with the --scale option. (Thanks Weslley Oliveira!)
contrib/set_seg_options.py can set the segmentation model option for line location to centerline as well.
A new contrib/add_neural_ro.py script can be used to add a new reading order generated by a neural reading order model to an existing XML facsimile.
A softmax temperature option has been added to smooth the character confidence distribution of text recognition output. The option is available as an argument to TorchSeqRecognizer and as the --temperature setting on the kraken ocr subcommand.
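For example (the temperature value is illustrative):

$ kraken -i page.tif page.txt segment -bl ocr -m model.mlmodel --temperature 1.5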
Removed features
The synthetic line generation tools were removed as they were only useful for training legacy line-strip recognition models. The recommended alternative that is compatible with baseline-style models is the new pangoline tool. A short description of how to prepare kraken training data with it is available here in the docs.
Likewise, the legacy HTML file-based transcription environment was removed as it never supported transcription of baseline segmentation data. eScriptorium is the suggested replacement.
Installation through anaconda is gone. Because coremltools is not maintained on conda-forge, a pure conda installation without side-loading packages through pip has not been possible for a long while.
Misc. Changes
All valid floating point precision values known to pytorch lightning can now be used with the --precision option of ketos.
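For instance (manifest names are placeholders):

$ ketos --precision bf16-mixed train -f xml -t train.lst -e val.lst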
scripts.json has been updated to include the new scripts encoded by Unicode 16.
The reading order training code has been refactored.
Region filtering now supports types containing $.
contrib/extract_lines.py now always writes output as RGB images.
The pytorch pin has been relaxed to accept versions between 2.4.0 and 2.7.x.
API changes
The XML parsing, container classes, and tagging have been revamped, introducing a number of changes.
Tags
Tags on the container classes (Region, BaselineLine, BboxLine) were previously a simple dictionary of string keys and values, which was less expressive than the Transkribus-style custom strings mapping an identifier to one or more dictionaries, e.g. language {id: eng; name: English} language {id: heb; name: Hebrew}. With the current release all tags are in dict-of-list-of-dicts format, no matter their source (PageXML or ALTO files); the example above becomes {'language': [{'id': 'eng', 'name': 'English'}, {'id': 'heb', 'name': 'Hebrew'}]}. Tags parsed from ALTO's tag reference system, which only allows serialization of key-value pairs, are expanded by introducing a dummy key 'type' in the value dicts, i.e.
<Tags>
  <OtherTag ID="foo" LABEL="heb" TYPE="language"/>
  ...
</Tags>
...
<TextLine ... TAGREFS="foo">...
will produce a tags property on the parsed line with the value {'language': [{'type': 'heb'}]}. When multiple tags with the same TYPE are referenced, the value dicts are aggregated into a list (PageXML custom strings are treated analogously):
<Tags>
  <OtherTag ID="foo" LABEL="heb" TYPE="language"/>
  <OtherTag ID="bar" LABEL="eng" TYPE="language"/>
  ...
</Tags>
...
<TextLine ... TAGREFS="foo bar">...
will be parsed as {'language': [{'type': 'heb'}, {'type': 'eng'}]}. The TYPE attribute is not obligatory in ALTO files; if it is missing, the TYPE is treated as having the value type.
Baseline and Bbox XML parsing
The XMLPage class is now able to parse input facsimile files as containing either bounding boxes or baselines by changing the value of the linetype argument:
> from kraken.lib.xml import XMLPage
> doc = XMLPage('alto.xml', linetype='baselines').to_container()
> print(doc.type)
baselines
> doc.lines[0]
BaselineLine(id='eSc_line_192895', baseline=[(848, 682), (934, 678), (1027, 689), (1214, 696), (2731, 700)], boundary=[(844, 678), (851, 635), (1038, 649), (1053, 635), (1110, 635), (1182, 664), (1311, 656), (1351, 635), (1365, 649), (1469, 635), (1505, 664), (1552, 646), (1570, 660), (1599, 635), (1685, 667), (1746, 653), (1786, 664), (1822, 639), (1947, 667), (2199, 667), (2289, 639), (2346, 667), (2386, 649), (2422, 667), (2497, 667), (2526, 642), (2619, 664), (2637, 649), (2670, 667), (2716, 656), (2727, 696), (2716, 761), (2673, 761), (2645, 735), (2555, 739), (2537, 753), (2508, 743), (2490, 761), (2458, 735), (2393, 757), (2364, 739), (2267, 761), (2163, 743), (2080, 761), (2005, 739), (1969, 761), (1929, 739), (1865, 757), (1807, 739), (1764, 761), (1732, 739), (1602, 761), (1530, 743), (1509, 753), (1484, 735), (1459, 757), (1405, 743), (1351, 757), (1304, 735), (1283, 757), (1232, 757), (1193, 732), (1168, 757), (1124, 757), (1067, 732), (1045, 746), (999, 732), (848, 732)], text="בשאול וגו' ˙ אם יחבאו בראש הכרמל וגו' אם ילכו בשבי וגו' אין חשך ואין [צל']", base_dir='L', type='baselines', imagename=None, tags=None, split=None, regions=['eSc_textblock_10523'], language=['iai'])
> doc = XMLPage('alto.xml', linetype='bbox').to_container()
> print(doc.type)
bbox
> doc.lines[0]
BBoxLine(id='eSc_line_192895', bbox=(844, 635, 2727, 761), text="בשאול וגו' ˙ אם יחבאו בראש הכרמל וגו' אם ילכו בשבי וגו' אין חשך ואין [צל']", base_dir='L', type='bbox', imagename=None, tags=None, split=None, regions=['eSc_textblock_10523'], text_direction='horizontal-lr', language=['iai'])
This simplifies using text recognition models trained on bounding box data with input data in XML format. Instead of manually creating the appropriate Segmentation object it is now possible to just run the parser with linetype set and hand the container to rpred.rpred().
When the source files are PageXML, the bounding boxes around lines are computed from the maximum extent of the line bounding polygon. For ALTO files the bounding boxes are taken from the HPOS, VPOS, HEIGHT, and WIDTH attributes, which means that no bounding polygons need to be defined in a Shape element.
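A sketch of that flow (file names are placeholders; rpred usage follows the 5.x API shown elsewhere in these notes):

from PIL import Image
from kraken.lib.xml import XMLPage
from kraken.lib.models import load_any
from kraken.rpred import rpred

# parse an ALTO file as bounding box data and recognize it with a box model
net = load_any('bbox_model.mlmodel')
doc = XMLPage('alto.xml', linetype='bbox').to_container()
im = Image.open('page.tif')
for record in rpred(net, im, doc):
    print(record)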
Language parsing
In addition, the parser now extracts language information from source files: the Region/BBoxLine/BaselineLine classes have a new language property containing a list of language identifiers, and the standard output format templates serialize the field correctly. For PageXML files these identifiers are validated against the ISO 639-3 standard; for ALTO files the values are gathered as-is. Inheritance from the page and region level is handled correctly, but the notion of primaryLanguage and secondaryLanguage attributes is lost during parsing as they are merged with any language identifiers in the custom string. For ALTO files language information is taken from the LANG attribute and any references to tags with a type of language. The current uses of this system are limited but prepare for the integration of the new party recognizer.
Hyperparameter register
lib/register.py is a new module that contains valid values for hyperparameters like optimizers, schedulers, precision, and stoppers.
Bugfixes
- 0053402: Correct return value for image load error in extract line & line path (rlskoeser) #665
- d356587: Add a test for image error handling (rlskoeser) #665
- bbf4336: Fix Augmentation Issues (Weslley Oliveira) #673
- b435c77: Bug fix for class determination in RO dataset
- 8a13475: Fix a situation where unicodedata.category is not covering up enough (Thibault Clérice) #692
- 9a218ce: Prefix uuids with _ to make them valid xml:ids
Among many others.
5.2.9 - Bugfix release¶
Released on 2024-08-27 - GitHub - PyPI
What's Changed
- Pins python-bidi to a version that supports our internal data structure mangling
- Fixes a small regression in pretraining
- Various PageXML serialization improvements
- ketos now prints a helpful message when trying to use a binary file with the -t/-e options expecting manifest files
- Fixes serialization of dummy boxes by @PonteIneptique in #612
- Update alto to not produce Polygon tag on default blocks by @PonteIneptique in #620
- corrected mask of patch by @saiprabhath2002 in #617
New Contributors
- @saiprabhath2002 made their first contribution in #617
Full Changelog: 5.2.5...5.2.9
5.2.5 Bugfix release¶
Released on 2024-05-23 - GitHub - PyPI
- Fixes XML serialization of segmentation results (#597)
- Removes regression in polygonization code introduced with performance enhancements (#605)
- extract_polygons() now raises an exception when processing baselines < 5px in length (#606)
- Various small improvements to contrib/segmentation_overlay.py
- ketos compile progress bar now displays elapsed/remaining time (#504)
5.2.4: Hotfix release¶
Released on 2024-05-09 - GitHub - PyPI
- Fixes a regression in container-based binary dataset building
- Fixes spurious updates of validation metrics after sanity checking
5.2.3: Hotfix for segmentation training¶
Released on 2024-05-09 - GitHub - PyPI
What's Changed
- Hotfix for segmentation training
5.2.2: Hotfix for no_segmentation mode recognition¶
Released on 2024-04-30 - GitHub - PyPI
Hotfix release fixing a regression in no_segmentation recognition.
5.2.1 hotfix release¶
Released on 2024-04-22 - GitHub - PyPI
This release contains two small fixes for a regression related to bumping lightning up to 2.2 and a crash in Segmentation instantiation occurring when the first region type does not contain a region/dict.
5.2: 5.0 release with minor bugfixes¶
Released on 2024-04-21 - GitHub - PyPI
Kraken 5.x is a major release introducing trainable reading order, a cleaner API, and changes resulting in a ~50% performance improvement of recognition inference, in addition to a large number of smaller bug fixes and stability improvements.
What's Changed
- Trainable reading order based on a neural order relation operator adapted from this method (#492)
- Updates to the ALTO/PageXML templates and the serializer which correct serialization of region and line taxonomies, use UUIDs, and reuse identifiers from input XML files in output.
- Requirements are now mostly pinned to avoid pytorch/lightning accuracy and speed regressions that popped up semi-regularly with more free package versions.
- Threadpool limits are now set in all CLI drivers to prevent slowdown from unreasonably large numbers of threads in libraries like OpenCV. As a result the --threads option of all commands has been split into --workers and --threads.
- kraken.repo methods have been adapted to the new Zenodo API. They also correctly handle versioned records now.
- A small fix enabling recognition inference with AMP.
- Support for --fixed-splits in ketos test (@PonteIneptique)
- Performance increase for polygon extraction by @Evarin in #555
- Speed up legacy polygon extraction by @anutkk in #586
- New container classes in kraken.containers replace the previous dicts produced and expected by segment/rpred/serialize.
- kraken.serialize.serialize_segmentation() has been removed as part of the container class rework.
- train/rotrain/segtrain/pretrain cosine annealing scheduling now allows setting the final learning rate with --cos-min-lr.
- Lots of PEP8/whitespace/spelling mistake fixes from @stweil
New features
Reading order training
Reading order can now be learned with ketos rotrain and reading order models can be added to segmentation model files. The training process is documented here.
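A hypothetical invocation (the option names beyond the command itself are assumptions; see the linked documentation for the actual interface):

$ ketos rotrain -f xml -t ro_train.lst -e ro_val.lst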
Upgrade guide
Command line
Polygon extractor
The polygon extractor is responsible for taking a page image, baselines, and their bounding polygons, and dewarping and masking out the lines.
The new polygon extractor reduces line extraction time by a factor of ~30, roughly halving inference time and significantly speeding up training from XML files and dataset compilation. Note that polygon extraction does not concern data in the legacy bounding box format, nor does it touch the segmentation process itself: it is only a preprocessing step in the recognizer operating on an already existing segmentation.
Not all improvements in the polygon extractor are backward compatible, causing models trained with data extracted with the old implementation to suffer from a slight reduction in accuracy (usually <0.25 percentage points). Therefore models now contain a flag in their metadata indicating which implementation has been used to train them. This flag can be overridden, e.g.:
$ kraken --no-legacy-polygons -i ... ... ocr ...
to enable all speedups for a slight increase in character error rate.
For training, the new extractor is enabled by default, i.e. models trained with kraken 5.x will perform slightly worse on earlier kraken versions but will still work. It is possible to force the use of only backwards-compatible speedups:
$ ketos compile --legacy-polygons ...
$ ketos train --legacy-polygons ....
$ ketos pretrain --legacy-polygons ...
Threads and Multiprocessing
The command line tools now handle multiprocessing and thread pools more completely and configurably. --workers has been split into --threads and --workers, the former option limiting the size of thread pools (as much as possible) for intra-op parallelization, the latter setting the number of worker processes, usually for the purpose of data loading in training and dataset compilation.
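For example (in 5.2 these options still live on the subcommands; remaining arguments elided):

# 8 data loading worker processes, intra-op thread pools capped at 2 threads
$ ketos compile --workers 8 --threads 2 -f xml ...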
API changes
While 5.x preserves the general OCR functional blocks, the existing dictionary-based data structures have been replaced with container classes and the XML parser has been reworked.
Container classes
For straightforward processing little has changed. Most keys of the dictionaries have been converted into attributes of their respective classes.
The segmentation methods now return a Segmentation object containing Region and BaselineLine/BBoxLine objects:
>>> pageseg.segment(im)
{'text_direction': 'horizontal-lr',
'boxes': [(x1, y1, x2, y2),...],
'script_detection': False
}
>>> blla.segment(im)
{'text_direction': '$dir',
'type': 'baseline',
'lines': [{'baseline': [[x0, y0], [x1, y1], ..., [x_n, y_n]], 'boundary': [[x0, y0], [x1, y1], ..., [x_m, y_m]]}, ...
{'baseline': [[x0, ...]], 'boundary': [[x0, ...]]}]
'regions': [{'region': [[x0, y0], [x1, y1], ..., [x_n, y_n]], 'type': 'image'}, ...
{'region': [[x0, ...]], 'type': 'text'}]
}
becomes:
>>> pageseg.segment(im)
Segmentation(type='bbox',
imagename=None,
text_direction='horizontal-lr',
script_detection=False,
lines=[BBoxLine(id='f1d5b1e2-030c-41d5-b299-8a114eb0996e',
bbox=[34, 198, 279, 251],
text=None,
base_dir=None,
type='bbox',
imagename=None,
tags=None,
split=None,
regions=None,
text_direction='horizontal-lr'),
BBoxLine(...],
line_orders=[])
>>> blla.segment(im)
Segmentation(type='baseline',
imagename=im,
text_direction='horizontal-lr',
script_detection=False,
lines=[BaselineLine(id='50ab1a29-c3b6-4659-9713-ff246b21d2dc',
baseline=[[183, 284], [272, 282]],
boundary=[[183, 284], ... ,[183, 284]],
text=None,
base_dir=None,
type='baselines',
tags={'type': 'default'},
split=None,
regions=['e28ccb6b-2874-4be0-8e0d-38948f0fdf09']), ...],
regions={'text': [Region(id='e28ccb6b-2874-4be0-8e0d-38948f0fdf09',
boundary=[[123, 218], ..., [123, 218]],
tags={'type': 'text'}), ...],
'foo': [Region(...), ...]},
line_orders=[])
The recognizer now yields
BaselineOCRRecords/BBoxOCRRecords
which both inherit from the BaselineLine/BBoxLine classes:
>>> pred_it = rpred(network=model,
                    im=im,
                    segmentation=baseline_seg)
>>> record = next(pred_it)
>>> record
BaselineOCRRecord pred: 'predicted text' baseline: ...
>>> record.type
'baselines'
>>> record.line
BaselineLine(...)
>>> record.prediction
'predicted text'
One complication is the new serialization function which now accepts a
Segmentation object instead of a list of ocr_records and ancillary metadata:
>>> records = list(x for x in rpred(...))
>>> serialize(records,
image_name=im.filename,
image_size=im.size,
writing_mode='horizontal-tb',
scripts=['Latn', 'Hebr'],
regions=[{...}],
template='alto',
template_source='native',
processing_steps=proc_steps)
becomes:
>>> import dataclasses
>>> baseline_seg
Segmentation(...)
>>> records = list(x for x in rpred(..., segmentation=baseline_seg))
>>> results = dataclasses.replace(baseline_seg, lines=records)
>>> serialize(results,
image_size=im.size,
writing_mode='horizontal-tb',
scripts=['Latn', 'Hebr'],
template='alto',
template_source='native',
processing_steps=proc_steps)
This requires the construction of a new Segmentation object that contains the
records produced by the text predictor. The most straightforward way to create
this new Segmentation is through the dataclasses.replace function as our
container classes are immutable.
Lastly, serialize_segmentation has been removed. The serialize function now
accepts Segmentation objects which do not contain text predictions:
>>> serialize_segmentation(segresult={'text_direction': '$dir',
'type': 'baseline',
'lines': [{'baseline': [[x0, y0], [x1, y1], ..., [x_n, y_n]], 'boundary': [[x0, y0], [x1, y1], ..., [x_m, y_m]]}, ...
{'baseline': [[x0, ...]], 'boundary': [[x0, ...]]}]
'regions': [{'region': [[x0, y0], [x1, y1], ..., [x_n, y_n]], 'type': 'image'}, ...
{'region': [[x0, ...]], 'type': 'text'}]
},
image_name=im.filename,
image_size=im.size,
template='alto',
template_source='native',
processing_steps=proc_steps)
is replaced by:
>>> baseline_seg
Segmentation(...)
>>> serialize(baseline_seg,
image_size=im.size,
writing_mode='horizontal-tb',
scripts=['Latn', 'Hebr'],
template='alto',
template_source='native',
processing_steps=proc_steps)
XML parsing
The kraken.lib.xml.parse_{xml,alto,page} methods have been replaced by a single kraken.lib.xml.XMLPage class.
>>> parse_xml('xyz.xml')
{'image': impath,
'lines': [{'boundary': [[x0, y0], ...],
'baseline': [[x0, y0], ...],
'text': 'apdjfqpf',
'tags': {'type': 'default', ...}},
...
{...}],
'regions': {'region_type_0': [[[x0, y0], ...], ...], ...}}
becomes
>>> XMLPage('xyz.xml')
XMLPage xyz.xml (format: alto, image: impath)
As the parser is now aware of reading order, the XMLPage.lines attribute is an
unordered dict of BaselineLine/BBoxLine container classes. As ALTO/PageXML
files can generally contain multiple different reading orders, the
XMLPage.get_sorted_lines()/XMLPage.get_sorted_regions() methods provide an
ordered view of lines or regions. The default orders
line_implicit/region_implicit correspond to the order produced by the
previous parsers, i.e. the order formed by the sequence of elements in the XML
tree.
XMLPage objects can be converted into a Segmentation container using the
XMLPage.to_container() method:
>>> XMLPage('xyz.xml').to_container()
Segmentation(...)
Full Changelog: 4.3.13...5.2
4.3.10¶
Released on 2023-04-18 - GitHub - PyPI
This is mostly a bugfix release but also includes a couple of minor improvements and changes.
Changes
- Deterministic mode is now set to 'warn', preventing crashes in deterministic recognition training (CTC loss does not have a deterministic implementation).
- contrib/extract_lines.py now works with binary datasets
- Word error rate has been added as a validation metric in recognition training
- The fine-tuning options (--resize) add/both have been renamed to union/new. (Thibault Clérice) #488
- Tensorboard logging now also logs a couple of training images
4.3.5¶
Released on 2023-02-22 - GitHub - PyPI
This is just another hotfix release.
Changes
4.3.4¶
Released on 2023-02-20 - GitHub - PyPI
This is a hotfix release to 4.3.0 correcting a regression in the CLI, fixing pretrain validation losses, and the conda environment files.
Commits
- ac5fab6: Invalid type in click option definition for loggers (Benjamin Kiessling)
- 0cb9e0e: fix validation loss computation in pretrain (Benjamin Kiessling)
- 7d5069b: Remove former development raise in segmentation (Thibault Clérice) #441
- 0e3d10f: Install coremltools from pip for conda environments (Benjamin Kiessling)
4.3.0¶
Released on 2023-02-13 - GitHub - PyPI
What's Changed
- Pretraining has been reimplemented to be more faithful to the original publication for more stable memory consumption and easier hyperparameter selection
- Learning rate warmup and backbone freezing in recognition training with --warmup and --freeze-backbone (mostly to enable fine-tuning pretrained models)
- Enable ketos compile to create precompiled datasets with lines without a corresponding transcription with the --keep-empty-lines switch (mostly for pretraining models).
- --failed-sample-threshold in training modules, aborting training after a certain number of samples failed to load
- Tensorboard logging with --logger/--log-dir options
- Change codec construction during training when training and validation dataset alphabets don't match. Prior code points that only exist in the validation set would be copied to the model codec. Now the model codec only contains trained code points.
- Replace ocr_record with new smart classes BaselineOCRRecord and BBoxOCRRecord. These keep track of reading/display order, compute bounding polygons from the whole line bounding polygon, and average confidences when slicing.
- ALTO parsing now deals with any reasonable PointsType (see altoxml/schema#49)
- The fallback line orientation heuristic now takes into account the principal text orientation defined with --text-direction instead of assuming horizontal lines (--text-direction horizontal-lr/-rl).
- Baseline segmentation now supports padding of input images with --pad.
- CLI now allows serialization with custom jinja2 templates through the --template option.
- Switch validation metrics computation to torchmetrics.
- Various bugfixes, mostly to deal with shapely shenanigans.
Thanks
- @sixtyfive, @anutkk, @stweil, @colibrisson, @PonteIneptique for their contributions to this release.
Full Changelog: 4.2.0...4.3.0
4.1.2¶
Released on 2022-06-07 - GitHub - PyPI
Commits
- 3e10158: set border value in erosion in seamcarve (Benjamin Kiessling)
3.0.6¶
Released on 2021-11-08 - GitHub - PyPI
This is mainly a bugfix release containing small improvements such as additional tests, typing, spelling corrections, additional contrib scripts, and fixes for rarely used functionality.
Bugfixes
- Orthography and missing help messages in the CLI drivers
- Documentation for batch input specifications
- Fix a regression in early stopping when training on GPU
- Fix a regression in polygonization in the presence of regions
- Do not duplicate regions during serialization
- Add dummy String beneath TextLine w/o text in ALTO to avoid standard-violating empty TextLines
- The codec loading functionality of ketos train and KrakenTrainer actually loads a given codec now.
ketos trainandKrakenTraineractually loads a given codec now. - Fall back to simple scaling when centerline dewarping fails
- Drop (duplicate) short option form -p for --pad in all ketos commands
Features
- The forced alignment script contrib/forced_alignment_overlay.py now preserves the input file and only replaces the character cuts.
- Add reading order tests
- Explicit model sanity checks in blla.segment()
- Add baseline offset options to repolygonization script
- Make codec self-synchronizing
- Add TextEquiv for Word and TextLine in PAGE XML output
- Raise PIL image size limit to 20k*20k image dimensions
