Getting Started
This guide provides a brief overview of how to install and use kraken.
Installation
kraken runs on Linux and macOS (both x64 and ARM) and is installed with pip. To avoid polluting the packages managed by your distribution’s package manager, it is recommended to install into a virtual environment. If you do not have such a setup or do not wish to handle virtual environments yourself, you can use pipx:
$ sudo apt install pipx
$ pipx install kraken
kraken works with any Python interpreter from 3.10 through 3.13. The installation may fail because pipx defaults to an unsupported interpreter version. In that case, install a compatible interpreter version such as 3.13 and specify it explicitly:
$ sudo apt install python3.13-full
$ pipx install --python python3.13 kraken
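If it is unclear whether a given interpreter falls inside the supported range, a quick check along the following lines may help. This is only a sketch: it assumes the 3.10 to 3.13 range stated above is inclusive on both ends, and is_supported is a hypothetical helper, not part of kraken or pipx.

```python
import sys

# Supported range as stated in this guide (assumed inclusive on both ends).
MIN_VERSION = (3, 10)
MAX_VERSION = (3, 13)


def is_supported(version=None):
    """Return True if the (major, minor) version lies in the supported range.

    Hypothetical helper for illustration; defaults to the running interpreter.
    """
    major, minor = (version or sys.version_info)[:2]
    return MIN_VERSION <= (major, minor) <= MAX_VERSION


if __name__ == "__main__":
    status = "supported" if is_supported() else "not supported"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} is {status}")
```

Running the script with the interpreter you intend to pass to pipx (for example python3.13) reports whether that interpreter is in range.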
Installation using pip
Create and activate a separate virtual environment using whatever tool you like, then install kraken from PyPI:
$ pip install kraken
or install from a local checkout by running pip in the git repository:
$ pip install .
If you want direct PDF and multi-image TIFF/JPEG2000 support, you need to install kraken with the pdf extra:
$ pip install 'kraken[pdf]'
or
$ pip install '.[pdf]'
respectively.
Development branch installation using pip
To install the latest development branch, clone the kraken git repository and perform an editable install:
$ git clone https://github.com/mittagessen/kraken.git
$ cd kraken
$ pip install --editable .
Model Retrieval
After installation, you’ll need a model to process your documents. In kraken, models are pre-trained files that contain the knowledge for a specific task, such as identifying the layout of a page or recognizing characters in a particular script.
Kraken provides a public repository of freely available models that can be accessed from the command line. To list all available models, run:
$ kraken list
To download a model, use the get command with the model’s DOI. For example, to download the default model for printed French text, run:
$ kraken get 10.5281/zenodo.10592716
For more information on how to interact with the model repository, please refer to the Model Management section of the user guide.
The ATR Workflow
Automatic text recognition is a multi-step process that transforms an image of a document into a text file. In kraken, this process is broken down into a sequence of chainable commands, each performing a specific task.
The three main steps in a typical ATR workflow are:
Layout Analysis (Segmentation): This step identifies the regions and lines of text on the page. In kraken, this is done with the segment command.
Text Recognition (ATR): This step transcribes the text from the line images identified in the previous step. In kraken, this is done with the ocr command.
Serialization: This step saves the output of the previous steps in a structured format, such as plain text, ALTO, or PageXML. This is handled by the output options of the kraken command.
Models are essential to this workflow, as they provide the specific knowledge for layout analysis and text recognition. They are integrated into the kraken workflow as parameters for the segment and ocr commands. The choice of model is crucial for achieving good results, as a model trained on a specific type of material will perform best on similar material.
Here is a quick example of a complete workflow that recognizes text on an image using the default parameters, including page segmentation:
$ kraken -i image.tif image.txt segment -bl ocr -m catmus-print-fondue-large.mlmodel
In this example, segment performs the layout analysis, and ocr performs the text recognition using the catmus-print-fondue-large.mlmodel model. The final transcription is saved to image.txt.
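To apply the same workflow to several images, one option is to generate one such command per file from a small script. The following is a minimal sketch rather than an official kraken interface: build_command and recognize_all are hypothetical helpers, the output name is assumed to be the image name with a .txt suffix, and only the options shown in the command above are used.

```python
import subprocess
from pathlib import Path


def build_command(image, model):
    """Build the argument list for one image, mirroring the command above.

    Hypothetical helper; the output path is the image path with a .txt suffix.
    """
    image = Path(image)
    output = image.with_suffix(".txt")  # image.tif -> image.txt
    return ["kraken", "-i", str(image), str(output),
            "segment", "-bl", "ocr", "-m", model]


def recognize_all(images, model, dry_run=True):
    """Print one kraken invocation per image and run it unless dry_run is set."""
    commands = [build_command(image, model) for image in images]
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # Requires kraken on the PATH; raises CalledProcessError on failure.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    recognize_all(["image.tif"], "catmus-print-fondue-large.mlmodel")
```

With dry_run left at its default, the script only prints the commands, which makes it easy to inspect them before running the actual recognition.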