.. _getting_started: Getting Started =============== This guide provides a brief overview of how to install and use kraken. Installation ------------ Kraken can be run on Linux or macOS (both x64 and ARM). Installation is through the on-board *pip* utility. To not pollute the global state of your distribution's package manager it is recommended to use virtual environments. If you do not have a setup or do not wish to handle virtual environments yourself you can use `pipx`. .. code-block:: console $ sudo apt install pipx $ pipx install kraken kraken works both on Linux and Mac OS X and with any python interpreter between 3.10 and 3.13. It is possible the installation fails because `pipx` defaults to an unsupported interpreter version. In that case you need to install a compatible interpreter version such as 3.13 and then specify this version explicitly: .. code-block:: console $ sudo apt install python3.13-full $ pipx install --python python3.13 kraken Installation using pip ~~~~~~~~~~~~~~~~~~~~~~ Create and activate a separate virtual environment using whatever tool you like. .. code-block:: console $ pip install kraken or by running pip in the git repository: .. code-block:: console $ pip install . If you want direct PDF and multi-image TIFF/JPEG2000 support it is necessary to install the `pdf` extras package for PyPi: .. code-block:: console $ pip install kraken[pdf] or .. code-block:: console $ pip install .[pdf] respectively. Development branch installation using pip ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ To install the latest development branch through clone the kraken git repository and perform an editable install: .. code-block:: console $ git clone https://github.com/mittagessen/kraken.git $ cd kraken $ pip install --editable . Model Retrieval --------------- After installation, you'll need a model to process your documents. In kraken, models are pre-trained files that contain the knowledge for a specific task, such as identifying the layout of a page or recognizing characters in a particular script. Kraken provides a public repository of freely available models that can be accessed from the command line. To list all available models, run: .. code-block:: console $ kraken list To download a model, use the `get` command with the model's DOI. For example, to download the default model for printed French text, run: .. code-block:: console $ kraken get 10.5281/zenodo.10592716 For more information on how to interact with the model repository, please refer to the :doc:`user_guide/models` section of the user guide. The ATR Workflow ---------------- Automatic text recognition is a multi-step process that transforms an image of a document into a text file. In kraken, this process is broken down into a sequence of chainable commands, each performing a specific task. The three main steps in a typical ATR workflow are: 1. **Layout Analysis (Segmentation):** This step identifies the regions and lines of text on the page. In kraken, this is done with the `segment` command. 2. **Text Recognition (ATR):** This step transcribes the text from the line images identified in the previous step. In kraken, this is done with the `ocr` command. 3. **Serialization:** This step saves the output of the previous steps in a structured format, such as plain text, ALTO, or PageXML. This is handled by the output options of the `kraken` command. Models are essential to this workflow, as they provide the specific knowledge for layout analysis and text recognition. They are integrated into the kraken workflow as parameters for the `segment` and `ocr` commands. The choice of model is crucial for achieving good results, as a model trained on a specific type of material will perform best on similar material. Here is a quick example of a complete workflow: .. raw:: html :file: _static/kraken_workflow.svg Recognizing text on an image using the default parameters, including page segmentation: .. code-block:: console $ kraken -i image.tif image.txt segment -bl ocr -m catmus-print-fondue-large.mlmodel In this example, `segment` performs the layout analysis, and `ocr` performs the text recognition using the `catmus-print-fondue-large.mlmodel`. The final transcription is saved to `image.txt`.