.. _inference:

Inference with kraken
=====================

.. important::

   For migration details across all training and inference commands, see
   :doc:`migration_6_0`.

The ``kraken`` command-line interface (CLI) is the primary entry point for all
inference tasks. It employs a **chainable subcommand architecture**, allowing
you to define a complete processing pipeline (e.g., binarization ->
segmentation -> recognition) in a single invocation.

Synopsis
--------

.. code-block:: bash

   $ kraken [global-options] subcommand [subcommand-options] ...

A typical pipeline establishes a global I/O context and passes the data
through segmentation and recognition stages.

.. code-block:: bash

   # Complete pipeline: Input Image -> Segmentation -> Recognition -> Output Text
   $ kraken -i input.jpg output.txt segment -bl ocr -m model.safetensors

.. note::

   **Order matters:** Arguments placed *before* a subcommand (like ``segment``
   or ``ocr``) are global options (e.g., input files, hardware selection).
   Arguments placed *after* a subcommand apply only to that specific step.

Models
------

Inference requires trained models for both segmentation and recognition.

* Stock models: kraken includes a default model for baseline segmentation
  (enabled via ``segment -bl``).
* Custom models: For text recognition (and specialized segmentation), you must
  provide a trained model file.

To learn how to search for and download models from the official repository,
please refer to the :doc:`models` documentation.

.. tip::

   You can specify models using absolute paths, relative paths, or just the
   filename if the model is installed in the default global directory
   (``~/.local/share/kraken``).

Input and Output
----------------

kraken handles input files through either explicit pairings (processing
specific files) or batch globbing (processing folders).

Global I/O Options
~~~~~~~~~~~~~~~~~~

These options must be specified before any subcommands.

``-i, --input <input> <output>``
   Defines an explicit input/output pair. This option can be repeated multiple
   times in a single command.

   .. code-block:: bash

      $ kraken -i page1.tif page1.xml -i page2.tif page2.xml segment -bl

``-I, --batch-input <glob>``
   Accepts a glob expression for batch processing. This **requires** the
   ``-o`` option to define how output filenames are generated.

``-o, --suffix <suffix>``
   Used with ``-I``. Defines the suffix appended to the input filename to
   create the output filename.

   .. code-block:: bash

      # Processes all PNGs, saving results as 'filename.png.txt'
      $ kraken -I "data/*.png" -o .txt ocr -m model.mlmodel

``-f, --format-type <type>``
   Forces a specific input handler.

   * ``image``: Standard raster images (default).
   * ``pdf``: Extracts images from PDF files. Output filenames are generated
     using a format string provided with the ``-p, --pdf-format`` option.
   * ``xml``, ``alto``, ``page``: Parses existing segmentation from XML files.
     Useful for modular workflows (e.g., running recognition on already
     segmented ALTO files).

Output Serialization
~~~~~~~~~~~~~~~~~~~~

The format of the final output is controlled by global flags.

* ``-a``: **ALTO XML** (the preferred standard)
* ``-x``: **PageXML**
* ``-h``: **hOCR**
* ``-n``: **Native** (JSON for segmentation, plain text for recognition)
* ``-t <template>``: custom **Jinja2** template
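These flags combine with any of the pipelines described below; a minimal
sketch that serializes a full segmentation and recognition run as hOCR (file
and model names are illustrative):

.. code-block:: bash

   # Same pipeline as the synopsis above, but serialized as hOCR
   $ kraken -i page.tif page.hocr -h segment -bl ocr -m model.safetensors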
The Processing Pipeline
-----------------------

1. Binarization (``binarize``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Converts input images to 1-bit monochrome.

.. tip::

   Modern recognition models usually accept grayscale or color inputs
   directly. This step is typically optional unless you are using the legacy
   bounding box segmenter or a model specifically trained on binary data.

* ``--threshold <value>``: Sets the binarization threshold.

Example
^^^^^^^

.. code-block:: bash

   # Binarize a page and write a 1bpp output image
   $ kraken -i page.tif page_bin.png binarize --threshold 0.5

2. Segmentation (``segment``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Analyzes the layout of the page to extract text lines.

* ``-bl, --baseline``: Uses the neural baseline segmenter
  (**default/recommended**). It handles complex layouts and curved lines
  effectively.
* ``-i, --model``: Specifies a custom segmentation model. If omitted, the
  internal stock model is used.
* ``-d, --text-direction``: Hints for the reading-order heuristics. Options:
  ``horizontal-lr``, ``horizontal-rl``, ``vertical-lr``, ``vertical-rl``.
* ``-x, --boxes``: Uses the legacy bounding box segmenter. **Requires binary
  input.**

Examples
^^^^^^^^

.. code-block:: bash

   # Neural baseline segmentation with native JSON output
   $ kraken -i page.tif page.seg.json -n segment -bl

.. code-block:: bash

   # Legacy bbox segmentation (requires binary input)
   $ kraken -i page_bin.png page_bbox.json -n segment -x

3. Recognition (``ocr``)
~~~~~~~~~~~~~~~~~~~~~~~~

Transcribes text from the segmented lines.

* ``-m, --model <path>``: Path to the recognition model. Supports both CoreML
  (``.mlmodel``) and safetensors (``.safetensors``) formats.
* ``--no-segmentation``: Skips segmentation and treats the input image(s) as
  single text lines. Useful for processing folders of pre-cropped line images
  (see the last example below).
* ``--reorder / --no-reorder``: Applies the Unicode BiDi algorithm to the
  output (enabled by default).
* ``--base-dir <direction>``: Forces a specific initial text direction when
  running the BiDi algorithm (e.g., ``L``, ``R``).
* ``--temperature <value>``: Adjusts the softmax temperature during decoding.

  * *Values < 1.0*: Sharpen the probability distribution.
  * *Values > 1.0*: Smooth out the distribution.

* ``--no-legacy-polygons``: Disables the legacy fast-path polygon extractor.
* ``-B, --batch-size <n>``: Number of lines processed per recognition forward
  pass.
* ``--num-line-workers <n>``: Number of CPU workers for parallel line
  extraction/pre-processing.

.. note::

   The older tag-based multi-model recognition workflow is deprecated and
   scheduled for removal in kraken 8.

Examples
^^^^^^^^

.. code-block:: bash

   # Complete pipeline: segment + recognize into ALTO
   $ kraken -i page.tif page.xml -a segment -bl ocr -m model.safetensors

.. code-block:: bash

   # Recognize pre-segmented XML files
   $ kraken -f xml -I "data/*.xml" -o _ocr.xml ocr -m specialized.safetensors

.. code-block:: bash

   # Batched + parallelized recognition (GPU)
   $ kraken -i page.tif page.txt --device cuda:0 --precision bf16-mixed \
       segment -bl ocr -m model.safetensors -B 64 --num-line-workers 8
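Pre-cropped line images can also be recognized directly by skipping
segmentation; a minimal sketch (the ``lines/`` directory and model name are
illustrative):

.. code-block:: bash

   # Treat every image matched by the glob as a single, already-cropped text line
   $ kraken -I "lines/*.png" -o .txt ocr -m model.safetensors --no-segmentation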
Performance Tuning
------------------

You can tune hardware utilization and floating-point precision via global
options.

``--device <device>``
   Selects the compute device.

   * ``cpu``: Default.
   * ``cuda:N``: NVIDIA GPU (e.g., ``cuda:0``).
   * ``mps``: Apple Silicon (Metal Performance Shaders).

``--precision <mode>``
   Sets the floating-point precision for inference.

   * ``32``: Standard FP32.
   * ``16``, ``16-mixed``: FP16 (half-precision).
   * ``bf16``, ``bf16-mixed``: BFloat16 (recommended for NVIDIA Ampere and
     newer GPUs to prevent numerical instability).

On the ``ocr`` subcommand only:

``-B, --batch-size <n>``
   Number of lines processed in parallel on the GPU. Higher values increase
   throughput but require more VRAM.

``--num-line-workers <n>``
   Number of CPU processes used for pre-processing image batches before they
   are sent to the GPU. ``0`` runs extraction in the main process.
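As a rough sketch of how these options combine (device, batch size, and worker
count are illustrative and should be adapted to your hardware):

.. code-block:: bash

   # Apple Silicon: run the full pipeline on the Metal backend with a larger batch
   $ kraken -i page.tif page.xml -a --device mps \
       segment -bl ocr -m model.safetensors -B 32 --num-line-workers 4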