.. _api_introduction:

Introduction to the Python API
==============================

kraken provides a powerful python API for programmatic access to all its
functionality. This guide provides a basic introduction to the most important
parts of the API.

High-Level API
--------------

The easiest way to use kraken programmatically is through the high-level API
in the :py:mod:`kraken.tasks` module. This API provides a set of
task-oriented classes for segmentation, recognition, and forced alignment.

Segmentation
~~~~~~~~~~~~

To segment an image, you can use the
:py:class:`~kraken.tasks.SegmentationTaskModel` class. It returns a
:py:class:`~kraken.containers.Segmentation` object containing the
segmentation results. The lines within the `Segmentation` object can be of
two types, depending on the model used:

* :py:class:`~kraken.containers.BaselineLine` for models that output
  baselines and polygons.
* :py:class:`~kraken.containers.BBoxLine` for models that output bounding
  boxes.

.. code-block:: python

    from PIL import Image
    from kraken.tasks import SegmentationTaskModel
    from kraken.configs import SegmentationInferenceConfig

    # Load the default segmentation model
    model = SegmentationTaskModel.load_model()
    im = Image.open('image.png')
    config = SegmentationInferenceConfig()
    segmentation = model.predict(im, config)
    for line in segmentation.lines:
        print(line.baseline)

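Since downstream handling differs between the two line types, it can be
useful to dispatch on them explicitly. The following is a minimal sketch: the
``baseline`` and ``boundary`` attributes appear in the examples in this
guide, while the ``bbox`` attribute on
:py:class:`~kraken.containers.BBoxLine` is an assumption.

.. code-block:: python

    from kraken.containers import BaselineLine, BBoxLine

    for line in segmentation.lines:
        if isinstance(line, BaselineLine):
            # Baseline models return a baseline and a bounding polygon.
            print(line.baseline, line.boundary)
        elif isinstance(line, BBoxLine):
            # Bounding box models return a single rectangle (assumed to
            # be exposed as `bbox`).
            print(line.bbox)
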
Recognition
~~~~~~~~~~~

To recognize the text in an image, you can use the
:py:class:`~kraken.tasks.RecognitionTaskModel` class. This class takes a PIL
image, a :py:class:`~kraken.containers.Segmentation` object, and a
configuration as inputs and returns an iterator of `ocr_record` objects.
Similar to segmentation, the returned records can be of two types:

* :py:class:`~kraken.containers.BaselineOCRRecord` for baseline-based
  recognition.
* :py:class:`~kraken.containers.BBoxOCRRecord` for bounding box-based
  recognition.

.. code-block:: python

    from PIL import Image
    from kraken.tasks import RecognitionTaskModel
    from kraken.configs import RecognitionInferenceConfig

    # Load a recognition model
    model = RecognitionTaskModel.load_model('model.safetensors')
    im = Image.open('image.png')
    config = RecognitionInferenceConfig()

    # segmentation is a Segmentation object created by loading an XML file or
    # running segmentation manually.
    for record in model.predict(im, segmentation, config):
        print(record.prediction)

Forced Alignment
~~~~~~~~~~~~~~~~

Forced alignment is the process of aligning a given transcription to the
output of a text recognition model, producing approximate character
locations. This is a specialized operation outside a normal ATR workflow and
can be used, e.g., to produce word bounding boxes for a known good
transcription. You can use the
:py:class:`~kraken.tasks.ForcedAlignmentTaskModel` class to perform forced
alignment:

.. code-block:: python

    from PIL import Image
    from kraken.tasks import ForcedAlignmentTaskModel
    from kraken.containers import Segmentation, BaselineLine
    from kraken.configs import RecognitionInferenceConfig

    # `model.safetensors` is a recognition model
    model = ForcedAlignmentTaskModel.load_model('model.safetensors')
    im = Image.open('image.png')
    line = BaselineLine(baseline=[(0, 0), (100, 0)],
                        boundary=[(0, -10), (100, -10), (100, 10), (0, 10)],
                        text='Hello World')
    segmentation = Segmentation(lines=[line])
    config = RecognitionInferenceConfig()
    aligned_segmentation = model.predict(im, segmentation, config)
    record = aligned_segmentation.lines[0]
    print(record.prediction)
    print(record.cuts)

Parsing XML
~~~~~~~~~~~

kraken can parse ALTO and PageXML files into
:py:class:`~kraken.containers.Segmentation` objects. This is useful for
loading ground truth data or the results of other OCR engines. The
:py:class:`~kraken.lib.xml.XMLPage` class handles this.

.. note::

    The parser has been refactored in kraken 7.0 with changes to reading
    order parsing and robustness improvements. In particular, if the XML
    dimension field is invalid, kraken falls back to reading the source
    image to determine dimensions.

.. code-block:: python

    from kraken.lib.xml import XMLPage

    xml_page = XMLPage('input.xml')
    segmentation = xml_page.to_container()

Serialization
~~~~~~~~~~~~~

After segmentation and recognition, you can serialize the results into
various formats, such as ALTO or PageXML, with the
:py:func:`kraken.serialization.serialize` function.

.. code-block:: python

    from kraken.serialization import serialize

    # Assume `segmentation` is a Segmentation object from a previous step
    # and `im` is the PIL image object.

    # Serialize to ALTO
    alto_xml = serialize(segmentation, image_size=im.size, template='alto')
    with open('output.alto.xml', 'w') as f:
        f.write(alto_xml)

    # Serialize to PageXML
    page_xml = serialize(segmentation, image_size=im.size, template='page')
    with open('output.page.xml', 'w') as f:
        f.write(page_xml)

Plugin System
-------------

kraken features a plugin system that allows developers to extend its
functionality with new commands, model types, and tasks. This system is
based on python's entry points mechanism and primarily targets pytorch-based
implementations.

To create a plugin, you need to:

1. Create a new python package that depends on `kraken`.
2. In your package, create a class that implements the required interface.
3. Register your class as an entry point in your package's `pyproject.toml`
   or `setup.cfg`.

Entry Point Groups
~~~~~~~~~~~~~~~~~~

kraken provides several entry point groups for different types of plugins:

* ``kraken.cli``: Adds new subcommands to the `kraken` command-line
  interface.
* ``ketos.cli``: Adds new subcommands to the `ketos` command-line interface.
* ``kraken.models``: Registers new model architectures.
* ``kraken.lightning_modules``: Registers new PyTorch Lightning modules for
  training and model conversion.
* ``kraken.loaders``: Registers new model loaders.
* ``kraken.writers``: Registers new model writers.
* ``kraken.tasks``: Registers new high-level tasks.

Model Plugins
~~~~~~~~~~~~~

The most common use case for plugins is to add new machine learning
architectures for an already existing task type, such as defining a new
segmentation method. This typically involves:

1. Implementing a class that inherits from the requisite base model
   interface in :py:mod:`kraken.models.base`, such as
   :py:class:`~kraken.models.base.RecognitionBaseModel` for text recognition
   or :py:class:`~kraken.models.base.SegmentationBaseModel` for layout
   analysis.
2. Registering this class in your plugin's `pyproject.toml` or `setup.cfg`
   under the `kraken.models` entry point.
3. Implementing a checkpoint container that provides a
   `load_from_checkpoint` method and is registered under the
   `kraken.lightning_modules` entry point. The easiest way to ensure correct
   behavior is to implement this class as a `lightning`_ LightningModule.
4. Optionally, adding a training command to `ketos` by creating a `click`
   command and registering it under the `ketos.cli` entry point.

For a complete example of a layout analysis model plugin, refer to the
`dfine_kraken`_ project, which implements a D-FINE based segmentation
method. A minimal sketch of the entry point wiring is shown below.

.. _`dfine_kraken`: https://github.com/mittagessen/dfine_kraken
.. _`lightning`: https://lightning.ai/docs/pytorch/stable/

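The following sketch illustrates how a plugin package might register the
entry points from steps 2 to 4 above. The package name, module paths, and
class names (``kraken-mymodel``, ``kraken_mymodel.*``) are placeholders and
not part of kraken; only the entry point group names come from the list
above.

.. code-block:: toml

    # pyproject.toml of a hypothetical plugin package. All kraken_mymodel
    # names are placeholders; only the entry point groups are defined by
    # kraken.
    [project]
    name = "kraken-mymodel"
    dependencies = ["kraken"]

    # Step 2: register the model architecture.
    [project.entry-points."kraken.models"]
    my_model = "kraken_mymodel.model:MySegmentationModel"

    # Step 3: register the checkpoint container (a LightningModule).
    [project.entry-points."kraken.lightning_modules"]
    my_model = "kraken_mymodel.module:MySegmentationModule"

    # Step 4 (optional): add a training command to the ketos CLI.
    [project.entry-points."ketos.cli"]
    my_model_train = "kraken_mymodel.cli:train"
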
Low-Level API
-------------

For more fine-grained control, you can use the low-level API in the
:py:mod:`kraken.lib` module. This API provides direct access to the core
components of kraken, such as the neural network models and the CTC
decoders. For more information, please refer to the :ref:`api_reference`.