Python API

For a gentle introduction to the Python API, see the Introduction to the Python API section of the user guide.

The API is built around a few core concepts, chief among them the data containers that pass information between processing steps. These containers are defined in the kraken.containers module.

The three primary containers are:

  • Segmentation: Represents the segmentation of a page, including baselines, bounding boxes, and regions.

  • BaselineLine: Represents the positional and typology information of a line in a segmentation.

  • BaselineOCRRecord: Represents a line of text that has been recognized, including the transcription and confidence scores.
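How the three containers relate can be sketched with plain dataclasses. This is a minimal illustration only: SketchLine, SketchSegmentation, SketchOCRRecord, and fake_recognize are hypothetical stand-ins invented for this example, not the classes in kraken.containers, and their fields are assumptions chosen to mirror the descriptions above. Consult the API reference for the real signatures.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple

Point = Tuple[int, int]


@dataclass
class SketchLine:
    """Hypothetical stand-in for a baseline-type line: position plus typology."""
    id: str
    baseline: List[Point]                   # polyline running under the text
    boundary: List[Point]                   # polygon enclosing the line
    tags: Optional[Dict[str, str]] = None   # typology information, e.g. line type


@dataclass
class SketchSegmentation:
    """Hypothetical stand-in for a page segmentation: lines and regions."""
    imagename: str
    lines: List[SketchLine] = field(default_factory=list)
    regions: Dict[str, List[List[Point]]] = field(default_factory=dict)


@dataclass
class SketchOCRRecord:
    """Hypothetical stand-in for a recognized line: text plus confidences."""
    line: SketchLine
    prediction: str
    confidences: List[float]

    @property
    def mean_confidence(self) -> float:
        # Average per-character confidence; 0.0 for an empty line.
        if not self.confidences:
            return 0.0
        return sum(self.confidences) / len(self.confidences)


def fake_recognize(seg: SketchSegmentation) -> Iterator[SketchOCRRecord]:
    """Placeholder recognizer: emits one record per line with dummy text."""
    for line in seg.lines:
        text = f"text for {line.id}"
        yield SketchOCRRecord(line, text, [1.0] * len(text))


# A segmentation step produces the page container ...
seg = SketchSegmentation(
    imagename="page.png",
    lines=[
        SketchLine(
            id="l1",
            baseline=[(0, 10), (100, 10)],
            boundary=[(0, 0), (100, 0), (100, 20), (0, 20)],
            tags={"type": "default"},
        )
    ],
)

# ... and a recognition step consumes it, producing one record per line.
records = list(fake_recognize(seg))
```

The point of the sketch is the data flow: the segmentation container owns the line objects, and each recognition record refers back to the line it was produced from, so positional information stays attached to the transcription.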

API Reference