kraken ====== .. toctree:: :hidden: :maxdepth: 2 introduction_to_atr getting_started user_guide/index api/index changelog kraken is an open-source, turn-key automatic text recognition (ATR) system optimized for historical and non-Latin script writing. Designed as a universal text recognizer for the humanities, it directly addresses the unique challenges of digitizing historical documents. Highly adaptable and trainable, kraken excels in scenarios often overlooked by general-purpose ATR engines. It is particularly well-suited for the long tail of digitization, supporting low-resource languages and diverse, non-conventional scripts common in humanities research. kraken offers two primary interfaces for its functionality: * A flexible and customizable **command-line interface**, intended for most users, enabling chainable workflows implementing the recognition pipeline. * A comprehensive **API**, designed for developers building custom workflows, integrating kraken into other projects, or requiring fine-grained control over the ATR process. This documentation is structured to guide you through kraken's capabilities: * The :doc:`introduction_to_atr` is a basic primer on the core concepts intended for readers without prior ATR or machine learning experience. * The :doc:`getting_started` guide provides a concise introduction to installation and basic usage. * The :doc:`user_guide/index` offers detailed information on utilizing the CLI tools for various tasks and an introduction to the python API. * The :doc:`api/index` serves as a comprehensive reference for integrating kraken into your python projects. Features ======== kraken's main features are: - Fully trainable layout analysis, reading order, and character recognition. - `Right-to-Left `_, `BiDi `_, and Top-to-Bottom script support - `ALTO `_, PageXML, abbyyXML, and hOCR output - Word bounding boxes and character cuts - Public repository of model files via `HTRMoPo `_ - Configurable recognition through a network specification language and a plugin system Integrations ============ Through its API, kraken has also been integrated into various frontend applications and larger processing suites that offer graphical user interfaces or scaffolding for large-scale digitization: * `eScriptorium `_: A web-based platform for annotating, transcribing, and training ATR models, tightly integrating kraken. * `OCR4all `_: A comprehensive ATR framework designed for historical documents, offering a user-friendly interface for various ATR tasks. * `OCR-D suite `_: A collection of tools for ATR-related tasks, aiming to build a full-stack ATR workflow for historical prints. * `arkindex by Teklia `_: A platform for large-scale document analysis and indexing with kraken support through a plugin. Community & Support =================== kraken is an open-source project driven by community contributions. We warmly welcome feedback, pull requests, bug reports, and feature suggestions on our `github repository `_. If you are looking for help, want to discuss features, or need support regarding integrations (particularly with **eScriptorium**), please join our community chat: `eScriptorium Gitter Channel `_ License ======= ``kraken`` is provided under the terms and conditions of the `Apache 2.0 License `_. Funding ======= kraken is developed at `Inria `_ and the `École Pratique des Hautes Études `_, `Université PSL `_. .. container:: twocol .. container:: leftside .. image:: https://raw.githubusercontent.com/mittagessen/kraken/main/docs/_static/normal-reproduction-low-resolution.jpg :width: 100 :alt: Co-financed by the European Union .. container:: rightside This project was funded in part by the European Union (ATRIUM, project number 101132163). This project was funded in part by the European Union (ERC, MiDRASH, project number 101071829). This project was partially funded through the RESILIENCE project, funded from the European Union’s Horizon 2020 Framework Programme for Research and Innovation. .. container:: twocol .. container:: leftside .. image:: https://projet.biblissima.fr/sites/default/files/2021-11/biblissima-baseline-sombre-ia.png :width: 300 :alt: Received funding from the Programme d’investissements d’Avenir .. container:: rightside Ce travail a bénéficié d’une aide de l’État gérée par l’Agence Nationale de la Recherche au titre du Programme d’Investissements d’Avenir portant la référence ANR-21-ESRE-0005 (Biblissima+).