kraken currently supports three kinds of models containing the recurrent neural
networks that perform all character recognition: traditional
pyrnn models, which are just pickled instances of python objects; a new
format serializing data in
pronn files; and clstm’s native protocol buffer serialization.
pyrnn models are serialized instances of python
lstm.SeqRecognizer objects. Using
such a model just entails loading the pickle and calling the appropriate
functions to perform recognition much like a shared library in other
Pickled models have several drawbacks. First, they inherently allow arbitrary code execution with relative ease. Additionally, they are not upward compatible from python 2.x to 3.x and are significantly larger than the newer HDF5 models (roughly 6.5 MB per state).
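The code execution risk can be illustrated with a minimal sketch that uses only the python standard library. The Payload class below is hypothetical and has nothing to do with kraken's model classes; it merely shows that a pickle can instruct the loader to call an arbitrary callable, here a harmless eval of an arithmetic expression.

```python
import pickle


class Payload:
    """Hypothetical object whose pickle asks the loader to call a function."""

    def __reduce__(self):
        # Instead of object state, pickle records "call eval with this
        # argument". The expression is harmless here, but any importable
        # callable could be named in its place.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Loading needs no reference to Payload: the call is encoded in the byte
# stream and is executed by pickle.loads itself.
result = pickle.loads(blob)
print(result)  # 42
```

Loading returns the result of the call rather than a Payload instance, which is why a pickled model from an untrusted source should never be loaded.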
Legacy python models can be converted to a protobuf-based serialization. By default, this conversion step happens automatically every time a pickled model is used.
Protobuf models have several advantages over pickled ones. They are noticeably smaller (1.8 MB versus 80 MB for the default model), don’t allow arbitrary code execution, and are upward compatible with python 3. Because they are so much more lightweight, they also load much faster.
HDF5 is a file format designed to store
large amounts of numerical data efficiently. It was used in the past to handle
pyrnn models without incurring the deserialization and code execution
penalty of pickled objects. As no training facility was included in kraken at
the time, and hence no HDF5-only models should exist, support for them has been
clstm, a small and fast implementation of LSTM networks, creates neural networks serialized as protocol buffers. These are usually slightly smaller than converted models and require clstm’s python extension to load and run. While they are significantly faster than the native python models, clstm is still in early development and there aren’t many trained models available yet.
By default, pyrnn models are automatically converted to the new protobuf format
when explicitly specified using the
-m option to the
ocr utility on the
command line. The converted models are stored in the user kraken directory (default is
~/.kraken) and will be automatically substituted in future runs.
This substitution process is extremely fast; in fact, loading the pickle is usually several orders of magnitude slower than converting it once loaded.
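The convert-once-then-substitute behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not kraken's implementation: resolve_model, fake_convert, and the cached file naming scheme are all hypothetical, and the converter simply copies bytes in place of a real conversion.

```python
import tempfile
from pathlib import Path


def resolve_model(model_path, cache_dir, convert):
    """Return the cached converted model, converting only on first use.

    `convert` is a hypothetical callable taking (source, destination);
    it stands in for the real conversion routine.
    """
    model_path = Path(model_path)
    cache_dir = Path(cache_dir).expanduser()
    # Assumed naming scheme: original file name plus a .pronn suffix.
    cached = cache_dir / (model_path.name + ".pronn")
    if cached.exists():
        # A converted copy already exists, so substitute it directly.
        return cached
    cache_dir.mkdir(parents=True, exist_ok=True)
    convert(model_path, cached)
    return cached


conversions = []


def fake_convert(src, dst):
    # Placeholder converter: records the call and copies the bytes.
    conversions.append(src.name)
    dst.write_bytes(src.read_bytes())


with tempfile.TemporaryDirectory() as tmp:
    legacy = Path(tmp) / "example.pyrnn"
    legacy.write_bytes(b"legacy model bytes")
    cache = Path(tmp) / "cache"
    first = resolve_model(legacy, cache, fake_convert)
    second = resolve_model(legacy, cache, fake_convert)
    converted_once = first == second and len(conversions) == 1
```

The second call finds the cached file and skips conversion entirely, which is the effect the substitution in the user kraken directory is meant to achieve.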
If conversion is not desired, e.g. because there is a bug in the conversion
routine, it can be disabled using the