API Reference
Segmentation
kraken.blla module
Note
blla provides the interface to the fully trainable segmenter. For the legacy segmenter interface refer to the pageseg module. Note that recognition models are not interchangeable between segmenters.
- kraken.blla.segment(im, text_direction='horizontal-lr', mask=None, reading_order_fn=polygonal_reading_order, model=None, device='cpu', raise_on_error=False, autocast=False)
Segments a page into text lines using the baseline segmenter.
Segments a page into text lines and returns the polyline formed by each baseline and their estimated environment.
- Parameters:
im (PIL.Image.Image) – Input image. The mode can generally be anything but it is possible to supply a binarized-input-only model which requires accordingly treated images.
text_direction (Literal['horizontal-lr', 'horizontal-rl', 'vertical-lr', 'vertical-rl']) – Passed-through value for serialization.serialize.
mask (Optional[numpy.ndarray]) – A bi-level mask image of the same size as im where 0-valued regions are ignored for segmentation purposes. Disables column detection.
reading_order_fn (Callable) – Function to determine the reading order. Has to accept a list of tuples (baselines, polygon) and a text direction (lr or rl).
model (Union[List[kraken.lib.vgsl.TorchVGSLModel], kraken.lib.vgsl.TorchVGSLModel]) – One or more TorchVGSLModel containing a segmentation model. If none is given a default model will be loaded.
device (str) – The target device to run the neural network on.
raise_on_error (bool) – Raises errors instead of logging them when they are non-blocking.
autocast (bool) – Runs the model with automatic mixed precision
- Returns:
A kraken.containers.Segmentation class containing reading order sorted baselines (polylines) and their respective polygonal boundaries as kraken.containers.BaselineLine records. The last and first point of each boundary polygon are connected.
- Raises:
KrakenInvalidModelException – if the given model is not a valid segmentation model.
KrakenInputException – if the mask is not bitonal or does not match the image size.
- Return type:
kraken.containers.Segmentation
Notes
Multi-model operation is most useful for combining one or more region detection models and one text line model. Detected lines from all models are simply combined without any merging or duplicate detection, so the chance of the same line appearing multiple times in the output is high. In addition, neural reading order determination is disabled when more than one model outputs lines.
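Because lines from several models are combined without duplicate detection, a caller may want to filter the result afterwards. The following is a minimal post-processing sketch, not part of kraken: the helper names, the plain-dict line representation, and the 5 pixel tolerance are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[int, int]


def baselines_close(a: Sequence[Point], b: Sequence[Point], tol: int = 5) -> bool:
    """True when both baselines have the same number of points and each pair
    of corresponding points differs by at most `tol` pixels on either axis."""
    if len(a) != len(b):
        return False
    return all(abs(ax - bx) <= tol and abs(ay - by) <= tol
               for (ax, ay), (bx, by) in zip(a, b))


def drop_near_duplicates(lines: List[Dict], tol: int = 5) -> List[Dict]:
    """Keeps the first occurrence of each line and drops later near-copies.
    Each line is a hypothetical dict with at least a 'baseline' key."""
    kept: List[Dict] = []
    for line in lines:
        if not any(baselines_close(line['baseline'], k['baseline'], tol) for k in kept):
            kept.append(line)
    return kept


lines = [
    {'id': 'a', 'baseline': [(10, 100), (200, 102)]},
    {'id': 'b', 'baseline': [(12, 101), (198, 100)]},  # near-copy of 'a'
    {'id': 'c', 'baseline': [(10, 300), (200, 300)]},
]
unique_ids = [line['id'] for line in drop_near_duplicates(lines)]
```

Point-wise comparison only catches copies with the same number of baseline points; a real filter would likely need a more robust distance measure.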
kraken.pageseg module
Note
pageseg is the legacy bounding box-based segmenter. For the trainable baseline segmenter interface refer to the blla module. Note that recognition models are not interchangeable between segmenters.
- kraken.pageseg.segment(im, text_direction='horizontal-lr', scale=None, maxcolseps=2, black_colseps=False, no_hlines=True, pad=0, mask=None, reading_order_fn=reading_order)
Segments a page into text lines.
Segments a page into text lines and returns the absolute coordinates of each line in reading order.
- Parameters:
im (PIL.Image.Image) – A bi-level page of mode ‘1’ or ‘L’
text_direction (str) – Principal direction of the text (horizontal-lr/rl/vertical-lr/rl)
scale (Optional[float]) – Scale of the image. Will be auto-determined if set to None.
maxcolseps (float) – Maximum number of whitespace column separators
black_colseps (bool) – Whether column separators are assumed to be vertical black lines or not
no_hlines (bool) – Switch for small horizontal line removal.
pad (Union[int, Tuple[int, int]]) – Padding to add to line bounding boxes. If int the same padding is used both left and right. If a 2-tuple, uses (padding_left, padding_right).
mask (Optional[numpy.ndarray]) – A bi-level mask image of the same size as im where 0-valued regions are ignored for segmentation purposes. Disables column detection.
reading_order_fn (Callable) – Function to call to order line output. Callable accepting a list of slices (y, x) and a text direction in (rl, lr).
- Returns:
A kraken.containers.Segmentation class containing reading order sorted bounding box-type lines as kraken.containers.BBoxLine records.
- Raises:
KrakenInputException – if the input image is not binarized or the text direction is invalid.
- Return type:
kraken.containers.Segmentation
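The pad parameter accepts either a single int or a (padding_left, padding_right) tuple. The sketch below illustrates that convention on a bounding box in (xmin, ymin, xmax, ymax) form. The helper names are hypothetical, and clamping to the page width is an assumption of this example rather than documented behaviour.

```python
from typing import Tuple, Union

BBox = Tuple[int, int, int, int]


def normalize_pad(pad: Union[int, Tuple[int, int]]) -> Tuple[int, int]:
    """Expands an int into (left, right); passes a 2-tuple through."""
    if isinstance(pad, int):
        return pad, pad
    left, right = pad
    return left, right


def pad_bbox(bbox: BBox, pad: Union[int, Tuple[int, int]], page_width: int) -> BBox:
    """Widens a box horizontally by the padding, clamped to the page
    (the clamping is an assumption made for this sketch)."""
    left, right = normalize_pad(pad)
    xmin, ymin, xmax, ymax = bbox
    return max(0, xmin - left), ymin, min(page_width, xmax + right), ymax


symmetric = pad_bbox((10, 20, 100, 40), 5, page_width=102)
left_only = pad_bbox((3, 20, 100, 40), (10, 0), page_width=200)
```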
Recognition
kraken.rpred module
- class kraken.rpred.mm_rpred(nets, im, bounds, pad=16, bidi_reordering=True, tags_ignore=None, no_legacy_polygons=False)
Multi-model version of kraken.rpred.rpred
- Parameters:
nets (Dict[Tuple[str, str], kraken.lib.models.TorchSeqRecognizer])
im (PIL.Image.Image)
bounds (kraken.containers.Segmentation)
pad (int)
bidi_reordering (Union[bool, str])
tags_ignore (Optional[List[Tuple[str, str]]])
no_legacy_polygons (bool)
- bidi_reordering
- bounds
- im
- len
- line_iter
- nets
- no_legacy_polygons
- one_channel_modes
- pad
- seg_types
- tags_ignore
- kraken.rpred.rpred(network, im, bounds, pad=16, bidi_reordering=True, no_legacy_polygons=False)
Uses a TorchSeqRecognizer and a segmentation to recognize text
- Parameters:
network (kraken.lib.models.TorchSeqRecognizer) – A TorchSeqRecognizer object
im (PIL.Image.Image) – Image to extract text from
bounds (kraken.containers.Segmentation) – A Segmentation class instance containing either a baseline or bbox segmentation.
pad (int) – Extra blank padding to the left and right of text line. Auto-disabled when expected network inputs are incompatible with padding.
bidi_reordering (Union[bool, str]) – Reorder classes in the ocr_record according to the Unicode bidirectional algorithm for correct display. Set to L|R to change base text direction.
no_legacy_polygons (bool)
- Yields:
An ocr_record containing the recognized text, absolute character positions, and confidence values for each character.
- Return type:
Generator[kraken.containers.ocr_record, None, None]
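A typical call sequence chains the baseline segmenter and the recognizer. The sketch below uses the blla.segment and rpred.rpred signatures documented above. The wrapper function, its arguments, and the file paths passed to it are hypothetical, and loading the recognition model through kraken.lib.models.load_any is an assumption, since model loading is not covered in this section.

```python
from typing import List


def recognize_page(image_path: str, model_path: str, device: str = 'cpu') -> List[str]:
    """Segments a page with the default segmentation model and returns the
    recognized text of each line in reading order."""
    # Imported lazily so this sketch can be loaded without kraken installed.
    from PIL import Image
    from kraken import blla, rpred
    from kraken.lib import models

    im = Image.open(image_path)
    # model=None (the default) makes segment() load a default model.
    seg = blla.segment(im, text_direction='horizontal-lr', device=device)
    # Assumption: load_any() returns a TorchSeqRecognizer for recognition models.
    net = models.load_any(model_path, device=device)
    # rpred() is a generator yielding one ocr_record per line.
    return [record.prediction for record in rpred.rpred(net, im, seg)]
```

Since rpred yields records lazily, long pages can also be processed line by line instead of being collected into a list.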
Serialization
kraken.serialization module
- kraken.serialization.render_report(model, chars, errors, char_accuracy, word_accuracy, char_confusions, scripts, insertions, deletions, substitutions)
Renders an accuracy report.
- Parameters:
model (str) – Model name.
errors (int) – Number of errors on test set.
char_confusions (dict) – Dictionary mapping a tuple (gt, pred) to a number of occurrences.
scripts (dict) – Dictionary counting character per script.
insertions (dict) – Dictionary counting insertion operations per Unicode script
deletions (int) – Number of deletions
substitutions (dict) – Dictionary counting substitution operations per Unicode script.
chars (int)
char_accuracy (float)
word_accuracy (float)
- Returns:
A string containing the rendered report.
- Return type:
str
- kraken.serialization.serialize(results, image_size=(0, 0), writing_mode='horizontal-tb', scripts=None, template='alto', template_source='native', processing_steps=None)
Serializes recognition and segmentation results into an output document.
Serializes a Segmentation container object containing either segmentation or recognition results into an output document. The rendering is performed with jinja2 templates that can either be shipped with kraken (template_source == ‘native’) or custom (template_source == ‘custom’).
Note: Empty records are ignored for serialization purposes.
- Parameters:
results (kraken.containers.Segmentation) – Segmentation container object
image_size (Tuple[int, int]) – Dimensions of the source image
writing_mode (Literal['horizontal-tb', 'vertical-lr', 'vertical-rl']) – Sets the principal layout of lines and the direction in which blocks progress. Valid values are horizontal-tb, vertical-rl, and vertical-lr.
scripts (Optional[Iterable[str]]) – List of scripts contained in the OCR records
template ([os.PathLike, str]) – Selector for the serialization format. May be ‘hocr’, ‘alto’, ‘page’ or any template found in the template directory. If template_source is set to custom a path to a template is expected.
template_source (Literal['native', 'custom']) – Switch to enable loading of custom templates from outside the kraken package.
processing_steps (Optional[List[kraken.containers.ProcessingStep]]) – A list of ProcessingStep container classes describing the processing kraken performed on the inputs.
- Returns:
The rendered template
- Return type:
str
Default templates
ALTO 4.4
{% set proc_type_table = {'processing': 'contentGeneration',
'preprocessing': 'preOperation',
'postprocessing': 'postOperation'}
%}
{%+ macro render_line(page, line) +%}
<TextLine ID="{{ line.id }}" HPOS="{{ line.bbox[0] }}" VPOS="{{ line.bbox[1] }}" WIDTH="{{ line.bbox[2] - line.bbox[0] }}" HEIGHT="{{ line.bbox[3] - line.bbox[1] }}" {% if line.baseline %}BASELINE="{{ line.baseline|sum(start=[])|join(' ') }}"{% endif %} {% if line.tags %}TAGREFS="{% for type in page.line_types %}{% if type[0] in line.tags and line.tags[type[0]] == type[1] %}LINE_TYPE_{{ loop.index }}{% endif %}{% endfor %}"{% endif %}>
{% if line.boundary %}
<Shape>
<Polygon POINTS="{{ line.boundary|sum(start=[])|join(' ') }}"/>
</Shape>
{% endif %}
{% if line.recognition|length() == 0 %}
<String CONTENT=""/>
{% else %}
{% for segment in line.recognition %}
{# ALTO forbids encoding whitespace before any String/Shape tags #}
{% if segment.text is whitespace and loop.index > 1 %}
<SP ID="segment_{{ segment.index }}" HPOS="{{ segment.bbox[0]}}" VPOS="{{ segment.bbox[1] }}" WIDTH="{{ segment.bbox[2] - segment.bbox[0] }}" HEIGHT="{{ segment.bbox[3] - segment.bbox[1] }}"/>
{% else %}
<String ID="segment_{{ segment.index }}" CONTENT="{{ segment.text|e }}" HPOS="{{ segment.bbox[0] }}" VPOS="{{ segment.bbox[1] }}" WIDTH="{{ segment.bbox[2] - segment.bbox[0] }}" HEIGHT="{{ segment.bbox[3] - segment.bbox[1] }}" WC="{{ (segment.confidences|sum / segment.confidences|length)|round(4) }}">
{% if segment.boundary %}
<Shape>
<Polygon POINTS="{{ segment.boundary|sum(start=[])|join(' ') }}"/>
</Shape>
{% endif %}
{% for char in segment.recognition %}
<Glyph ID="char_{{ char.index }}" CONTENT="{{ char.text|e }}" HPOS="{{ char.bbox[0] }}" VPOS="{{ char.bbox[1] }}" WIDTH="{{ char.bbox[2] - char.bbox[0] }}" HEIGHT="{{ char.bbox[3] - char.bbox[1] }}" GC="{{ char.confidence|round(4) }}">
{% if char.boundary %}
<Shape>
<Polygon POINTS="{{ char.boundary|sum(start=[])|join(' ') }}"/>
</Shape>
{% endif %}
</Glyph>
{% endfor %}
</String>
{% endif %}
{% endfor %}
{% endif %}
</TextLine>
{%+ endmacro %}
<?xml version="1.0" encoding="UTF-8"?>
<alto xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://www.loc.gov/standards/alto/ns-v4#"
xsi:schemaLocation="http://www.loc.gov/standards/alto/ns-v4# http://www.loc.gov/standards/alto/v4/alto-4-3.xsd">
<Description>
<MeasurementUnit>pixel</MeasurementUnit>
<sourceImageInformation>
<fileName>{{ page.name }}</fileName>
</sourceImageInformation>
{% if metadata.processing_steps %}
{% for step in metadata.processing_steps %}
<Processing ID="OCR_{{ step.id }}">
<processingCategory>{{ proc_type_table[step.category] }}</processingCategory>
<processingStepDescription>{{ step.description }}</processingStepDescription>
<processingStepSettings>{% for k, v in step.settings.items() %}{{k}}: {{v}}{% if not loop.last %}; {% endif %}{% endfor %}</processingStepSettings>
<processingSoftware>
<softwareName>kraken</softwareName>
<softwareVersion>{{ metadata.version }}</softwareVersion>
</processingSoftware>
</Processing>
{% endfor %}
{% else %}
<Processing ID="OCR_0">
<processingCategory>other</processingCategory>
<processingStepDescription>unknown</processingStepDescription>
<processingSoftware>
<softwareName>kraken</softwareName>
<softwareVersion>{{ metadata.version }}</softwareVersion>
</processingSoftware>
</Processing>
{% endif %}
</Description>
<Tags>
{% for type, label in page.line_types %}
<OtherTag DESCRIPTION="line type" ID="LINE_TYPE_{{ loop.index }}" TYPE="{{ type }}" LABEL="{{ label }}"/>
{% endfor %}
{% for label in page.region_types %}
<OtherTag DESCRIPTION="region type" ID="REGION_TYPE_{{ loop.index }}" TYPE="region" LABEL="{{ label }}"/>
{% endfor %}
</Tags>
{% if page.line_orders|length() > 0 %}
<ReadingOrder>
{% if page.line_orders | length == 1 %}
<OrderedGroup ID="ro_0">
{% for id in page.line_orders[0] %}
<ElementRef ID="o_{{ loop.index }}" REF="{{ id }}"/>
{% endfor %}
</OrderedGroup>
{% else %}
<UnorderedGroup>
{% for ro in page.line_orders %}
<OrderedGroup ID="ro_{{ loop.index }}">
{% for id in ro %}
<ElementRef ID="o_{{ loop.index }}" REF="{{ id }}"/>
{% endfor %}
</OrderedGroup>
{% endfor %}
</UnorderedGroup>
{% endif %}
</ReadingOrder>
{% endif %}
<Layout>
<Page WIDTH="{{ page.size[0] }}" HEIGHT="{{ page.size[1] }}" PHYSICAL_IMG_NR="0" ID="page_0">
<PrintSpace HPOS="0" VPOS="0" WIDTH="{{ page.size[0] }}" HEIGHT="{{ page.size[1] }}">
{% for entity in page.entities %}
{% if entity.type == "region" %}
{% if loop.previtem and loop.previtem.type == 'line' %}
</TextBlock>
{% endif %}
<TextBlock ID="{{ entity.id }}" {% if entity.bbox %}HPOS="{{ entity.bbox[0] }}" VPOS="{{ entity.bbox[1] }}" WIDTH="{{ entity.bbox[2] - entity.bbox[0] }}" HEIGHT="{{ entity.bbox[3] - entity.bbox[1] }}"{% endif %} {% if entity.tags %}{% for type in page.region_types %}{% if type in entity.tags.values() %}TAGREFS="REGION_TYPE_{{ loop.index }}"{% endif %}{% endfor %}{% endif %}>
{% if entity.bbox %}<Shape>
<Polygon POINTS="{{ entity.boundary|sum(start=[])|join(' ') }}"/>
</Shape>{% endif %}
{%- for line in entity.lines -%}
{{ render_line(page, line) }}
{%- endfor -%}
</TextBlock>
{% else %}
{% if not loop.previtem or loop.previtem.type != 'line' %}
<TextBlock ID="textblock_{{ loop.index }}">
{% endif %}
{{ render_line(page, entity) }}
{% if loop.last %}
</TextBlock>
{% endif %}
{% endif %}
{% endfor %}
</PrintSpace>
</Page>
</Layout>
</alto>
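The template above derives its geometry attributes from (xmin, ymin, xmax, ymax) boxes, flattens point lists with sum(start=[])|join(' '), and reports the rounded mean of the character confidences as WC. The same arithmetic can be sketched in plain Python. The function names are hypothetical and only mirror the template expressions.

```python
from typing import Dict, Sequence


def alto_box(bbox: Sequence[int]) -> Dict[str, int]:
    """Converts (xmin, ymin, xmax, ymax) into ALTO position and size attributes."""
    xmin, ymin, xmax, ymax = bbox
    return {'HPOS': xmin, 'VPOS': ymin, 'WIDTH': xmax - xmin, 'HEIGHT': ymax - ymin}


def flatten_points(points: Sequence[Sequence[int]]) -> str:
    """Mirrors `points|sum(start=[])|join(' ')`: x0 y0 x1 y1 ..."""
    return ' '.join(str(coord) for point in points for coord in point)


def word_confidence(confidences: Sequence[float]) -> float:
    """Mirrors the WC expression: mean confidence rounded to 4 places."""
    return round(sum(confidences) / len(confidences), 4)


box = alto_box((10, 20, 110, 50))
points = flatten_points([[0, 5], [10, 6]])
wc = word_confidence([0.5, 1.0])
```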
PageXML
hOCR
ABBYY XML
{%+ macro render_line(page, line) +%}
<line baseline="{{ ((line.bbox[1] + line.bbox[3]) / 2)|int }}" l="{{ line.bbox[0] }}" r="{{ line.bbox[2] }}" t="{{ line.bbox[1] }}" b="{{ line.bbox[3] }}"><formatting lang="">
{% for segment in line.recognition %}
{% for char in segment.recognition %}
{% if loop.first %}
<charParams l="{{ char.bbox[0] }}" r="{{ char.bbox[2] }}" t="{{ char.bbox[1] }}" b="{{ char.bbox[3] }}" wordStart="1" charConfidence="{{ [char.confidence]|rescale(0, 100)|int }}">{{ char.text }}</charParams>
{% else %}
<charParams l="{{ char.bbox[0] }}" r="{{ char.bbox[2] }}" t="{{ char.bbox[1] }}" b="{{ char.bbox[3] }}" wordStart="0" charConfidence="{{ [char.confidence]|rescale(0, 100)|int }}">{{ char.text }}</charParams>
{% endif %}
{% endfor %}
{% endfor %}
</formatting>
</line>
{%+ endmacro %}
<?xml version="1.0" encoding="UTF-8"?>
<document xmlns="http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml" version="1.0" producer="kraken {{ metadata.version}}">
<page width="{{ page.size[0] }}" height="{{ page.size[1] }}" resolution="0" originalCoords="1">
{% for entity in page.entities %}
{% if entity.type == "region" %}
<block blockType="Text">
<text>
<par>
{%- for line in entity.lines -%}
{{ render_line(page, line) }}
{%- endfor -%}
</par>
</text>
</block>
{% else %}
<block blockType="Text">
<text>
<par>
{{ render_line(page, entity) }}
</par>
</text>
</block>
{% endif %}
{% endfor %}
</page>
</document>
Containers and Helpers
kraken.lib.codec module
- class kraken.lib.codec.PytorchCodec(charset, strict=False)
Builds a codec converting between graphemes/code points and integer label sequences.
charset may either be a string, a list or a dict. In the first case each code point will be assigned a label, in the second case each string in the list will be assigned a label, and in the final case each key string will be mapped to the value sequence of integers. In the first two cases labels will be assigned automatically. When a mapping is manually provided the label codes need to be a prefix-free code.
As 0 is the blank label in a CTC output layer, output labels and input dictionaries are/should be 1-indexed.
- Parameters:
charset (Union[Dict[str, Sequence[int]], Sequence[str], str]) – Input character set.
strict (bool) – Flag indicating if encoding/decoding errors should be ignored or cause an exception.
- Raises:
KrakenCodecException – If the character set contains duplicate entries or the mapping is non-singular or non-prefix-free.
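Two of the constraints above can be made concrete: automatically assigned labels start at 1 because 0 is the CTC blank, and a manually provided mapping has to be a prefix-free code. The following is a stand-alone sketch, not kraken's implementation. The function names are hypothetical, and assigning labels in sorted order is an assumption.

```python
from typing import Dict, Iterable, List, Sequence


def assign_labels(charset: str) -> Dict[str, List[int]]:
    """Gives each code point one label, starting at 1 (0 is the CTC blank).
    Duplicate code points are rejected, as in the Raises note above."""
    if len(set(charset)) != len(charset):
        raise ValueError('duplicate entries in character set')
    return {char: [label] for label, char in enumerate(sorted(charset), start=1)}


def is_prefix_free(codes: Iterable[Sequence[int]]) -> bool:
    """True if no label sequence is a prefix of (or equal to) another one.
    After sorting, a prefix is always directly followed by a sequence it prefixes."""
    seqs = sorted(tuple(code) for code in codes)
    return all(later[:len(earlier)] != earlier
               for earlier, later in zip(seqs, seqs[1:]))


auto = assign_labels('bca')
ok = is_prefix_free([[1], [2], [3, 4]])
clash = is_prefix_free([[1], [1, 2]])  # [1] is a prefix of [1, 2]
```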
- add_labels(charset)
Adds additional characters/labels to the codec.
charset may either be a string, a list or a dict. In the first case each code point will be assigned a label, in the second case each string in the list will be assigned a label, and in the final case each key string will be mapped to the value sequence of integers. In the first two cases labels will be assigned automatically.
As 0 is the blank label in a CTC output layer, output labels and input dictionaries are/should be 1-indexed.
- Parameters:
charset (Union[Dict[str, Sequence[int]], Sequence[str], str]) – Input character set.
- Return type:
- c_sorted
- decode(labels)
Decodes a labelling.
Given a labelling with cuts and confidences returns a string with the cuts and confidences aggregated across label-code point correspondences. When decoding multilabels to code points the resulting cuts are min/max, confidences are averaged.
- Parameters:
labels (Sequence[Tuple[int, int, int, float]]) – Input containing tuples (label, start, end, confidence).
- Returns:
A list of tuples (code point, start, end, confidence)
- Return type:
List[Tuple[str, int, int, float]]
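When several labels decode to one code point, the cuts are combined as min/max and the confidences are averaged. A minimal sketch of that aggregation step, assuming the labels belonging to one code point have already been grouped (the function name is hypothetical):

```python
from typing import Sequence, Tuple

Label = Tuple[int, int, int, float]  # (label, start, end, confidence)


def aggregate_multilabel(labels: Sequence[Label]) -> Tuple[int, int, float]:
    """Merges the labels of one code point into (start, end, confidence):
    earliest start, latest end, arithmetic mean of the confidences."""
    starts = [start for _, start, _, _ in labels]
    ends = [end for _, _, end, _ in labels]
    confs = [conf for _, _, _, conf in labels]
    return min(starts), max(ends), sum(confs) / len(confs)


merged = aggregate_multilabel([(3, 10, 14, 0.5), (4, 14, 20, 1.0)])
```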
- encode(s)
Encodes a string into a sequence of labels.
If the code is non-singular we greedily encode the longest sequence first.
- Parameters:
s (str) – Input unicode string
- Returns:
Encoded label sequence
- Raises:
KrakenEncodeException – if a subsequence is not encodable and the codec is set to strict mode.
- Return type:
torch.IntTensor
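Greedy longest-match encoding can be sketched as follows. This is an illustration of the technique under stated assumptions, not kraken's code: the function name and the c2l mapping are hypothetical, a plain list is returned instead of a torch.IntTensor, and ValueError stands in for KrakenEncodeException.

```python
from typing import Dict, List


def greedy_encode(s: str, c2l: Dict[str, List[int]], strict: bool = False) -> List[int]:
    """Encodes `s` by always consuming the longest substring known to `c2l`.
    Unencodable characters are skipped unless `strict` is set."""
    max_len = max(len(key) for key in c2l)
    labels: List[int] = []
    idx = 0
    while idx < len(s):
        # Try the longest candidate first, then shrink.
        for length in range(min(max_len, len(s) - idx), 0, -1):
            chunk = s[idx:idx + length]
            if chunk in c2l:
                labels.extend(c2l[chunk])
                idx += length
                break
        else:
            # No candidate of any length matched at this position.
            if strict:
                raise ValueError(f'cannot encode {s[idx]!r} at position {idx}')
            idx += 1
    return labels


c2l = {'a': [1], 'ab': [2], 'b': [3]}
encoded = greedy_encode('aba', c2l)   # 'ab' wins over 'a' at position 0
lenient = greedy_encode('axb', c2l)   # 'x' is skipped in non-strict mode
```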
- property is_valid: bool
Returns True if the codec is prefix-free (in label space) and non-singular (in both directions).
- Return type:
bool
- l2c: Dict[Tuple[int], str]
- l2c_single
- property max_label: int
Returns the maximum label value.
- Return type:
int
- merge(codec)
Transforms this codec (c1) into another (c2) reusing as many labels as possible.
The resulting codec is able to encode the same code point sequences while not necessarily having the same labels for them as c2. Retains matching character -> label mappings from both codecs, removes mappings not in c2, and adds mappings not in c1. Compound labels in c2 for code point sequences not in c1 containing labels also in use in c1 are added as separate labels.
- Parameters:
codec (PytorchCodec) – PytorchCodec to merge with
- Returns:
A merged codec and a list of labels that were removed from the original codec.
- Return type:
Tuple[PytorchCodec, Set]
- strict
kraken.containers module
- class kraken.containers.Segmentation
A container class for segmentation or recognition results.
In order to allow easy JSON de-/serialization, nested classes for lines (BaselineLine/BBoxLine) and regions (Region) are reinstantiated from their dictionaries.
- type
Field indicating if baselines (kraken.containers.BaselineLine) or bbox (kraken.containers.BBoxLine) line records are in the segmentation.
- imagename
Path to the image associated with the segmentation.
- text_direction
Sets the principal orientation (of the line), i.e. horizontal/vertical, and reading direction (of the document), i.e. lr/rl.
- script_detection
Flag indicating if the line records have tags.
- lines
List of line records. Records are expected to be in a valid reading order.
- regions
Dict mapping types to lists of regions.
- line_orders
List of alternative reading orders for the segmentation. Each reading order is a list of line indices.
- imagename: str | os.PathLike
- line_orders: List[List[int]] | None = None
- lines: List[BaselineLine | BBoxLine] | None = None
- script_detection: bool
- text_direction: Literal['horizontal-lr', 'horizontal-rl', 'vertical-lr', 'vertical-rl']
- type: Literal['baselines', 'bbox']
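The note on JSON de-/serialization says that nested line and region classes are reinstantiated from their dictionaries. One way such a round trip can work is sketched below with stand-in dataclasses. DemoLine and DemoSegmentation are hypothetical and much smaller than the real containers, and the use of __post_init__ is an assumption about the mechanism.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class DemoLine:
    """Stand-in for a baseline-type line record (not kraken's class)."""
    id: str
    baseline: List[List[int]]
    boundary: List[List[int]]


@dataclass
class DemoSegmentation:
    """Stand-in container that rebuilds nested lines from plain dicts."""
    type: str
    imagename: str
    lines: List[DemoLine] = field(default_factory=list)

    def __post_init__(self) -> None:
        # After json.loads() the lines are dicts; turn them back into records.
        self.lines = [DemoLine(**line) if isinstance(line, dict) else line
                      for line in self.lines]


seg = DemoSegmentation(
    type='baselines',
    imagename='page.png',
    lines=[DemoLine('l0', [[0, 10], [100, 10]],
                    [[0, 0], [100, 0], [100, 20], [0, 20], [0, 0]])],
)
restored = DemoSegmentation(**json.loads(json.dumps(asdict(seg))))
```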
- class kraken.containers.BaselineLine
Baseline-type line record.
A container class for a single line in baseline + bounding polygon format, optionally containing a transcription, tags, or associated regions.
- id
Unique identifier
- baseline
List of tuples (x_n, y_n) defining the baseline.
- boundary
List of tuples (x_n, y_n) defining the bounding polygon of the line. The first and last points should be identical.
- text
Transcription of this line.
- base_dir
An optional string defining the base direction (also called paragraph direction) for the BiDi algorithm. Valid values are ‘L’ or ‘R’. If None is given the default auto-resolution will be used.
- imagename
Path to the image associated with the line.
- tags
A dict mapping types to values.
- split
Defines whether this line is in the train, validation, or test set during training.
- regions
A list of identifiers of regions the line is associated with.
- base_dir: Literal['L', 'R'] | None = None
- baseline: List[Tuple[int, int]]
- boundary: List[Tuple[int, int]]
- id: str
- imagename: str | os.PathLike | None = None
- regions: List[str] | None = None
- split: Literal['train', 'validation', 'test'] | None = None
- tags: Dict[str, str] | None = None
- text: str | None = None
- type: str = 'baselines'
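The boundary field expects a closed polygon, that is, identical first and last points. When constructing records by hand, a small helper can enforce this. The helper below is a hypothetical sketch and not part of kraken.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]


def close_polygon(points: Sequence[Sequence[int]]) -> List[Point]:
    """Returns the polygon as a list of tuples, repeating the first point
    at the end if the polygon is not closed already."""
    closed = [(int(x), int(y)) for x, y in points]
    if closed and closed[0] != closed[-1]:
        closed.append(closed[0])
    return closed


open_poly = close_polygon([(0, 0), (10, 0), (10, 5)])
already_closed = close_polygon([(0, 0), (10, 0), (0, 0)])
```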
- class kraken.containers.BBoxLine¶
Bounding box-type line record.
A container class for a single line in axis-aligned bounding box format, optionally containing a transcription, tags, or associated regions.
- id¶
Unique identifier
- bbox¶
Tuple in form (xmin, ymin, xmax, ymax) defining the bounding box.
- text¶
Transcription of this line.
- base_dir¶
An optional string defining the base direction (also called paragraph direction) for the BiDi algorithm. Valid values are ‘L’ or ‘R’. If None is given the default auto-resolution will be used.
- imagename¶
Path to the image associated with the line.
- tags¶
A dict mapping types to values.
- split¶
Defines whether this line is in the train, validation, or test set during training.
- regions¶
A list of identifiers of regions the line is associated with.
- text_direction¶
Sets the principal orientation (of the line) and reading direction (of the document).
- base_dir: Literal['L', 'R'] | None = None¶
- bbox: Tuple[int, int, int, int]¶
- id: str¶
- imagename: str | os.PathLike | None = None¶
- regions: List[str] | None = None¶
- split: Literal['train', 'validation', 'test'] | None = None¶
- tags: Dict[str, str] | None = None¶
- text: str | None = None¶
- text_direction: Literal['horizontal-lr', 'horizontal-rl', 'vertical-lr', 'vertical-rl'] = 'horizontal-lr'¶
- type: str = 'bbox'¶
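A bbox tuple in the form (xmin, ymin, xmax, ymax) can be expanded into four corner points, in the same corner order that BBoxOCRRecord uses for its cuts. The sketch below is only an illustration; bbox_to_polygon is a hypothetical helper name.

```python
from typing import Tuple

BBox = Tuple[int, int, int, int]
Point = Tuple[int, int]


def bbox_to_polygon(bbox: BBox) -> Tuple[Point, Point, Point, Point]:
    """Expand (xmin, ymin, xmax, ymax) into its four corner points."""
    xmin, ymin, xmax, ymax = bbox
    if xmin > xmax or ymin > ymax:
        raise ValueError("bbox must satisfy xmin <= xmax and ymin <= ymax")
    # Corner order: top-left, top-right, bottom-right, bottom-left.
    return ((xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax))


corners = bbox_to_polygon((10, 20, 110, 60))
```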
- class kraken.containers.Region¶
Container class of a single polygonal region.
- id¶
Unique identifier
- boundary¶
List of tuples (x_n, y_n) defining the bounding polygon of the region. The first and last points should be identical.
- imagename¶
Path to the image associated with the region.
- tags¶
A dict mapping types to values.
- boundary: List[Tuple[int, int]]¶
- id: str¶
- imagename: str | os.PathLike | None = None¶
- tags: Dict[str, str] | None = None¶
- class kraken.containers.ocr_record(prediction, cuts, confidences, display_order=True)¶
A record object containing the recognition result of a single line
- Parameters:
prediction (str)
cuts (List[Union[Tuple[int, int], Tuple[Tuple[int, int], Tuple[int, int], Tuple[int, int], Tuple[int, int]]]])
confidences (List[float])
display_order (bool)
- base_dir = None¶
- property confidences: List[float]¶
- Return type:
List[float]
- property cuts: List¶
- Return type:
List
- abstract display_order(base_dir)¶
- Return type:
- abstract logical_order(base_dir)¶
- Return type:
- property prediction: str¶
- Return type:
str
- abstract property type¶
- class kraken.containers.BaselineOCRRecord(prediction, cuts, confidences, line, base_dir=None, display_order=True)¶
A record object containing the recognition result of a single line in baseline format.
- Parameters:
prediction (str)
cuts (List[Tuple[int, int]])
confidences (List[float])
line (Union[BaselineLine, Dict[str, Any]])
base_dir (Optional[Literal['L', 'R']])
display_order (bool)
- type¶
‘baselines’ to indicate a baseline record
- prediction¶
The text predicted by the network as one continuous string.
- Return type:
str
- cuts¶
The absolute bounding polygons for each code point in prediction as a list of tuples [(x0, y0), (x1, y1), …].
- Return type:
List[Tuple[int, int]]
- confidences¶
A list of floats indicating the confidence value of each code point.
- Return type:
List[float]
- base_dir¶
An optional string defining the base direction (also called paragraph direction) for the BiDi algorithm. Valid values are ‘L’ or ‘R’. If None is given the default auto-resolution will be used.
- display_order¶
Flag indicating the order of the code points in the prediction. In display order (True) the n-th code point in the string corresponds to the n-th leftmost code point, in logical order (False) the n-th code point corresponds to the n-th read code point. See [UAX #9](https://unicode.org/reports/tr9) for more details.
- Parameters:
base_dir (Optional[Literal['L', 'R']])
- Return type:
Notes
When slicing the record, the behavior of the cuts differs from earlier versions of kraken. Instead of per-character bounding polygons, a single polygon section of the line bounding polygon is returned, starting at the first and extending to the last code point emitted by the network. This aids numerical stability when computing aggregated bounding polygons such as for words. Individual code point bounding polygons are still accessible through the cuts attribute or by iterating over the record code point by code point.
- base_dir¶
- property cuts: List[Tuple[int, int]]¶
- Return type:
List[Tuple[int, int]]
- display_order(base_dir=None)¶
Returns the OCR record in Unicode display order, i.e. ordered from left to right inside the line.
- Parameters:
base_dir (Optional[Literal['L', 'R']]) – An optional string defining the base direction (also called paragraph direction) for the BiDi algorithm. Valid values are ‘L’ or ‘R’. If None is given the default auto-resolution will be used.
- Return type:
- logical_order(base_dir=None)¶
Returns the OCR record in Unicode logical order, i.e. in the order the characters in the line would be read by a human.
- Parameters:
base_dir (Optional[Literal['L', 'R']]) – An optional string defining the base direction (also called paragraph direction) for the BiDi algorithm. Valid values are ‘L’ or ‘R’. If None is given the default auto-resolution will be used.
- Return type:
- type = 'baselines'¶
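Since prediction and confidences are aligned code point by code point, word-level values can be derived from them. The sketch below works on a plain string and a list of floats instead of a real record; word_confidences is a hypothetical helper, and averaging is just one possible way to aggregate.

```python
from typing import List, Tuple


def word_confidences(prediction: str, confidences: List[float]) -> List[Tuple[str, float]]:
    """Split on whitespace and average the code point confidences per word."""
    if len(prediction) != len(confidences):
        raise ValueError("need one confidence value per code point")
    words: List[Tuple[str, float]] = []
    chars: List[str] = []
    confs: List[float] = []
    for char, conf in zip(prediction, confidences):
        if char.isspace():
            # A whitespace code point closes the current word, if any.
            if chars:
                words.append(("".join(chars), sum(confs) / len(confs)))
                chars, confs = [], []
        else:
            chars.append(char)
            confs.append(conf)
    if chars:
        words.append(("".join(chars), sum(confs) / len(confs)))
    return words


result = word_confidences("ab cd", [1.0, 0.5, 0.9, 0.5, 1.0])
```

The confidence of the whitespace code point itself is ignored in this sketch.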
- class kraken.containers.BBoxOCRRecord(prediction, cuts, confidences, line, base_dir=None, display_order=True)¶
A record object containing the recognition result of a single line in bbox format.
- Parameters:
prediction (str)
cuts (List[Tuple[Tuple[int, int], Tuple[int, int], Tuple[int, int], Tuple[int, int]]])
confidences (List[float])
line (Union[BBoxLine, Dict[str, Any]])
base_dir (Optional[Literal['L', 'R']])
display_order (bool)
- type¶
‘bbox’ to indicate a bounding box record
- prediction¶
The text predicted by the network as one continuous string.
- Return type:
str
- cuts¶
The absolute bounding polygons for each code point in prediction as a list of 4-tuples ((x0, y0), (x1, y0), (x1, y1), (x0, y1)).
- Return type:
List
- confidences¶
A list of floats indicating the confidence value of each code point.
- Return type:
List[float]
- base_dir¶
An optional string defining the base direction (also called paragraph direction) for the BiDi algorithm. Valid values are ‘L’ or ‘R’. If None is given the default auto-resolution will be used.
- display_order¶
Flag indicating the order of the code points in the prediction. In display order (True) the n-th code point in the string corresponds to the n-th leftmost code point, in logical order (False) the n-th code point corresponds to the n-th read code point. See [UAX #9](https://unicode.org/reports/tr9) for more details.
- Parameters:
base_dir (Optional[Literal['L', 'R']])
- Return type:
Notes
When slicing the record, the behavior of the cuts differs from earlier versions of kraken. Instead of per-character bounding polygons, a single polygon section of the line bounding polygon is returned, starting at the first and extending to the last code point emitted by the network. This aids numerical stability when computing aggregated bounding polygons such as for words. Individual code point bounding polygons are still accessible through the cuts attribute or by iterating over the record code point by code point.
- base_dir¶
- display_order(base_dir=None)¶
Returns the OCR record in Unicode display order, i.e. ordered from left to right inside the line.
- Parameters:
base_dir (Optional[Literal['L', 'R']]) – An optional string defining the base direction (also called paragraph direction) for the BiDi algorithm. Valid values are ‘L’ or ‘R’. If None is given the default auto-resolution will be used.
- Return type:
- logical_order(base_dir=None)¶
Returns the OCR record in Unicode logical order, i.e. in the order the characters in the line would be read by a human.
- Parameters:
base_dir (Optional[Literal['L', 'R']]) – An optional string defining the base direction (also called paragraph direction) for the BiDi algorithm. Valid values are ‘L’ or ‘R’. If None is given the default auto-resolution will be used.
- Return type:
- type = 'bbox'¶
- class kraken.containers.ProcessingStep¶
A processing step in the recognition pipeline.
- id¶
Unique identifier
- category¶
Category of processing step that has been performed.
- description¶
Natural-language description of the process.
- settings¶
Dict describing the parameters of the processing step.
- category: Literal['preprocessing', 'processing', 'postprocessing']¶
- description: str¶
- id: str¶
- settings: Dict[str, Dict | str | float | int | bool]¶
kraken.lib.ctc_decoder¶
- kraken.lib.ctc_decoder.beam_decoder(outputs, beam_size=3)¶
Translates back the network output to a label sequence using same-prefix-merge beam search decoding as described in [0].
[0] Hannun, Awni Y., et al. “First-pass large vocabulary continuous speech recognition using bi-directional recurrent DNNs.” arXiv preprint arXiv:1408.2873 (2014).
- Parameters:
outputs (numpy.ndarray) – (C, W) shaped softmax output tensor
beam_size (int) – Size of the beam
- Returns:
A list with tuples (class, start, end, prob). prob is the maximum value of the softmax layer in the region.
- Return type:
List[Tuple[int, int, int, float]]
- kraken.lib.ctc_decoder.greedy_decoder(outputs)¶
Translates back the network output to a label sequence using greedy/best path decoding as described in [0].
[0] Graves, Alex, et al. “Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks.” Proceedings of the 23rd international conference on Machine learning. ACM, 2006.
- Parameters:
outputs (numpy.ndarray) – (C, W) shaped softmax output tensor
- Returns:
A list with tuples (class, start, end, max). max is the maximum value of the softmax layer in the region.
- Return type:
List[Tuple[int, int, int, float]]
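The idea of greedy/best path decoding can be sketched in plain Python on a (C, W) matrix given as nested lists. This is an illustration of the technique, not kraken's implementation: it assumes class 0 is the CTC blank and that end is an inclusive time step index; greedy_decode is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def greedy_decode(outputs: Sequence[Sequence[float]]) -> List[Tuple[int, int, int, float]]:
    """Best path decoding of a (C, W) matrix given as nested sequences."""
    num_classes = len(outputs)
    width = len(outputs[0]) if num_classes else 0
    # Most probable class at every time step.
    best = [max(range(num_classes), key=lambda c: outputs[c][t]) for t in range(width)]
    decoded = []
    t = 0
    while t < width:
        label = best[t]
        start = t
        # Extend the run while the same class stays on top.
        while t + 1 < width and best[t + 1] == label:
            t += 1
        if label != 0:  # assumption: class 0 is the CTC blank
            peak = max(outputs[label][start:t + 1])
            decoded.append((label, start, t, peak))
        t += 1
    return decoded


outputs = [
    [0.1, 0.2, 0.8, 0.1, 0.9],    # class 0 (blank)
    [0.8, 0.7, 0.1, 0.1, 0.05],   # class 1
    [0.1, 0.1, 0.1, 0.8, 0.05],   # class 2
]
decoded = greedy_decode(outputs)
```

Runs of the same class are merged into one tuple and blank runs are dropped, which is what turns per-step activations into a label sequence.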
- kraken.lib.ctc_decoder.blank_threshold_decoder(outputs, threshold=0.5)¶
Translates back the network output to a label sequence in the same way as the original ocropy/clstm.
Thresholds on class 0, then assigns the maximum (non-zero) class to each region.
- Parameters:
outputs (numpy.ndarray) – (C, W) shaped softmax output tensor
threshold (float) – Threshold for 0 class when determining possible label locations.
- Returns:
A list with tuples (class, start, end, max). max is the maximum value of the softmax layer in the region.
- Return type:
List[Tuple[int, int, int, float]]
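The two steps described above (threshold on class 0, then pick the strongest non-zero class per region) can be sketched as follows. This is a simplified pure-Python illustration under the assumption that a region is a maximal run of time steps whose class 0 activation lies below the threshold; blank_threshold_decode is a hypothetical name, not the library function.

```python
from typing import List, Sequence, Tuple


def blank_threshold_decode(
    outputs: Sequence[Sequence[float]], threshold: float = 0.5
) -> List[Tuple[int, int, int, float]]:
    """Find runs where class 0 is below `threshold` and label each run."""
    if len(outputs) < 2:
        raise ValueError("need a blank class and at least one label class")
    width = len(outputs[0])
    decoded = []
    t = 0
    while t < width:
        if outputs[0][t] >= threshold:
            # Blank is confident here, no label candidate.
            t += 1
            continue
        start = t
        while t + 1 < width and outputs[0][t + 1] < threshold:
            t += 1
        end = t
        # The strongest non-zero class inside the region decides the label.
        best_label, best_value = 0, float("-inf")
        for label in range(1, len(outputs)):
            peak = max(outputs[label][start:end + 1])
            if peak > best_value:
                best_label, best_value = label, peak
        decoded.append((best_label, start, end, best_value))
        t += 1
    return decoded


outputs = [
    [0.1, 0.2, 0.8, 0.1, 0.9],    # class 0 (blank)
    [0.8, 0.7, 0.1, 0.1, 0.05],   # class 1
    [0.1, 0.1, 0.1, 0.8, 0.05],   # class 2
]
decoded = blank_threshold_decode(outputs, threshold=0.5)
```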
kraken.lib.exceptions¶
- class kraken.lib.exceptions.KrakenCodecException(message=None)¶
Common base class for all non-exit exceptions.
- class kraken.lib.exceptions.KrakenStopTrainingException(message=None)¶
Common base class for all non-exit exceptions.
- class kraken.lib.exceptions.KrakenEncodeException(message=None)¶
Common base class for all non-exit exceptions.
- class kraken.lib.exceptions.KrakenRecordException(message=None)¶
Common base class for all non-exit exceptions.
- class kraken.lib.exceptions.KrakenInvalidModelException(message=None)¶
Common base class for all non-exit exceptions.
- class kraken.lib.exceptions.KrakenInputException(message=None)¶
Common base class for all non-exit exceptions.
- class kraken.lib.exceptions.KrakenRepoException(message=None)¶
Common base class for all non-exit exceptions.
- class kraken.lib.exceptions.KrakenCairoSurfaceException(message, width, height)¶
Raised when the Cairo surface couldn’t be created.
- Parameters:
message (str)
width (int)
height (int)
- message¶
Error message
- Type:
str
- width¶
Width of the surface
- Type:
int
- height¶
Height of the surface
- Type:
int
- height¶
- message¶
- width¶
kraken.lib.models module¶
- class kraken.lib.models.TorchSeqRecognizer(nn, decoder=kraken.lib.ctc_decoder.greedy_decoder, train=False, device='cpu')¶
A wrapper class around a TorchVGSLModel for text recognition.
- Parameters:
train (bool)
device (str)
- codec¶
- decoder¶
- device¶
- forward(line, lens=None)¶
Performs a forward pass on a torch tensor of one or more lines with shape (N, C, H, W) and returns a numpy array (N, W, C).
- Parameters:
line (torch.Tensor) – NCHW line tensor
lens (torch.Tensor) – Optional tensor containing sequence lengths if N > 1
- Returns:
Tuple with (N, W, C) shaped numpy array and final output sequence lengths.
- Raises:
KrakenInputException – Is raised if the channel dimension isn’t of size 1 in the network output.
- Return type:
Union[numpy.ndarray, Tuple[numpy.ndarray, numpy.ndarray]]
- kind = ''¶
- nn¶
- one_channel_mode¶
- predict(line, lens=None)¶
Performs a forward pass on a torch tensor of a line with shape (N, C, H, W) and returns the decoding as a list of tuples (string, start, end, confidence).
- Parameters:
line (torch.Tensor) – NCHW line tensor
lens (Optional[torch.Tensor]) – Optional tensor containing sequence lengths if N > 1
- Returns:
List of decoded sequences.
- Return type:
List[List[Tuple[str, int, int, float]]]
- predict_labels(line, lens=None)¶
Performs a forward pass on a torch tensor of a line with shape (N, C, H, W) and returns a list of tuples (class, start, end, max). Max is the maximum value of the softmax layer in the region.
- Parameters:
line (torch.Tensor)
lens (torch.Tensor)
- Return type:
List[List[Tuple[int, int, int, float]]]
- predict_string(line, lens=None)¶
Performs a forward pass on a torch tensor of a line with shape (N, C, H, W) and returns a string of the results.
- Parameters:
line (torch.Tensor) – NCHW line tensor
lens (Optional[torch.Tensor]) – Optional tensor containing the sequence lengths of the input batch.
- Return type:
List[str]
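The three prediction methods return the same decoding at different levels: class indices, per-character tuples, and a plain string. The relation between these forms for a single line can be sketched with a dict standing in for the codec. The table and helper names below are hypothetical; the real mapping is done by the model's codec.

```python
from typing import Dict, List, Tuple

LabelRecord = Tuple[int, int, int, float]   # (class, start, end, max)
CharRecord = Tuple[str, int, int, float]    # (string, start, end, confidence)


def labels_to_chars(labels: List[LabelRecord], table: Dict[int, str]) -> List[CharRecord]:
    """Replace each class index by its string, keeping positions and scores."""
    return [(table[label], start, end, conf) for label, start, end, conf in labels]


def chars_to_string(chars: List[CharRecord]) -> str:
    """Concatenate the decoded strings of one line."""
    return "".join(char for char, _, _, _ in chars)


table = {1: "a", 2: "b"}  # hypothetical label-to-character table
labels = [(1, 0, 1, 0.8), (2, 3, 3, 0.8)]
chars = labels_to_chars(labels, table)
text = chars_to_string(chars)
```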
- seg_type¶
- to(device)¶
Moves model to device and automatically loads input tensors onto it.
- train¶
- kraken.lib.models.load_any(fname, train=False, device='cpu')¶
Loads anything that was, is, and will be a valid ocropus model and instantiates a new kraken.lib.models.TorchSeqRecognizer from the RNN configuration in the file.
Currently it recognizes the following kinds of models:
protobuf models containing VGSL segmentation and recognition networks.
Additionally an attribute ‘kind’ will be added to the TorchSeqRecognizer containing a string representation of the source kind. Current known values are:
vgsl for VGSL models
- Parameters:
fname (Union[os.PathLike, str]) – Path to the model
train (bool) – Enables gradient calculation and dropout layers in model.
device (str) – Target device
- Returns:
A kraken.lib.models.TorchSeqRecognizer object.
- Raises:
KrakenInvalidModelException – if the model is not loadable by any parser.
- Return type:
kraken.lib.segmentation module¶
- kraken.lib.segmentation.reading_order(lines, text_direction='lr')¶
Given the list of lines (a list of 2D slices), computes the partial reading order. The output is a binary 2D array such that order[i,j] is true if line i comes before line j in reading order.
- Parameters:
lines (Sequence[Tuple[slice, slice]])
text_direction (Literal['lr', 'rl'])
- Return type:
numpy.ndarray
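To make the shape of the output concrete, the sketch below builds a binary matrix in which order[i][j] is true if line i precedes line j. It uses a single, deliberately simplified rule (line i starts above line j and both overlap horizontally) and ignores text_direction; the rules used by kraken are more involved. It also assumes each line is a (rows, columns) pair of slices, and partial_order is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Line = Tuple[slice, slice]  # assumption: (rows, columns)


def x_overlaps(a: Line, b: Line) -> bool:
    """True if the column ranges of two lines intersect."""
    return a[1].start < b[1].stop and b[1].start < a[1].stop


def partial_order(lines: Sequence[Line]) -> List[List[bool]]:
    """order[i][j] is True if line i is read before line j."""
    n = len(lines)
    order = [[False] * n for _ in range(n)]
    for i, a in enumerate(lines):
        for j, b in enumerate(lines):
            # Simplified rule: same column, and i starts above j.
            if i != j and x_overlaps(a, b) and a[0].start < b[0].start:
                order[i][j] = True
    return order


lines = [
    (slice(100, 140), slice(0, 300)),   # second line of a column
    (slice(20, 60), slice(0, 300)),     # first line of the same column
    (slice(20, 60), slice(400, 700)),   # line in another column
]
order = partial_order(lines)
```

The result is only a partial order: the line in the other column is unrelated to the first two, which is why a further step is needed to obtain a linear reading order.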
- kraken.lib.segmentation.neural_reading_order(lines, text_direction='lr', regions=None, im_size=None, model=None, class_mapping=None)¶
Given a list of baselines and regions, calculates the correct reading order and applies it to the input.
- Parameters:
lines (Sequence[Dict]) – List of tuples containing the baseline and its polygonization.
model (kraken.lib.vgsl.TorchVGSLModel) – torch Module for
text_direction (str)
regions (Optional[Sequence[shapely.geometry.Polygon]])
im_size (Tuple[int, int])
class_mapping (Dict[str, int])
- Returns:
The indices of the ordered input.
- Return type:
Sequence[int]
- kraken.lib.segmentation.polygonal_reading_order(lines, text_direction='lr', regions=None)¶
Given a list of baselines and regions, calculates the correct reading order and applies it to the input.
- Parameters:
lines (Sequence[Dict]) – List of tuples containing the baseline and its polygonization.
regions (Optional[Sequence[shapely.geometry.Polygon]]) – List of region polygons.
text_direction (Literal['lr', 'rl']) – Set principal text direction for column ordering. Can be ‘lr’ or ‘rl’
- Returns:
The indices of the ordered input.
- Return type:
Sequence[int]
- kraken.lib.segmentation.vectorize_lines(im, threshold=0.17, min_length=5, text_direction='horizontal')¶
Vectorizes lines from a binarized array.
- Parameters:
im (np.ndarray) – Array of shape (3, H, W) with the first dimension being probabilities for (start_separators, end_separators, baseline).
threshold (float) – Threshold for baseline blob detection.
min_length (int) – Minimal length of output baselines.
text_direction (str) – Base orientation of the text line (horizontal or vertical).
- Returns:
[[x0, y0, … xn, yn], [xm, ym, …, xk, yk], … ] A list of lists containing the points of all baseline polylines.
- kraken.lib.segmentation.calculate_polygonal_environment(im=None, baselines=None, suppl_obj=None, im_feats=None, scale=None, topline=False, raise_on_error=False)¶
Given a list of baselines and an input image, calculates a polygonal environment around each baseline.
- Parameters:
im (PIL.Image.Image) – grayscale input image (mode ‘L’)
baselines (Sequence[Sequence[Tuple[int, int]]]) – List of lists containing a single baseline per entry.
suppl_obj (Sequence[Sequence[Tuple[int, int]]]) – List of lists containing additional polylines that should be considered hard boundaries for polygonization purposes. Can be used to prevent polygonization into non-text areas such as illustrations or to compute the polygonization of a subset of the lines in an image.
im_feats (numpy.ndarray) – An optional precomputed seamcarve energy map. Overrides data in im. The default map is gaussian_filter(sobel(im), 2).
scale (Tuple[int, int]) – A 2-tuple (h, w) containing optional scale factors of the input. Values of 0 are used for aspect-preserving scaling. None skips input scaling.
topline (bool) – Switch to change default baseline location for offset calculation purposes. If set to False, baselines are assumed to be on the bottom of the text line and will be offset upwards, if set to True, baselines are on the top and will be offset downwards. If set to None, no offset will be applied.
raise_on_error (bool) – Raises errors instead of logging them when they are non-blocking.
- Returns:
List of lists of coordinates. If no polygonization could be computed for a baseline, None is returned instead.
- kraken.lib.segmentation.scale_polygonal_lines(lines, scale)¶
Scales baselines/polygon coordinates by a certain factor.
- Parameters:
lines (Sequence[Tuple[List, List]]) – List of tuples containing the baseline and its polygonization.
scale (Union[float, Tuple[float, float]]) – Scaling factor
- Return type:
Sequence[Tuple[List, List]]
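Scaling a list of (baseline, polygon) tuples can be sketched as below. The sketch assumes that a tuple-valued scale is interpreted as (x factor, y factor) and that results are truncated to integers; both are assumptions for illustration, and scale_lines is a hypothetical stand-in rather than the library function.

```python
from typing import List, Sequence, Tuple, Union

Point = Tuple[int, int]
Line = Tuple[List[Point], List[Point]]  # (baseline, bounding polygon)


def scale_lines(lines: Sequence[Line], scale: Union[float, Tuple[float, float]]) -> List[Line]:
    """Scale every baseline and polygon point by `scale`."""
    # Assumption: a tuple is interpreted as (x factor, y factor).
    if isinstance(scale, (int, float)):
        sx = sy = float(scale)
    else:
        sx, sy = scale

    def scale_points(points: List[Point]) -> List[Point]:
        # Assumption: coordinates are truncated to integers.
        return [(int(x * sx), int(y * sy)) for x, y in points]

    return [(scale_points(baseline), scale_points(polygon)) for baseline, polygon in lines]


lines = [([(10, 50), (210, 50)], [(10, 20), (210, 20), (210, 60), (10, 60)])]
half = scale_lines(lines, 0.5)
stretched = scale_lines(lines, (2.0, 1.0))
```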
- kraken.lib.segmentation.scale_regions(regions, scale)¶
Scales region coordinates by a certain factor.
- Parameters:
regions (Sequence[Tuple[List[int], List[int]]]) – Regions to scale.
scale (Union[float, Tuple[float, float]]) – Scaling factor
- Return type:
Sequence[Tuple[List, List]]
- kraken.lib.segmentation.compute_polygon_section(baseline, boundary, dist1, dist2)¶
Given a baseline, polygonal boundary, and two points on the baseline return the rectangle formed by the orthogonal cuts on that baseline segment. The resulting polygon is not guaranteed to have a non-zero area.
The distance can be larger than the actual length of the baseline if the baseline endpoints are inside the bounding polygon. In that case the baseline will be extrapolated to the polygon edge.
- Parameters:
baseline (Sequence[Tuple[int, int]]) – A polyline ((x1, y1), …, (xn, yn))
boundary (Sequence[Tuple[int, int]]) – A bounding polygon around the baseline (same format as baseline). Last and first point are automatically connected.
dist1 (int) – Absolute distance along the baseline of the first point.
dist2 (int) – Absolute distance along the baseline of the second point.
- Returns:
A sequence of polygon points.
- Return type:
Tuple[Tuple[int, int]]
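The notion of an absolute distance along the baseline, as used for dist1 and dist2, can be illustrated by locating the point at a given distance on a polyline. The sketch below covers only that step, not the orthogonal cuts or the intersection with the boundary. Distances beyond the last point are extrapolated along the final segment, which is a simplification of the behavior described above; point_at_distance is a hypothetical helper.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def point_at_distance(baseline: Sequence[Point], dist: float) -> Point:
    """Return the point `dist` units along `baseline` from its first point."""
    if len(baseline) < 2:
        raise ValueError("a baseline needs at least two points")
    remaining = dist
    for (x0, y0), (x1, y1) in zip(baseline, baseline[1:]):
        seg_len = math.hypot(x1 - x0, y1 - y0)
        if seg_len > 0 and remaining <= seg_len:
            # The target lies on this segment, interpolate linearly.
            frac = remaining / seg_len
            return (x0 + frac * (x1 - x0), y0 + frac * (y1 - y0))
        remaining -= seg_len
    # Past the last point: extrapolate along the final segment.
    (x0, y0), (x1, y1) = baseline[-2], baseline[-1]
    seg_len = math.hypot(x1 - x0, y1 - y0)
    if seg_len == 0:
        raise ValueError("cannot extrapolate along a zero-length segment")
    frac = remaining / seg_len
    return (x1 + frac * (x1 - x0), y1 + frac * (y1 - y0))


baseline = [(0, 0), (30, 40), (30, 100)]  # segment lengths 50 and 60
on_first = point_at_distance(baseline, 25)
on_second = point_at_distance(baseline, 80)
beyond = point_at_distance(baseline, 140)
```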
- kraken.lib.segmentation.extract_polygons(im, bounds, legacy=False)¶
Yields the subimages of image im defined in the list of bounding polygons with baselines preserving order.
- Parameters:
im (PIL.Image.Image) – Input image
bounds (kraken.containers.Segmentation) – A Segmentation class containing a bounding box or baseline segmentation.
legacy (bool) – Use the old, slow, and deprecated path
- Yields:
The extracted subimage, and the corresponding bounding box or baseline
- Return type:
Generator[Tuple[PIL.Image.Image, Union[kraken.containers.BBoxLine, kraken.containers.BaselineLine]], None, None]
kraken.lib.vgsl module¶
- class kraken.lib.vgsl.TorchVGSLModel(spec)¶
Class building a torch module from a VGSL spec.
The initialized class will contain a variable number of layers and a loss function. Inputs and outputs are always 4D tensors in order (batch, channels, height, width) with channels always being the feature dimension.
Importantly this means that a recurrent network will be fed the channel vector at each step along its time axis, i.e. either put the non-time-axis dimension into the channels dimension or use a summarizing RNN squashing the time axis to 1 and putting the output into the channels dimension respectively.
- Parameters:
spec (str)
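The remark about recurrent layers can be made concrete with some shape arithmetic. The sketch below assumes the width axis is the time axis and shows the first option, moving the height axis into the channels dimension so that each time step sees one feature vector. It only manipulates shape tuples; the helper names are hypothetical and no tensors are involved.

```python
from typing import Tuple

Shape = Tuple[int, int, int, int]  # (batch, channels, height, width)


def fold_height_into_channels(shape: Shape) -> Shape:
    """Move the height axis into the channels axis, leaving height 1."""
    n, c, h, w = shape
    return (n, c * h, 1, w)


def steps_and_features(shape: Shape) -> Tuple[int, int]:
    """Time steps and per-step feature size for a layer running along width."""
    n, c, h, w = shape
    if h != 1:
        raise ValueError("fold the height axis into the channels first")
    return (w, c)


conv_output = (1, 32, 12, 400)  # example values, chosen for illustration
folded = fold_height_into_channels(conv_output)
steps, features = steps_and_features(folded)
```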
- input¶
Expected input tensor as a 4-tuple.
- nn¶
Stack of layers parsed from the spec.
- criterion¶
Fully parametrized loss function.
- user_metadata¶
dict with user defined metadata. Is flushed into model file during saving/overwritten by loading operations.
- one_channel_mode¶
Field indicating the image type used during training of one-channel images. Is ‘1’ for models trained on binarized images, ‘L’ for grayscale, and None otherwise.
- add_codec(codec)¶
Adds a PytorchCodec to the model.
- Parameters:
codec (kraken.lib.codec.PytorchCodec)
- Return type:
None
- append(idx, spec)¶
Splits a model at layer idx and appends layers spec.
New layers are initialized using the init_weights method.
- Parameters:
idx (int) – Index of layer to append spec to starting with 1. To select the whole layer stack set idx to None.
spec (str) – VGSL spec without input block to append to model.
- Return type:
None
- property aux_layers¶
- blocks¶
- build_addition(input, blocks, idx, target_output_shape=None)¶
- Parameters:
input (Tuple[int, int, int, int])
blocks (List[str])
idx (int)
target_output_shape (Optional[Tuple[int, int, int, int]])
- Return type:
Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]
- build_conv(input, blocks, idx, target_output_shape=None)¶
Builds a 2D convolution layer.
- Parameters:
input (Tuple[int, int, int, int])
blocks (List[str])
idx (int)
target_output_shape (Optional[Tuple[int, int, int, int]])
- Return type:
Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]
- build_dropout(input, blocks, idx, target_output_shape=None)¶
- Parameters:
input (Tuple[int, int, int, int])
blocks (List[str])
idx (int)
target_output_shape (Optional[Tuple[int, int, int, int]])
- Return type:
Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]
- build_groupnorm(input, blocks, idx, target_output_shape=None)¶
- Parameters:
input (Tuple[int, int, int, int])
blocks (List[str])
idx (int)
target_output_shape (Optional[Tuple[int, int, int, int]])
- Return type:
Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]
- build_identity(input, blocks, idx, target_output_shape=None)¶
- Parameters:
input (Tuple[int, int, int, int])
blocks (List[str])
idx (int)
target_output_shape (Optional[Tuple[int, int, int, int]])
- Return type:
Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]
- build_maxpool(input, blocks, idx, target_output_shape=None)¶
Builds a maxpool layer.
- Parameters:
input (Tuple[int, int, int, int])
blocks (List[str])
idx (int)
target_output_shape (Optional[Tuple[int, int, int, int]])
- Return type:
Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]
- build_output(input, blocks, idx, target_output_shape=None)¶
Builds an output layer.
- Parameters:
input (Tuple[int, int, int, int])
blocks (List[str])
idx (int)
target_output_shape (Optional[Tuple[int, int, int, int]])
- Return type:
Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]
- build_parallel(input, blocks, idx, target_output_shape=None)¶
Builds a block of parallel layers.
- Parameters:
input (Tuple[int, int, int, int])
blocks (List[str])
idx (int)
target_output_shape (Optional[Tuple[int, int, int, int]])
- Return type:
Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]
- build_reshape(input, blocks, idx, target_output_shape=None)¶
Builds a reshape layer
- Parameters:
input (Tuple[int, int, int, int])
blocks (List[str])
idx (int)
target_output_shape (Optional[Tuple[int, int, int, int]])
- Return type:
Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]
- build_rnn(input, blocks, idx, target_output_shape=None)¶
Builds an LSTM/GRU layer returning number of outputs and layer.
- Parameters:
input (Tuple[int, int, int, int])
blocks (List[str])
idx (int)
target_output_shape (Optional[Tuple[int, int, int, int]])
- Return type:
Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]
- build_ro(input, blocks, idx)¶
Builds a RO determination layer.
- Parameters:
input (Tuple[int, int, int, int])
blocks (List[str])
idx (int)
- Return type:
Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]
- build_series(input, blocks, idx, target_output_shape=None)¶
Builds a serial block of layers.
- Parameters:
input (Tuple[int, int, int, int])
blocks (List[str])
idx (int)
target_output_shape (Optional[Tuple[int, int, int, int]])
- Return type:
Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]
- build_wav2vec2(input, blocks, idx, target_output_shape=None)¶
Builds a Wav2Vec2 masking layer.
- Parameters:
input (Tuple[int, int, int, int])
blocks (List[str])
idx (int)
target_output_shape (Optional[Tuple[int, int, int, int]])
- Return type:
Union[Tuple[None, None, None], Tuple[Tuple[int, int, int, int], str, Callable]]
- codec: kraken.lib.codec.PytorchCodec | None = None¶
- criterion: Any = None¶
- eval()¶
Sets the model to evaluation/inference mode, disabling dropout and gradient calculation.
- Return type:
None
- property hyper_params¶
- idx¶
- init_weights(idx=slice(0, None))¶
Initializes weights for all or a subset of layers in the graph.
LSTM/GRU layers are orthogonally initialized, convolutional layers uniformly from (-0.1,0.1).
- Parameters:
idx (slice) – A slice object representing the indices of layers to initialize.
- Return type:
None
- input¶
- classmethod load_model(path)¶
Deserializes a VGSL model from a CoreML file.
- Parameters:
path (Union[str, os.PathLike]) – CoreML file
- Returns:
A TorchVGSLModel instance.
- Raises:
KrakenInvalidModelException – if the model data is invalid (not a string, protobuf file, or without appropriate metadata).
FileNotFoundError – if the path doesn't point to a file.
- m¶
- property model_type¶
- named_spec: List[str] = []¶
- nn¶
- property one_channel_mode¶
- ops¶
- pattern¶
- resize_output(output_size, del_indices=None)¶
Resizes an output layer.
- Parameters:
output_size (int) – New size/output channels of last layer
del_indices (list) – list of outputs to delete from layer
- Return type:
None
- save_model(path)¶
Serializes the model into path.
- Parameters:
path (str) – Target destination
- property seg_type¶
- set_num_threads(num)¶
Sets number of OpenMP threads to use.
- Parameters:
num (int)
- Return type:
None
- spec¶
- to(device)¶
- Parameters:
device (Union[str, torch.device])
- Return type:
None
- train()¶
Sets the model to training mode (enables dropout layers and disables softmax on CTC layers).
- Return type:
None
- property use_legacy_polygons¶
- user_metadata: Dict[str, Any]¶
kraken.lib.xml module¶
- class kraken.lib.xml.XMLPage(filename, filetype='xml')¶
- Parameters:
filename (Union[str, os.PathLike])
filetype (Literal['xml', 'alto', 'page'])
Training¶
kraken.lib.train module¶
Loss and Evaluation Functions¶
Trainer¶
- class kraken.lib.train.KrakenTrainer(enable_progress_bar=True, enable_summary=True, min_epochs=5, max_epochs=100, freeze_backbone=-1, pl_logger=None, log_dir=None, *args, **kwargs)¶
- Parameters:
enable_progress_bar (bool)
enable_summary (bool)
min_epochs (int)
max_epochs (int)
pl_logger (Union[lightning.pytorch.loggers.logger.Logger, str, None])
log_dir (Optional[os.PathLike])
- automatic_optimization = False¶
- fit(*args, **kwargs)¶
kraken.lib.dataset module¶
Recognition datasets¶
- class kraken.lib.dataset.ArrowIPCRecognitionDataset(normalization=None, whitespace_normalization=True, skip_empty_lines=True, reorder=True, im_transforms=transforms.Compose([]), augmentation=False, split_filter=None)¶
Dataset for training a recognition model from a precompiled dataset in Arrow IPC format.
- Parameters:
normalization (Optional[str])
whitespace_normalization (bool)
skip_empty_lines (bool)
reorder (Union[bool, Literal['L', 'R']])
im_transforms (Callable[[Any], torch.Tensor])
augmentation (bool)
split_filter (Optional[str])
- add(file)¶
Adds an Arrow IPC file to the dataset.
- Parameters:
file (Union[str, os.PathLike]) – Location of the precompiled dataset file.
- Return type:
None
- alphabet: collections.Counter¶
- arrow_table = None¶
- aug = None¶
- codec = None¶
- encode(codec=None)¶
Adds a codec to the dataset.
- Parameters:
codec (Optional[kraken.lib.codec.PytorchCodec])
- Return type:
None
- failed_samples¶
- im_mode¶
- legacy_polygons_status = None¶
- no_encode()¶
Creates an unencoded dataset.
- Return type:
None
- rebuild_alphabet()¶
Recomputes the alphabet depending on the given text transformation.
- seg_type = None¶
- skip_empty_lines¶
- text_transforms: List[Callable[[str], str]] = []¶
- transforms¶
- class kraken.lib.dataset.BaselineSet(line_width=4, padding=(0, 0, 0, 0), im_transforms=transforms.Compose([]), augmentation=False, valid_baselines=None, merge_baselines=None, valid_regions=None, merge_regions=None)¶
Dataset for training a baseline/region segmentation model.
- Parameters:
line_width (int)
padding (Tuple[int, int, int, int])
im_transforms (Callable[[Any], torch.Tensor])
augmentation (bool)
valid_baselines (Sequence[str])
merge_baselines (Dict[str, Sequence[str]])
valid_regions (Sequence[str])
merge_regions (Dict[str, Sequence[str]])
- add(doc)¶
Adds a page to the dataset.
- Parameters:
doc (kraken.containers.Segmentation) – A Segmentation container class.
- aug = None¶
- class_mapping¶
- class_stats¶
- failed_samples¶
- im_mode = '1'¶
- imgs = []¶
- line_width¶
- mbl_dict¶
- mreg_dict¶
- num_classes = 2¶
- pad¶
- seg_type = None¶
- targets = []¶
- transform(image, target)¶
- transforms¶
- valid_baselines¶
- valid_regions¶
- class kraken.lib.dataset.GroundTruthDataset(normalization=None, whitespace_normalization=True, skip_empty_lines=True, reorder=True, im_transforms=transforms.Compose([]), augmentation=False)¶
Dataset for training a line recognition model.
All data is cached in memory.
- Parameters:
normalization (Optional[str])
whitespace_normalization (bool)
skip_empty_lines (bool)
reorder (Union[bool, str])
im_transforms (Callable[[Any], torch.Tensor])
augmentation (bool)
- add(line=None, page=None)¶
Adds an individual line or all lines on a page to the dataset.
- Parameters:
line (Optional[kraken.containers.BBoxLine]) – BBoxLine container object of a line.
page (Optional[kraken.containers.Segmentation]) – Segmentation container object for a page.
- add_line(line)¶
Adds a line to the dataset.
- Parameters:
line (kraken.containers.BBoxLine) – BBoxLine container object for a line.
- Raises:
ValueError – if the transcription of the line is empty after transformation or either baseline or bounding polygon are missing.
- add_page(page)¶
Adds all lines on a page to the dataset.
Invalid lines will be skipped and a warning will be printed.
- Parameters:
page (kraken.containers.Segmentation) – Segmentation container object for a page.
- alphabet: collections.Counter¶
- aug = None¶
- encode(codec=None)¶
Adds a codec to the dataset and encodes all text lines.
Has to be run before sampling from the dataset.
- Parameters:
codec (Optional[kraken.lib.codec.PytorchCodec])
- Return type:
None
- failed_samples¶
- property im_mode¶
- no_encode()¶
Creates an unencoded dataset.
- Return type:
None
- seg_type = 'bbox'¶
- skip_empty_lines¶
- text_transforms: List[Callable[[str], str]] = []¶
- transforms¶
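The relation between the alphabet counter and encode() can be pictured with a toy stand-in for a codec. This is only a sketch: build_toy_codec and encode_line are hypothetical helpers that are not part of kraken, and reserving label 0 is an assumption made for the example rather than documented behaviour.

```python
from collections import Counter
from typing import Dict, List


def build_toy_codec(alphabet: Counter) -> Dict[str, int]:
    """Assigns an integer label to every character seen in the dataset.

    Hypothetical helper: labels start at 1 so that 0 stays unused
    (an assumption of this sketch).
    """
    return {char: idx for idx, char in enumerate(sorted(alphabet), start=1)}


def encode_line(text: str, codec: Dict[str, int]) -> List[int]:
    """Maps a transcription to its sequence of integer labels."""
    return [codec[char] for char in text]


# Collect character frequencies over all lines, as an alphabet counter would.
lines = ['abba', 'cab']
alphabet = Counter()
for line in lines:
    alphabet.update(line)

# Build the mapping once, then encode every line with it.
codec = build_toy_codec(alphabet)
encoded = [encode_line(line, codec) for line in lines]
```

The point of the sketch is the ordering: the character inventory has to be known before any line can be turned into labels, which is why encoding precedes sampling.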
Segmentation datasets¶
- class kraken.lib.dataset.PolygonGTDataset(normalization=None, whitespace_normalization=True, skip_empty_lines=True, reorder=True, im_transforms=transforms.Compose([]), augmentation=False, legacy_polygons=False)¶
Dataset for training a line recognition model from polygonal/baseline data.
- Parameters:
normalization (Optional[str])
whitespace_normalization (bool)
skip_empty_lines (bool)
reorder (Union[bool, Literal['L', 'R']])
im_transforms (Callable[[Any], torch.Tensor])
augmentation (bool)
legacy_polygons (bool)
- add(line=None, page=None)¶
Adds an individual line or all lines on a page to the dataset.
- Parameters:
line (Optional[kraken.containers.BaselineLine]) – BaselineLine container object of a line.
page (Optional[kraken.containers.Segmentation]) – Segmentation container object for a page.
- add_line(line)¶
Adds a line to the dataset.
- Parameters:
line (kraken.containers.BaselineLine) – BaselineLine container object for a line.
- Raises:
ValueError – if the transcription of the line is empty after transformation or either baseline or bounding polygon are missing.
- add_page(page)¶
Adds all lines on a page to the dataset.
Invalid lines will be skipped and a warning will be printed.
- Parameters:
page (kraken.containers.Segmentation) – Segmentation container object for a page.
- alphabet: collections.Counter¶
- aug = None¶
- encode(codec=None)¶
Adds a codec to the dataset and encodes all text lines.
Has to be run before sampling from the dataset.
- Parameters:
codec (Optional[kraken.lib.codec.PytorchCodec])
- Return type:
None
- failed_samples¶
- property im_mode¶
- legacy_polygons¶
- no_encode()¶
Creates an unencoded dataset.
- Return type:
None
- seg_type = 'baselines'¶
- skip_empty_lines¶
- text_transforms: List[Callable[[str], str]] = []¶
- transforms¶
Reading order datasets¶
- class kraken.lib.dataset.PairWiseROSet(files=None, mode='xml', level='baselines', ro_id=None, class_mapping=None)¶
Dataset for training a reading order determination model.
Returns random pairs of lines from the same page.
- Parameters:
files (Sequence[Union[os.PathLike, str]])
mode (Optional[Literal['alto', 'page', 'xml']])
level (Literal['regions', 'baselines'])
ro_id (Optional[str])
class_mapping (Optional[Dict[str, int]])
- data = []¶
- failed_samples = []¶
- get_feature_dim()¶
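As a rough illustration of "random pairs of lines from the same page", the following sketch draws two distinct lines and records which one comes first. The function sample_pair and the ordering label are assumptions of this example; the features and targets the actual dataset produces are not described here.

```python
import random
from typing import Sequence, Tuple


def sample_pair(page_lines: Sequence[str], rng: random.Random) -> Tuple[str, str, int]:
    """Draws two distinct lines from one page (hypothetical helper).

    Returns (first, second, label) where label is 1 if `first` precedes
    `second` in the page's reading order and 0 otherwise.
    """
    i, j = rng.sample(range(len(page_lines)), 2)
    return page_lines[i], page_lines[j], int(i < j)


# Lines are listed in reading order; the identifiers are placeholders.
page = ['line_0', 'line_1', 'line_2', 'line_3']
rng = random.Random(0)
pairs = [sample_pair(page, rng) for _ in range(5)]
```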
- class kraken.lib.dataset.PageWiseROSet(files=None, mode='xml', level='baselines', ro_id=None, class_mapping=None)¶
Dataset for training a reading order determination model.
Returns all lines from the same page.
- Parameters:
files (Sequence[Union[os.PathLike, str]])
mode (Optional[Literal['alto', 'page', 'xml']])
level (Literal['regions', 'baselines'])
ro_id (Optional[str])
class_mapping (Optional[Dict[str, int]])
- data = []¶
- failed_samples = []¶
- get_feature_dim()¶
Helpers¶
- class kraken.lib.dataset.ImageInputTransforms(batch, height, width, channels, pad, valid_norm=True, force_binarization=False)¶
- Parameters:
batch (int)
height (int)
width (int)
channels (int)
pad (Union[int, Tuple[int, int], Tuple[int, int, int, int]])
valid_norm (bool)
force_binarization (bool)
- property batch: int¶
Batch size attribute. Ignored.
- Return type:
int
- property centerline_norm: bool¶
Attribute indicating if centerline normalization will be applied to input images.
- Return type:
bool
- property channels: int¶
Channels attribute. Can be either 1 (binary/grayscale) or 3 (RGB).
- Return type:
int
- property force_binarization: bool¶
Switch enabling/disabling forced binarization.
- Return type:
bool
- property height: int¶
Desired output image height. If set to 0, the image will be rescaled proportionally with width. If set to 1 and channels is larger than 3, the output will be grayscale and of the height set with the channels attribute.
- Return type:
int
- property mode: str¶
Imaginary PIL.Image.Image mode of the output tensor. Possible values are RGB, L, and 1.
- Return type:
str
- property pad: int¶
Amount of padding around left/right end of image.
- Return type:
int
- property scale: Tuple[int, int]¶
Desired output shape (height, width) of the image. If either value is set to 0, the image will be rescaled proportionally with the other dimension. If height is 1 and channels is larger than 3, the output will be grayscale and of the height set with the channels attribute.
- Return type:
Tuple[int, int]
- property valid_norm: bool¶
Switch allowing/disallowing centerline normalization. Even if enabled, it won’t be applied to 3-channel images.
- Return type:
bool
- property width: int¶
Desired output image width. If set to 0, image will be rescaled proportionally with height.
- Return type:
int
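The zero-valued height and width settings described above can be sketched as a small size computation. resolve_output_size is a hypothetical helper, not a kraken function; leaving the image unchanged when both values are 0 and rounding to the nearest pixel are assumptions of this sketch, and the special case of height 1 with more than 3 channels is not covered.

```python
from typing import Tuple


def resolve_output_size(orig_hw: Tuple[int, int], scale: Tuple[int, int]) -> Tuple[int, int]:
    """Computes the output (height, width) for a requested scale.

    A 0 in `scale` means "derive this dimension from the other one while
    keeping the aspect ratio".
    """
    orig_h, orig_w = orig_hw
    height, width = scale
    if height == 0 and width == 0:
        # Assumption: nothing to rescale against, keep the original size.
        return orig_h, orig_w
    if height == 0:
        # Width is fixed, height follows proportionally.
        return max(1, round(orig_h * width / orig_w)), width
    if width == 0:
        # Height is fixed, width follows proportionally.
        return height, max(1, round(orig_w * height / orig_h))
    # Both dimensions fixed: the aspect ratio is not preserved.
    return height, width
```

For a 100x400 line image, a scale of (48, 0) keeps the aspect ratio and yields a 48 pixel high output, which is the typical setup for variable-width line inputs.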
- kraken.lib.dataset.collate_sequences(batch)¶
Sorts and pads sequences.
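What "sorts and pads" means for a batch of variable-length sequences can be shown with plain lists. sort_and_pad is a hypothetical stand-in: the descending sort order, right-side padding, the pad value of 0, and the returned length list are all assumptions of this sketch, and the real function operates on batches of samples rather than bare lists.

```python
from typing import List, Sequence, Tuple


def sort_and_pad(seqs: Sequence[Sequence[int]], pad_value: int = 0) -> Tuple[List[List[int]], List[int]]:
    """Sorts sequences by length (longest first) and pads them to equal length."""
    ordered = sorted(seqs, key=len, reverse=True)
    lengths = [len(s) for s in ordered]
    max_len = lengths[0] if lengths else 0
    # Append pad_value on the right until every sequence has max_len items.
    padded = [list(s) + [pad_value] * (max_len - len(s)) for s in ordered]
    return padded, lengths


padded, lengths = sort_and_pad([[1, 2], [3, 4, 5, 6], [7]])
```

Keeping the original lengths alongside the padded batch lets later steps ignore the padding.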
- kraken.lib.dataset.global_align(seq1, seq2)¶
Computes a global alignment of two strings.
- Parameters:
seq1 (Sequence[Any])
seq2 (Sequence[Any])
- Return type:
Tuple[int, List[str], List[str]]
Returns a tuple (distance, list(algn1), list(algn2))
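A global alignment with unit costs can be sketched with the classic dynamic programming table followed by a backtrace. The function toy_global_align below is an independent illustration, not the library's implementation; using the empty string as gap marker and unit costs for insertion, deletion, and substitution are assumptions of this sketch.

```python
from typing import Any, List, Sequence, Tuple

GAP = ''  # assumed gap marker for positions without a counterpart


def toy_global_align(seq1: Sequence[Any], seq2: Sequence[Any]) -> Tuple[int, List[Any], List[Any]]:
    """Returns (distance, aligned seq1, aligned seq2) using unit edit costs."""
    n, m = len(seq1), len(seq2)
    # cost[i][j] is the edit distance between seq1[:i] and seq2[:j].
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(cost[i - 1][j] + 1,
                             cost[i][j - 1] + 1,
                             cost[i - 1][j - 1] + (seq1[i - 1] != seq2[j - 1]))
    # Walk back from the bottom-right cell, emitting one column per step.
    algn1: List[Any] = []
    algn2: List[Any] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (seq1[i - 1] != seq2[j - 1]):
            algn1.append(seq1[i - 1])
            algn2.append(seq2[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            algn1.append(seq1[i - 1])
            algn2.append(GAP)
            i -= 1
        else:
            algn1.append(GAP)
            algn2.append(seq2[j - 1])
            j -= 1
    return cost[n][m], algn1[::-1], algn2[::-1]


distance, a1, a2 = toy_global_align('kitten', 'sitting')
```

Both aligned lists always have the same length, and the number of columns whose entries differ equals the returned distance.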
- kraken.lib.dataset.compute_confusions(algn1, algn2)¶
Compute confusion matrices from two globally aligned strings.
- Parameters:
algn1 (Sequence[str]) – sequence 1
algn2 (Sequence[str]) – sequence 2
- Returns:
A tuple (counts, scripts, ins, dels, subs) with counts being per-character confusions, scripts per-script counts, ins a dict with per-script insertions, dels an integer of the number of deletions, and subs per-script substitutions.
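The counting step over two aligned sequences can be illustrated without the per-script breakdown. count_confusions is a simplified, hypothetical helper: it returns plain integers instead of per-script dictionaries, and both the empty-string gap marker and the convention that a gap in the first sequence counts as a deletion are assumptions of this sketch.

```python
from collections import Counter
from typing import Sequence, Tuple


def count_confusions(algn1: Sequence[str], algn2: Sequence[str], gap: str = '') -> Tuple[Counter, int, int, int]:
    """Returns (counts, ins, dels, subs) for two equally long aligned sequences."""
    # One entry per aligned column, keyed by the pair of symbols in it.
    counts = Counter(zip(algn1, algn2))
    ins = dels = subs = 0
    for (a, b), n in counts.items():
        if a == gap:
            dels += n  # assumed convention: gap in the first sequence
        elif b == gap:
            ins += n   # assumed convention: gap in the second sequence
        elif a != b:
            subs += n
    return counts, ins, dels, subs


counts, ins, dels, subs = count_confusions(['a', 'b', '', 'd'], ['a', 'x', 'c', ''])
```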
Legacy modules¶
These modules are retained for compatibility reasons or highly specialized use cases. In most cases their use is not necessary, and they are not developed further for interoperability with new functionality. For example, the transcription and line generation modules do not work with the baseline segmenter.
kraken.binarization module¶
- kraken.binarization.nlbin(im, threshold=0.5, zoom=0.5, escale=1.0, border=0.1, perc=80, range=20, low=5, high=90)¶
Performs binarization using non-linear processing.
- Parameters:
im (PIL.Image.Image) – Input image
threshold (float)
zoom (float) – Zoom for background page estimation
escale (float) – Scale for estimating a mask over the text region
border (float) – Ignore this much of the border
perc (int) – Percentage for filters
range (int) – Range for filters
low (int) – Percentile for black estimation
high (int) – Percentile for white estimation
- Returns:
PIL.Image.Image containing the binarized image
- Raises:
KrakenInputException – When trying to binarize an empty image.
- Return type:
PIL.Image.Image
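The roles of low, high, and threshold can be illustrated with a heavily simplified sketch that only estimates black and white levels from percentiles, normalizes, and thresholds. toy_binarize and percentile are hypothetical helpers; background estimation (zoom, perc, range) and the text mask (escale, border) are left out entirely, and the nearest-rank percentile is an assumption of this example.

```python
from typing import List


def percentile(values: List[float], q: int) -> float:
    """Nearest-rank percentile (q in 0..100) of a non-empty list."""
    ordered = sorted(values)
    # Integer arithmetic with round-half-up keeps the index deterministic.
    idx = (q * (len(ordered) - 1) + 50) // 100
    return ordered[idx]


def toy_binarize(pixels: List[List[float]], threshold: float = 0.5,
                 low: int = 5, high: int = 90) -> List[List[int]]:
    """Binarizes a grayscale image given as rows of floats in [0, 1]."""
    flat = [p for row in pixels for p in row]
    lo = percentile(flat, low)    # estimate of the black level
    hi = percentile(flat, high)   # estimate of the white level
    span = (hi - lo) or 1.0       # avoid division by zero on flat images
    out = []
    for row in pixels:
        # Stretch to [0, 1], clip, then threshold to black (0) or white (255).
        out.append([255 if min(1.0, max(0.0, (p - lo) / span)) > threshold else 0
                    for p in row])
    return out


binary = toy_binarize([[0.1, 0.9, 0.8], [0.2, 0.85, 0.95]])
```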
kraken.transcribe module¶
- class kraken.transcribe.TranscriptionInterface(font=None, font_style=None)¶
- add_page(im, segmentation=None)¶
Adds an image to the transcription interface, optionally filling in information from a list of ocr_record objects.
- Parameters:
im – Input image
segmentation – Output of the segment method.
- env¶
- font¶
- line_idx = 1¶
- page_idx = 1¶
- pages: List[Dict[Any, Any]] = []¶
- seg_idx = 1¶
- text_direction = 'horizontal-tb'¶
- tmpl¶
- write(fd)¶
Writes the HTML file to a file descriptor.
- Parameters:
fd (File) – File descriptor (mode=’wb’) to write to.