@sneakybatman (Contributor)
Summary

This PR adds support for configurable word-level confidence score aggregation in text recognition models. Previously, each model aggregated character-level confidence scores into a word-level confidence using either the arithmetic mean or the minimum, with no way for users to customize this behavior.

Motivation

Different use cases may require different confidence aggregation strategies:

  • Arithmetic mean: Good general-purpose default, balances all character confidences
  • Geometric mean: More sensitive to low confidence characters, useful when any low confidence should significantly impact the word score
  • Harmonic mean: Even more conservative, heavily penalizes low confidence characters
  • Minimum: Most conservative approach, word confidence equals weakest character (good for high-precision requirements)
  • Maximum: Most optimistic, useful when you want the best-case confidence
  • Custom callable: Full flexibility for specialized use cases
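
The strategies above can be sketched as a single dispatch function. The names `aggregate_confidence` and `ConfidenceAggregation` follow the PR description, but this body is illustrative, not doctr's actual implementation:

```python
import numpy as np
from typing import Callable, Union

# Type alias mirroring the one described in the PR
ConfidenceAggregation = Union[str, Callable[[np.ndarray], float]]

def aggregate_confidence(probs, method: ConfidenceAggregation = "mean") -> float:
    probs = np.asarray(probs, dtype=float)
    if probs.size == 0:
        return 0.0  # empty word -> zero confidence (assumed convention)
    if callable(method):
        return float(method(probs))
    if method == "mean":
        return float(probs.mean())
    if method == "geometric_mean":
        # exp(mean(log p)); a near-zero confidence drags the result toward 0
        return float(np.exp(np.log(np.clip(probs, 1e-12, None)).mean()))
    if method == "harmonic_mean":
        # n / sum(1/p); penalizes low values even harder
        return float(probs.size / (1.0 / np.clip(probs, 1e-12, None)).sum())
    if method == "min":
        return float(probs.min())
    if method == "max":
        return float(probs.max())
    raise ValueError(f"Unknown aggregation method: {method!r}")
```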

Changes

  • Add aggregate_confidence() utility function in core.py with support for 5 built-in methods plus custom callables
  • Add ConfidenceAggregation type alias for type hints
  • Add confidence_aggregation parameter to RecognitionPostProcessor base class
  • Update all PyTorch PostProcessors: PARSeq, ViTSTR, CRNN, SAR, MASTER, VIPTR
  • Update all TensorFlow PostProcessors: PARSeq, ViTSTR, SAR, MASTER
  • Update remap_preds() for split crop handling to use configurable aggregation
  • Add comprehensive unit tests (20 new test cases)

Usage Example

from doctr.models import recognition

# Use default aggregation (model-specific)
model = recognition.parseq(pretrained=True)

# Or customize at the PostProcessor level
from doctr.models.recognition.parseq.pytorch import PARSeqPostProcessor

# Use geometric mean for more conservative confidence scores
processor = PARSeqPostProcessor(vocab, confidence_aggregation="geometric_mean")

# Use custom aggregation function
import numpy as np
processor = PARSeqPostProcessor(vocab, confidence_aggregation=lambda probs: np.percentile(probs, 25))
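
To see how the choice of method matters, here is a standalone numpy comparison on one word's character confidences (the numbers are made up for illustration; this does not call doctr's API):

```python
import numpy as np

# One low-confidence character (0.40) pulls the aggregates apart
probs = np.array([0.99, 0.95, 0.40])

mean = probs.mean()                        # arithmetic mean, ~0.78
geo = np.exp(np.log(probs).mean())         # geometric mean, ~0.72
harm = probs.size / (1.0 / probs).sum()    # harmonic mean, ~0.66
lo, hi = probs.min(), probs.max()          # 0.40 and 0.99
```

The ordering max ≥ mean ≥ geometric ≥ harmonic ≥ min holds for any input, which is why the harmonic mean and minimum are described above as the more conservative choices.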

Test plan

  • All existing tests pass
  • New unit tests for aggregate_confidence() function cover all 5 methods
  • Tests verify correct handling of edge cases (empty arrays, single values, zeros)
  • Tests verify custom callable support
  • PyTorch postprocessor tests updated and passing
  • TensorFlow postprocessor tests updated and passing
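
The edge cases listed above can be exercised with checks along these lines, shown here for a stand-in geometric-mean branch (the PR's actual tests cover all five methods plus callables):

```python
import numpy as np

def geometric_mean(probs):
    # Stand-in for one branch of aggregate_confidence(), not doctr's code
    probs = np.asarray(probs, dtype=float)
    if probs.size == 0:
        return 0.0  # empty array edge case (assumed convention)
    if (probs == 0).any():
        return 0.0  # any zero-confidence character zeroes the word score
    return float(np.exp(np.log(probs).mean()))
```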

Commit message

Add support for configurable word-level confidence score aggregation
methods in text recognition models. Users can now choose how to
aggregate character-level confidence scores into word-level confidence.

Supported aggregation methods:
- "mean": Arithmetic mean (default for transformer models)
- "geometric_mean": Geometric mean (sensitive to low values)
- "harmonic_mean": Harmonic mean (even more sensitive to low values)
- "min": Minimum confidence (most conservative, default for CTC/attention models)
- "max": Maximum confidence (most optimistic)
- Custom callable: User-defined aggregation function

Changes:
- Add `aggregate_confidence()` utility function in core.py
- Add `confidence_aggregation` parameter to RecognitionPostProcessor
- Update all PyTorch PostProcessors (PARSeq, ViTSTR, CRNN, SAR, MASTER, VIPTR)
- Update all TensorFlow PostProcessors (PARSeq, ViTSTR, SAR, MASTER)
- Update `remap_preds()` for split crop handling
- Add comprehensive unit tests for aggregation methods
- Maintain backward compatibility with sensible defaults per model type