Confidence Score

A confidence score in document extraction is a measure of the model's certainty that its extracted value is correct. Expressed as a float between 0.0 (no confidence) and 1.0 (full confidence), it provides a quantitative signal for downstream decision-making. A confidence score of 0.95 on an invoice number suggests the value can be trusted and pushed directly into an ERP system. A score of 0.4 on a contract renewal date suggests ambiguity — perhaps multiple dates were present, or the wording was unclear — and the value should be flagged for human review.
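The two examples above amount to a threshold decision. The sketch below assumes a single cutoff separates values that can be pushed straight to a downstream system from values that go to a human. The `ExtractedField` class, the `decide` function, the 0.7 default, and the sample values are hypothetical and for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedField:
    """An extracted value with its confidence score (hypothetical shape)."""
    name: str
    value: str
    confidence: float  # 0.0 = no confidence, 1.0 = full confidence

    def __post_init__(self) -> None:
        # Enforce the 0.0-1.0 range the score is defined over.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")


def decide(field: ExtractedField, threshold: float = 0.7) -> str:
    """Return 'auto_accept' if the score reaches the assumed threshold, else 'review'."""
    return "auto_accept" if field.confidence >= threshold else "review"


# Illustrative values mirroring the invoice-number and renewal-date examples.
invoice_no = ExtractedField("invoice_number", "INV-10482", 0.95)
renewal = ExtractedField("renewal_date", "2025-03-01", 0.4)
print(decide(invoice_no))  # auto_accept
print(decide(renewal))     # review
```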

In LLM-based extraction, the model generates confidence scores as part of its structured output. The extraction prompt asks the model to return not only the value but also its confidence as a float, along with the source page number. This is a self-reported estimate, not a calibrated probability. The model is expressing its own assessment of how difficult the extraction was and how certain it is, so a score of 0.9 need not mean the value is correct 90% of the time. Although these scores are not perfectly calibrated, they are empirically useful for triage: values below a chosen threshold (e.g., 0.7) are routed to a review queue, while values above it proceed automatically.
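A minimal sketch of parsing that self-reported output and triaging it. It assumes the prompt asks for a JSON object shaped like `{"fields": [{"name", "value", "confidence", "page"}]}`. The JSON shape, the `Extraction`, `parse_extractions`, and `triage` names, and the sample reply are illustrative assumptions, not DocumentIQ's actual format.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Extraction:
    """One field as returned by the model (hypothetical schema)."""
    name: str
    value: Optional[str]
    confidence: float  # self-reported, not a calibrated probability
    page: int          # source page number reported by the model


def parse_extractions(raw: str) -> list[Extraction]:
    """Parse a reply shaped like {"fields": [...]} (assumed format) into records."""
    rows = []
    for item in json.loads(raw)["fields"]:
        # Self-reported scores can fall outside 0.0-1.0; clamp them (a design choice).
        conf = min(max(float(item["confidence"]), 0.0), 1.0)
        rows.append(Extraction(item["name"], item.get("value"), conf, int(item["page"])))
    return rows


def triage(rows: list[Extraction], threshold: float = 0.7) -> tuple[list[Extraction], list[Extraction]]:
    """Split rows into (proceed automatically, route to review) at the threshold."""
    auto = [r for r in rows if r.confidence >= threshold]
    review = [r for r in rows if r.confidence < threshold]
    return auto, review


# Hypothetical model reply used for illustration.
reply = """{"fields": [
  {"name": "invoice_number", "value": "INV-10482", "confidence": 0.95, "page": 1},
  {"name": "renewal_date", "value": "2025-03-01", "confidence": 0.4, "page": 3}
]}"""
auto, review = triage(parse_extractions(reply))
print([r.name for r in auto])    # ['invoice_number']
print([r.name for r in review])  # ['renewal_date']
```

Clamping out-of-range scores keeps the pipeline moving. A stricter alternative is to reject the reply and re-prompt.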

DocumentIQ stores confidence scores on every extracted_rows record and exposes them in the results table UI, where users can sort and filter by confidence to find the values that need attention. The feedback mechanism builds directly on this. When a reviewer marks a low-confidence value as correct, that positive signal validates the extraction; when the reviewer corrects it, the corrected value becomes a ground-truth example that can be injected into future extraction prompts. Over time, this human-in-the-loop cycle raises effective accuracy above what the raw model output alone would achieve, because the system learns from corrections without requiring full model retraining.
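The review loop can be sketched in a few lines. Everything here is hypothetical. `Row` stands in for an extracted_rows record, `needs_attention` mimics sorting and filtering by confidence, and `FeedbackStore` keeps corrections in memory and renders them as few-shot lines for a prompt. A real system would persist these corrections and might also attach document context to each example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Row:
    """Hypothetical stand-in for an extracted_rows record."""
    row_id: int
    field_name: str
    value: str
    confidence: float
    reviewed: bool = False


@dataclass
class FeedbackStore:
    """Collects reviewer feedback in memory (illustrative only)."""
    confirmed: list[int] = field(default_factory=list)
    examples: dict[str, list[tuple[str, str]]] = field(default_factory=dict)

    def confirm(self, row: Row) -> None:
        # Reviewer accepted the value as correct: record the positive signal.
        row.reviewed = True
        self.confirmed.append(row.row_id)

    def correct(self, row: Row, corrected: str) -> None:
        # Keep the (extracted, corrected) pair as a ground-truth example for this field.
        self.examples.setdefault(row.field_name, []).append((row.value, corrected))
        row.value = corrected
        row.reviewed = True

    def prompt_examples(self, field_name: str, limit: int = 3) -> str:
        """Render the most recent corrections as lines to inject into a prompt."""
        pairs = self.examples.get(field_name, [])[-limit:]
        return "\n".join(
            f'Extracted "{bad}" but the correct {field_name} was "{good}".'
            for bad, good in pairs
        )


def needs_attention(rows: list[Row], threshold: float = 0.7) -> list[Row]:
    """Unreviewed rows below the assumed threshold, lowest confidence first."""
    return sorted(
        (r for r in rows if not r.reviewed and r.confidence < threshold),
        key=lambda r: r.confidence,
    )


rows = [
    Row(1, "invoice_number", "INV-10482", 0.95),
    Row(2, "renewal_date", "2025-01-03", 0.40),
    Row(3, "payment_terms", "Net 30", 0.62),
]
queue = needs_attention(rows)
print([r.row_id for r in queue])  # [2, 3]

store = FeedbackStore()
store.correct(queue[0], "2025-03-01")  # reviewer fixes the date
store.confirm(queue[1])                # reviewer accepts the terms
print(store.prompt_examples("renewal_date"))
# Extracted "2025-01-03" but the correct renewal_date was "2025-03-01".
print(needs_attention(rows))  # []
```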
