
Benchmark methodology
How Recensa evaluates multi-model document assurance: principles for whole-document tasks, severity rubrics, and honest partial-run reporting, plus an internal harness that applies them across document types. A living framework, not a leaderboard.
Last updated 2026-05-14
How to read this page
Research map
What you can learn here
Task realism
Whole-document behaviors, not toy sentences alone.
Rubrics & severity
Material vs nit; meaning-preserving edits.
Evidence linkage
Grounded anchors vs invented citations.
Disclosure
Failure modes and partial runs, not headline accuracy only.
Technical detail