Quaerens AI Labs Vol. I · Experiment record · MMXXVI

FLAGSHIP ENVELOPE

Flagship Dim Budget

Dimension tested · budget & scale

§ 01 Arena measurement

Arena question

Does increasing candidate budget improve grounded-QA output quality, or does it mainly increase yield and selection headroom?

Lanes

lane        candidates  selected
budget_100  100         100
budget_200  200         200
budget_50   50          50

Primary metrics

evidence_extraction, grounding, answerability

Secondary metrics: faithfulness, answer_relevancy, selected_count, duplicate rejection

Observed per-lane means

metric               budget_100  budget_200  budget_50  spread
evidence_extraction  0.908       0.914       0.901      0.0124
grounding            0.948       0.954       0.942      0.0124
answerability        0.937       0.944       0.932      0.0119

The best lane per metric (highlighted in oxblood on the original page) is budget_200 in all three rows. Spread = max − min across lanes.
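As a minimal sketch of how the best lane and the spread column can be derived, the snippet below recomputes both from the rounded per-lane means shown in the table. The function names `spread` and `best_lane` are hypothetical, chosen only for this illustration. Because the inputs are rounded to three decimals, the recomputed spreads can differ slightly from the four-decimal values in the table, which were presumably computed from unrounded means.

```python
# Per-lane means copied from the "Observed per-lane means" table (rounded to 3 decimals).
lane_means = {
    "evidence_extraction": {"budget_100": 0.908, "budget_200": 0.914, "budget_50": 0.901},
    "grounding": {"budget_100": 0.948, "budget_200": 0.954, "budget_50": 0.942},
    "answerability": {"budget_100": 0.937, "budget_200": 0.944, "budget_50": 0.932},
}


def spread(by_lane):
    """Spread as defined in the record: max minus min across lanes."""
    return max(by_lane.values()) - min(by_lane.values())


def best_lane(by_lane):
    """Name of the lane with the highest mean for one metric."""
    return max(by_lane, key=by_lane.get)


for metric, by_lane in lane_means.items():
    print(f"{metric}: best={best_lane(by_lane)} spread={spread(by_lane):.3f}")
```

With these inputs, every metric selects budget_200 as the best lane, and each recomputed spread stays within about 0.001 of the reported value.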

§ 02 Interpretation

Raising the candidate budget from 50 to 100 to 200 increased the available selected set without causing an obvious quality collapse.

This dimension tested volume, not a new source or model strategy. The run held the source package and grounded-QA envelope fixed while changing how many candidates were produced and kept. All three arms completed at their planned sizes, and evaluation remained discriminating across the larger set.

The result is useful for planning scale: a larger budget gives more selection headroom and more examples to curate. It does not yet prove that more budget automatically produces better QA pairs. The public claim should stay focused on yield and operational capacity, with quality-lift treated as a follow-up question.

Status: complete
§ 03 Limits & next step

Score distributions are close across arms: the per-lane spread is roughly 0.012 on each primary metric. The audit therefore closes this as a volume/yield result, not a strong quality-separation result. The larger budget produced more kept rows without an obvious quality lift or collapse.

§ 04 Planning

Experiment spec

Tools: agent_skill
Budget: Not specified
Judge: taxonomy_agent (agent_skill)
Source: NIST SP 800-53
§ 05 Envelope & scores

QA envelope

Use case: flagship:grounded-document-qa:candidate_budget
Artifact contract: answer.qa_pair.grounded.v1
Metric set: answer.grounded.v1
Operator: nico
Models: agent_internal

Scores

metric               mean   range        n
grounding            0.951  0.885–0.995  350
faithfulness         0.945  0.873–0.994  350
evidence_extraction  0.910  0.850–0.953  350
answerability        0.940  0.886–0.985  350
answer_relevancy     0.935  0.863–0.984  350
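The n of 350 equals the sum of the three lane sizes (50 + 100 + 200), which suggests that the Scores table pools every selected row. Under that assumption, which the record does not state explicitly, the size-weighted average of the per-lane means should reproduce the overall means. The sketch below checks this for the three primary metrics, the only ones with per-lane means reported. The function name `pooled_mean` is hypothetical.

```python
# Lane sizes from the Lanes section (selected rows per lane).
lane_sizes = {"budget_50": 50, "budget_100": 100, "budget_200": 200}

# Rounded per-lane means from the "Observed per-lane means" table.
lane_means = {
    "evidence_extraction": {"budget_50": 0.901, "budget_100": 0.908, "budget_200": 0.914},
    "grounding": {"budget_50": 0.942, "budget_100": 0.948, "budget_200": 0.954},
    "answerability": {"budget_50": 0.932, "budget_100": 0.937, "budget_200": 0.944},
}

# Overall means from the Scores table (n = 350).
overall_means = {"evidence_extraction": 0.910, "grounding": 0.951, "answerability": 0.940}


def pooled_mean(by_lane, sizes):
    """Size-weighted average of lane means (assumes the pool is the union of all lanes)."""
    total = sum(sizes.values())
    return sum(by_lane[lane] * sizes[lane] for lane in sizes) / total


for metric, by_lane in lane_means.items():
    estimate = pooled_mean(by_lane, lane_sizes)
    print(f"{metric}: pooled={estimate:.4f} reported={overall_means[metric]:.3f}")
```

With the rounded inputs, each pooled estimate agrees with the reported overall mean to within 0.001, which is consistent with the pooling assumption. Because budget_200 contributes 200 of the 350 rows, the pooled figures sit closest to that lane's means.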
© 2026 Quaerens AI Labs / Mario Lanzillotta