Providers and research ideas become strategy arms only when they fit a QA envelope. A component is useful only inside a specific use case, not in general.
Provider tools we have profiled as possible strategy components. None is endorsed or ranked above the others.
Meta's open-source QA-generation toolkit. Direct, summary-first, and curated modes.
Open-source pipeline focused on long-document factual QA with span-level provenance.
Evaluation framework with built-in synthetic QA generation.
Open-source framework for synthetic data pipelines, including QA.
Argilla's library for distillation and synthetic data generation.
Open-source fine-tuning data toolkit with a QA-generation flow.
Strategies extracted from papers and translated into experiment arms.
Pair each candidate QA with multi-source evidence references; reject candidates whose claims are not supported by the cited spans.
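A minimal sketch of such an evidence gate, under stated assumptions: the `EvidenceSpan` and `CandidateQA` types, the `min_overlap` and `min_sources` thresholds, and the token-overlap support check are all hypothetical stand-ins chosen for illustration. A real arm would more likely use an entailment model or a judge in place of token overlap.

```python
import re
from dataclasses import dataclass


@dataclass
class EvidenceSpan:
    source_id: str  # hypothetical: identifier of the document the span was cited from
    text: str       # the cited passage itself


@dataclass
class CandidateQA:
    question: str
    answer: str
    evidence: list[EvidenceSpan]


def _tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens, used for the naive overlap check."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def is_supported(
    candidate: CandidateQA,
    min_overlap: float = 0.8,  # assumption: share of answer tokens that must appear in the spans
    min_sources: int = 2,      # assumption: "multi-source" read as at least two distinct sources
) -> bool:
    """Return True if the answer looks supported by the cited spans."""
    sources = {span.source_id for span in candidate.evidence}
    if len(sources) < min_sources:
        return False

    answer_tokens = _tokens(candidate.answer)
    if not answer_tokens:
        return False

    evidence_tokens: set[str] = set()
    for span in candidate.evidence:
        evidence_tokens |= _tokens(span.text)

    overlap = len(answer_tokens & evidence_tokens) / len(answer_tokens)
    return overlap >= min_overlap


def filter_candidates(candidates: list[CandidateQA], **kwargs) -> list[CandidateQA]:
    """Keep only the candidates that pass the support check."""
    return [c for c in candidates if is_supported(c, **kwargs)]
```

Token overlap is a deliberately crude proxy: it rejects answers that introduce terms absent from the evidence, but it cannot detect a claim that reuses the evidence vocabulary while contradicting it.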
Pair a lightweight judge model with structured rubric prompts to score grounded QAs at a fraction of the cost of a frontier judge.
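A rubric-based judge of this kind might be wired up roughly as follows. This is a sketch, not a reference implementation: the rubric criteria, the 1 to 5 scale, the JSON reply format, and the `judge` callable (any function that takes a prompt string and returns the model's text reply) are assumptions made for illustration.

```python
import json
from typing import Callable

# Hypothetical rubric: criterion name -> question the judge should answer.
RUBRIC = {
    "groundedness": "Is every claim in the answer supported by the evidence?",
    "answerability": "Can the question be answered from the evidence alone?",
    "clarity": "Is the question unambiguous?",
}


def build_prompt(question: str, answer: str, evidence: str,
                 rubric: dict[str, str] = RUBRIC) -> str:
    """Render a structured rubric prompt for the judge model."""
    criteria = "\n".join(f"- {name}: {desc}" for name, desc in rubric.items())
    return (
        "Score the QA pair on each criterion from 1 (poor) to 5 (excellent).\n"
        f"Criteria:\n{criteria}\n\n"
        f"Evidence:\n{evidence}\n\n"
        f"Question: {question}\nAnswer: {answer}\n\n"
        "Reply with a JSON object mapping each criterion name to an integer score."
    )


def score_qa(judge: Callable[[str], str], question: str, answer: str, evidence: str,
             rubric: dict[str, str] = RUBRIC) -> dict[str, int]:
    """Ask the judge for rubric scores and validate the reply."""
    raw = judge(build_prompt(question, answer, evidence, rubric))
    parsed = json.loads(raw)
    if not isinstance(parsed, dict):
        raise ValueError("judge reply is not a JSON object")

    scores: dict[str, int] = {}
    for name in rubric:
        value = parsed.get(name)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or not 1 <= value <= 5:
            raise ValueError(f"missing or invalid score for {name!r}: {value!r}")
        scores[name] = value
    return scores


def mean_score(scores: dict[str, int]) -> float:
    """Collapse per-criterion scores into a single number for ranking."""
    return sum(scores.values()) / len(scores)
```

Keeping the judge behind a plain callable means the same rubric can be run against a lightweight model and a frontier model, which is what a cost-versus-agreement comparison inside a given envelope would need.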
A component becomes useful only when it is tested inside a concrete use case. QA Arena does not ask whether a provider or paper is good in general. It asks whether that component improves a specific QA envelope.