Introducing Dynamic Compute Allocation

Test-Time Cognition for Agentic Tasks

We introduce Dynamic Compute Allocation (DCA) and evaluate it on agentic coding tasks, across pure and composite model configurations, on Terminal-Bench 2.0.

Voaige Research · April 8, 2026

❦

Most improvements to AI systems in recent years have come from two directions: better models (larger, longer-trained, more carefully fine-tuned with reinforcement learning) and better agents (richer prompts, retrieval, tool use, and planning scaffolds). Between them, these approaches modify the two inputs to any model endpoint: what the model knows and what it sees. At Voaige, we study a third dimension: test-time cognition. Holding model weights and prompt context fixed, we ask whether the mechanics of inference itself (how compute is allocated across a generation, how uncertainty is handled, and how hard sub-problems are identified and treated differently from easy ones) can be made adaptive and principled rather than uniform. Our thesis is that they can, and that test-time cognition methods applied at inference can produce consistent, cost-efficient improvements in model performance without modifying the model or the agent.

Introducing DCA: our first algorithm for test-time cognition

Dynamic Compute Allocation (DCA) is our first published algorithm in this direction. Drawing on systems and computational neuroscience, DCA is inspired by the observation that neural systems do not allocate computation uniformly: they modulate attention, gain, and processing effort based on input uncertainty, surprise, and behavioral salience.

DCA decides when and how to intervene in an agentic trajectory on a given task. At each step, DCA reads signals from the current state of the trajectory to estimate how much additional compute would benefit that step. It then applies a lightweight form of explicit search to determine the final action to take, rather than accepting the model's first-pass output. This search is not exhaustive; it is selective, triggered by DCA's estimate of where additional deliberation has the highest expected return. It requires no changes to model weights or agent configuration.
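The selective pattern described above (accept the first-pass output on easy steps, search only where extra deliberation looks worthwhile) can be illustrated with a minimal sketch. This is not DCA's actual implementation: the post does not disclose which signals DCA reads or how its search works. The uncertainty signal used here (mean token surprisal), the `threshold` and `extra_samples` parameters, and the `propose` and `score` callables are all assumptions introduced purely for illustration.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Candidate:
    """A proposed next action plus a hypothetical confidence signal."""
    action: str
    token_probs: Sequence[float]  # assumed: per-token probabilities of the action


def mean_surprisal(token_probs: Sequence[float]) -> float:
    """Average negative log-probability; higher means the model was less sure."""
    if not token_probs:
        return 0.0
    return -sum(math.log(max(p, 1e-12)) for p in token_probs) / len(token_probs)


def choose_action(
    propose: Callable[[], Candidate],    # hypothetical: samples one action from the model
    score: Callable[[Candidate], float], # hypothetical: ranks candidates, higher is better
    threshold: float = 0.5,              # illustrative trigger level, not a DCA parameter
    extra_samples: int = 3,              # illustrative search width
) -> Candidate:
    """Accept the first-pass action on easy steps; search only on uncertain ones."""
    first = propose()
    if mean_surprisal(first.token_probs) <= threshold:
        return first  # easy step: spend no additional compute
    # Uncertain step: draw a few more candidates and keep the best-scoring one.
    candidates = [first] + [propose() for _ in range(extra_samples)]
    return max(candidates, key=score)
```

In this sketch, additional model calls are made only when the first proposal looks uncertain, so the extra cost concentrates on the steps flagged as hard. Both the model weights and the agent's prompts are untouched; only the way the next action is selected changes.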

DCA's effectiveness also depends on the reasoning ability of the underlying model. Not all models respond equally to the techniques DCA currently employs, and characterising the conditions under which it helps is part of ongoing work. What follows are our first results, evaluated on the agentic coding tasks of Terminal-Bench 2.0.

Why Voaige Cognition is different

	RL approaches	Voaige Cognition (DCA)
Requires training	Yes (fine-tuning, RL, or continued pretraining)	No (operates on frozen weights)
Domain-specific data	Often (curated datasets per task or domain)	No (domain agnostic by design)
Tuning overhead	High (full retraining cycles)	Low (a few parameters per agent-model combination)
Applies to closed-source models	Rarely (requires model access)	Yes (works at the inference layer)

The three layers of an agentic system

↕ Agent Layer: prompts · RAG · tool use · planning

★ Test-Time Cognition: DCA v0.1

↕ Model Layer: weights · architecture · training

Most improvements target the agent layer (prompts, planning, retrieval) or the model layer (architecture, training, RL). Test-time cognition is a third, largely unexplored layer between them.

Benchmark: Terminal-Bench 2.0

Terminal-Bench 2.0 is a suite of agentic coding tasks requiring multi-step reasoning, tool use, and execution in a live terminal environment. We evaluated three agent harnesses (Terminus 2.0.0, Mini-SWE-agent 2.2.1, OpenHands 1.4.0) across seven models, measuring success rate with and without DCA applied. All runs were conducted with no wall-clock timeout, so that results reflect the native performance of each agentic system rather than the effect of task time limits. All reported results are averaged over three trials.
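As a small illustration of the reporting metric, the sketch below computes a success rate averaged over trials. The function names are hypothetical and the outcome lists are made-up placeholders, not benchmark results. It assumes the average is taken over per-trial success rates, which equals the pooled rate whenever every trial covers the same set of tasks.

```python
from statistics import mean
from typing import List


def success_rate(outcomes: List[bool]) -> float:
    """Fraction of tasks solved in a single trial."""
    return sum(outcomes) / len(outcomes)


def mean_success_rate(trials: List[List[bool]]) -> float:
    """Average of the per-trial success rates."""
    return mean(success_rate(trial) for trial in trials)


# Placeholder outcomes for illustration only (three trials, four tasks each).
baseline_trials = [
    [True, False, False, True],
    [True, False, True, True],
    [False, False, True, True],
]
with_dca_trials = [
    [True, True, False, True],
    [True, False, True, True],
    [True, False, True, True],
]

baseline = mean_success_rate(baseline_trials)
with_dca = mean_success_rate(with_dca_trials)
print(f"baseline: {baseline:.3f}  with DCA: {with_dca:.3f}  delta: {with_dca - baseline:+.3f}")
```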
