Going beyond frontier reasoning with Test Time Cognition for LLMs

Voaige is an AI research lab applying computational principles from cognitive and systems neuroscience to LLMs.

A new layer in the agentic stack

Most improvements to AI systems target either the agent layer or the model layer. Test Time Cognition operates between them, modulating inference without touching weights or prompt context.

Agent Layer: prompts · RAG · tool use · planning
Test Time Cognition: adaptive compute · selective search · uncertainty-aware inference
Model Layer: weights · architecture · training · RL

No changes to model weights or agent configuration. Works across closed-source and open-source models alike.

Solving Hard Problems Requires Efficient Search. Neuroscience Teaches Us How.

Reinforcement learning is, at its core, memorized search: a way of compressing the results of exhaustive trial and error into policy parameters so that, at inference time, the right action can be retrieved without re-running the search. It works. But it has a fundamental ceiling: the quality of the compressed policy is bounded by the quality of the search that produced it.

Hard problems (genuine reasoning, novel planning, open-ended generalization) cannot be fully compressed this way. They require search at test time, not just at training time. This is why scaling training alone is not enough. But naively scaling test-time compute is not enough either. What is needed is carefully designed inference: systems that know when to search, where to search, and how to do so efficiently.
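Knowing when to search can be sketched in a few lines. The toy below spends a small sampling budget first and escalates only when those samples disagree, a simple proxy for uncertainty-aware inference. All names and the 3/9-sample budgets are illustrative assumptions, not Voaige's actual system.

```python
from collections import Counter

def answer(task, model, n):
    """Draw n samples and return the majority answer with its agreement rate."""
    samples = [model(task) for _ in range(n)]
    top, count = Counter(samples).most_common(1)[0]
    return top, count / n

def adaptive_answer(task, model, cheap_n=3, full_n=9, threshold=0.67):
    """Spend a small budget first; escalate only when the samples disagree.

    `model` is any callable mapping a task to an answer string.
    """
    top, agreement = answer(task, model, cheap_n)
    if agreement >= threshold:
        return top  # easy case: confident early, stop searching
    return answer(task, model, full_n)[0]  # hard case: widen the search
```

The point of the sketch is the control flow, not the estimator: compute scales with measured difficulty rather than being fixed per query.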

The brain solved this problem long ago. Biological cognition doesn't brute-force search over exponential spaces; it navigates them through hierarchical abstraction, selective attention, and early pruning, performing structured, resource-constrained search on 20 watts. What makes this possible is an underlying capacity for adaptation: assessing difficulty on the fly, allocating compute where uncertainty is high, scaling back where it is not.
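Early pruning has a classic algorithmic analogue: beam search, which keeps only the most promising partial solutions at each step instead of enumerating the whole space. The sketch below is a generic toy of that idea, under stated assumptions about `expand`, `score`, and `is_goal`; it is not a claim about Voaige's actual inference machinery.

```python
import heapq

def beam_search(start, expand, score, is_goal, beam_width=3, max_depth=10):
    """Resource-constrained search via early pruning.

    expand(state) -> candidate successor states
    score(state)  -> higher is better (guides the pruning)
    """
    beam = [start]
    for _ in range(max_depth):
        candidates = [nxt for state in beam for nxt in expand(state)]
        if not candidates:
            return None
        # early pruning: everything outside the top beam_width is dropped
        beam = heapq.nlargest(beam_width, candidates, key=score)
        for state in beam:
            if is_goal(state):
                return state
    return None

# Toy usage: reach 10 from 1 using the moves n+1 and n*2,
# scored by distance to the target.
target = 10
found = beam_search(
    start=1,
    expand=lambda n: [n + 1, n * 2],
    score=lambda n: -abs(target - n),
    is_goal=lambda n: n == target,
)
```

With a beam width of 3, the search visits a handful of states per step instead of the exponentially growing frontier, which is the trade the paragraph above describes: spend the budget where the heuristic says it matters.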

At Voaige, we study these computational principles. Our goal is to understand how the brain conducts search, what algorithmic strategies it employs, and where the real gains come from, and then to implement principled approximations of those mechanisms using the representational machinery of large language models.

This is what we call Test Time Cognition: search-capable inference that is architecturally grounded in neuroscience, not just scaled compute.

Higher accuracy. Lower cost.

Voaige is a drop-in OpenAI-compatible endpoint. No changes to your model, your agent, or your prompts. Instant accuracy gains, at lower cost per task.
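Adopting an OpenAI-compatible endpoint typically means changing only the client's base URL. The snippet below shows that pattern with the official `openai` Python client; the endpoint URL, API key, and model name are placeholders, not Voaige's real values.

```python
from openai import OpenAI

# Placeholder configuration: substitute the real endpoint URL and API key.
client = OpenAI(
    base_url="https://api.example-endpoint.com/v1",  # hypothetical URL
    api_key="YOUR_API_KEY",
)

# The rest of the integration is unchanged OpenAI-client code.
response = client.chat.completions.create(
    model="gpt-5.2",  # the underlying model you already use
    messages=[{"role": "user", "content": "Summarize the failing test."}],
)
print(response.choices[0].message.content)
```

Because only the base URL changes, existing agents and prompts run against the new endpoint without modification.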

Success Rate (%): GPT-5.2 minimal 32.6 · low 49.4 · medium 57.7 · Voaige GPT-5.2 64.8
Median Task Cost ($): GPT-5.2 minimal 0.09 · low 0.21 · medium 0.45 · Voaige GPT-5.2 0.26

Mini-SWE-agent · Terminal-Bench 2.0 · April 2026.

Recent Research
