Going beyond frontier reasoning with Test Time Cognition for LLMs

Voaige is an AI research lab applying computational principles from cognitive and systems neuroscience to LLMs.

A new layer in the agentic stack

Most improvements to AI systems target either the agent layer or the model layer. Test Time Cognition operates between them, modulating inference without touching weights or prompt context.

Agent Layer: prompts · RAG · tool use · planning
Test Time Cognition: adaptive compute · selective search · uncertainty-aware inference
Model Layer: weights · architecture · training · RL

No changes to model weights or agent configuration. Works across closed-source and open-source models alike.

Solving Hard Problems Requires Efficient Search. Neuroscience Teaches Us How.

Reinforcement learning is, at its core, memorized search: a way of compressing the results of exhaustive trial and error into policy parameters so that, at inference time, the right action can be retrieved without re-running the search. It works. But it has a fundamental ceiling: the quality of the compressed policy is bounded by the quality of the search that produced it.

Hard problems (genuine reasoning, novel planning, open-ended generalization) cannot be fully compressed this way. They require search at test time, not just at training time. This is why scaling training alone is not enough. But naively scaling test-time compute is not enough either. What is needed is carefully designed inference: systems that know when to search, where to search, and how to do so efficiently.
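Knowing when to search can be sketched in a few lines. The toy below spends a small sampling budget first and escalates only when those samples disagree, a simple proxy for uncertainty-aware inference. All names and the 3/9-sample budgets are illustrative assumptions, not Voaige's actual system.

```python
from collections import Counter

def answer(task, model, n):
    """Draw n samples and return the majority answer with its agreement rate."""
    samples = [model(task) for _ in range(n)]
    top, count = Counter(samples).most_common(1)[0]
    return top, count / n

def adaptive_answer(task, model, cheap_n=3, full_n=9, threshold=0.67):
    """Spend a small budget first; escalate only when the samples disagree.

    `model` is any callable mapping a task to an answer string.
    """
    top, agreement = answer(task, model, cheap_n)
    if agreement >= threshold:
        return top  # easy case: confident early, stop searching
    return answer(task, model, full_n)[0]  # hard case: widen the search
```

The point of the sketch is the control flow, not the estimator: compute scales with measured difficulty rather than being fixed per query.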

The brain solved this problem long ago. Biological cognition doesn't brute-force search over exponential spaces; it navigates them through hierarchical abstraction, selective attention, and early pruning, performing structured, resource-constrained search on 20 watts. What makes this possible is an underlying capacity for adaptation: assessing difficulty on the fly, allocating compute where uncertainty is high, scaling back where it is not.
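Early pruning has a classic algorithmic analogue: beam search, which keeps only the most promising partial solutions at each step instead of enumerating the whole space. The sketch below is a generic toy of that idea, under stated assumptions about `expand`, `score`, and `is_goal`; it is not a claim about Voaige's actual inference machinery.

```python
import heapq

def beam_search(start, expand, score, is_goal, beam_width=3, max_depth=10):
    """Resource-constrained search via early pruning.

    expand(state) -> candidate successor states
    score(state)  -> higher is better (guides the pruning)
    """
    beam = [start]
    for _ in range(max_depth):
        candidates = [nxt for state in beam for nxt in expand(state)]
        if not candidates:
            return None
        # early pruning: everything outside the top beam_width is dropped
        beam = heapq.nlargest(beam_width, candidates, key=score)
        for state in beam:
            if is_goal(state):
                return state
    return None

# Toy usage: reach 10 from 1 using the moves n+1 and n*2,
# scored by distance to the target.
target = 10
found = beam_search(
    start=1,
    expand=lambda n: [n + 1, n * 2],
    score=lambda n: -abs(target - n),
    is_goal=lambda n: n == target,
)
```

With a beam width of 3, the search visits a handful of states per step instead of the exponentially growing frontier, which is the trade the paragraph above describes: spend the budget where the heuristic says it matters.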

At Voaige, we study these computational principles. Our goal is to understand how the brain conducts search, what algorithmic strategies it employs, and where the real gains come from, and then to implement principled approximations of those mechanisms using the representational machinery of large language models.

This is what we call Test Time Cognition: search-capable inference that is architecturally grounded in neuroscience, not just scaled compute.

Higher accuracy. Lower cost.

Voaige is a drop-in OpenAI-compatible endpoint. No changes to your model, your agent, or your prompts. Instant accuracy gains, at lower cost per task.
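Adopting an OpenAI-compatible endpoint typically means changing only the client's base URL. The snippet below shows that pattern with the official `openai` Python client; the endpoint URL, API key, and model name are placeholders, not Voaige's real values.

```python
from openai import OpenAI

# Placeholder configuration: substitute the real endpoint URL and API key.
client = OpenAI(
    base_url="https://api.example-endpoint.com/v1",  # hypothetical URL
    api_key="YOUR_API_KEY",
)

# The rest of the integration is unchanged OpenAI-client code.
response = client.chat.completions.create(
    model="gpt-5.2",  # the underlying model you already use
    messages=[{"role": "user", "content": "Summarize the failing test."}],
)
print(response.choices[0].message.content)
```

Because only the base URL changes, existing agents and prompts run against the new endpoint without modification.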

Success Rate (%): GPT-5.2 minimal 32.6 · low 49.4 · medium 57.7 · Voaige GPT-5.2 64.8
Median Task Cost ($): GPT-5.2 minimal 0.09 · low 0.21 · medium 0.45 · Voaige GPT-5.2 0.26

Mini-SWE-agent · Terminal-Bench 2.0 · April 2026.

Recent Research
