Useful local AI starts with honest numbers.
These charts are the benchmark source for this page: Artificial Analysis screenshots dated May 30, 2026. No outside benchmark data is mixed in.
45.8
The score of the leading AI system in the index screenshot.
8.5-45.8
The range of index scores across the AI systems shown in the index chart.
3M-144M
The range of answer volume across the systems in the supporting chart.
Higher is better
The index and evaluation charts rank how well different AI systems perform on the test mix.
Answer volume is context
The answer-volume chart is not an intelligence score. It shows the amount of generated text behind the index.
Local has limits
Smaller local AI can be useful and private while still sitting below the largest cloud systems on hard tests.
Artificial Analysis Intelligence Index
A top-line score across the Artificial Analysis evaluation mix. Higher is better, and the spread shows why Out the Box stays honest about what smaller local AI can and cannot replace.

Answer text produced during the index
This chart shows how much answer text each system produced during the evaluation. It is useful context when comparing how much output larger and smaller systems generated to reach their scores.

Intelligence evaluations
The detailed panels break the index into individual tests, including planning, coding, long documents, knowledge, scientific reasoning, instruction following, and image reasoning.

Out the Box is not pretending a desk box is the largest cloud system.
The benchmark story is straightforward: bigger systems win the hardest tests, but local AI is still useful for private drafts, summaries, search, and focused work where ownership matters.
That is why the product promise is local by default, clear about outside help, and honest about the limits before you buy.