Out the Box
Benchmarks

Useful local AI starts with honest numbers.

All benchmark data on this page comes from Artificial Analysis chart screenshots dated May 30, 2026. No outside benchmark data is mixed in.

Top score shown

45.8

The score of the leading AI system in the index chart.

Visible range

8.5-45.8

The spread of index scores across the AI systems shown in the chart.

Answer span

3M-144M

The range of answer volume across the systems in the supporting chart.

Higher is better

The index and evaluation charts rank how well different AI systems perform on the test mix.

Answer volume is context

The answer-volume chart is not an intelligence score. It shows how much text each system generated while running the index.

Local has limits

Smaller local AI systems can be useful and private while still scoring below the largest cloud systems on hard tests.

Overall comparison

Artificial Analysis Intelligence Index

A top-line score across the Artificial Analysis evaluation mix. Higher is better, and the spread shows why Out the Box stays honest about what smaller local AI can and cannot replace.

Artificial Analysis Intelligence Index bar chart, dated May 30, 2026, showing AI system scores from 45.8 down to 8.5.

Answer volume

Answer text produced while running the index

This chart shows how much answer text each system produced during the evaluation. It is useful context when comparing how much work larger and smaller systems do to reach their scores.

Bar chart of output text volume used to run the Artificial Analysis Intelligence Index, dated May 30, 2026.

Per-test detail

Intelligence evaluations

The detailed panels break the index into individual tests, including planning, coding, long documents, knowledge, scientific reasoning, instruction following, and image reasoning.

Grid of Artificial Analysis intelligence evaluation charts, dated May 30, 2026, with one chart per benchmark.

The product takeaway

Out the Box is not pretending a desk box is the largest cloud system.

The benchmark story is straightforward: bigger systems win the hardest tests, but local AI is still useful for private drafts, summaries, search, and focused work where keeping control of your data matters.

That is why the product promise is local by default, clear about outside help, and honest about the limits before you buy.