Why Small Is the Design, Not the Limit
Thousand Token Wood is a tiny economy built for the Build Small Hackathon. Five woodland creatures, each driven by a Qwen2.5-3B agent, trade five goods, gossip, hoard resources, and occasionally panic. Users poke the wood and watch bubbles, crashes, and a widening wealth gap emerge on their own. The model is served with vLLM on Modal, and a Gradio app provides the user interface.
A living economy needs many agents making many decisions per run. A frontier model would be too slow and too costly to run a council of traders every turn. A small model makes real-time multi-agent simulation feasible: all five creatures decide in a single batched GPU call per turn.
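The batching idea can be sketched as follows. This is a minimal illustration, not the project's code: `Creature`, `make_prompt`, and `decide_all` are hypothetical names, and the model call is passed in as a function so the sketch runs without a GPU. With vLLM's offline API, which returns one output per prompt in input order, that function could be `lambda ps: [o.outputs[0].text for o in llm.generate(ps, params)]`, where `llm = LLM(model="Qwen/Qwen2.5-3B-Instruct")` and `params` is a `SamplingParams`. Whether the project used the instruct checkpoint, or the offline API rather than a vLLM server on Modal, is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Creature:
    name: str
    produces: str


def make_prompt(creature: Creature, turn: int) -> str:
    # Hypothetical, deliberately tiny prompt; the real one is richer.
    return (
        f"Turn {turn}. You are {creature.name}, who produces {creature.produces}. "
        "Reply with a JSON list of trade actions."
    )


def decide_all(
    creatures: List[Creature],
    turn: int,
    generate_batch: Callable[[List[str]], List[str]],
) -> Dict[str, str]:
    """Build one prompt per creature and send them all in a single batched call."""
    prompts = [make_prompt(c, turn) for c in creatures]
    completions = generate_batch(prompts)  # one call for the whole council
    if len(completions) != len(prompts):
        raise ValueError("batch returned the wrong number of completions")
    # Pair each raw completion with the creature whose prompt produced it.
    return {c.name: text for c, text in zip(creatures, completions)}
```

Injecting `generate_batch` keeps the turn loop testable with a stub and lets the same code run against a local vLLM engine or a remote endpoint.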
The First Economy Was Dead on Arrival
The initial version did nothing. Production matched consumption exactly, so every creature was self-sufficient and never needed to trade. The market cleared once and went silent. The fix was to engineer scarcity through three mechanics:
- Diet variety: a creature can eat only one unit of any single food per meal. Surviving means buying foods it does not grow itself.
- Spoilage: perishable food rots if hoarded. Surplus must be sold while it still has value.
- A winter fuel crisis: every creature must burn firewood each turn, demand rises over time, and only one creature makes firewood.
The last mechanic drives most of the drama. One supplier cannot meet rising demand. The woodcutter gets rich while everyone else competes for warmth.
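The three scarcity rules can be sketched for a single creature's larder. Everything here is an assumption for illustration: the `Larder` layout, the shelf life, and the fuel schedule constants are hypothetical, not values from the project. Each rule returns a shortfall, which is what would push a creature to the market.

```python
from dataclasses import dataclass, field
from typing import Dict, List

SHELF_LIFE = 3         # hypothetical: units older than this many turns rot
BASE_FUEL = 1          # hypothetical: firewood burned per turn at the start
FUEL_GROWTH_EVERY = 5  # hypothetical: demand rises by 1 every 5 turns


@dataclass
class Larder:
    # Each food maps to the ages (turns since harvest) of its units,
    # appended in harvest order, so the oldest unit comes first.
    food: Dict[str, List[int]] = field(default_factory=dict)
    firewood: int = 0


def eat_meal(larder: Larder, units_needed: int) -> int:
    """Diet variety: at most one unit of any single food per meal.

    Returns how many units the creature could not eat.
    """
    eaten = 0
    for units in larder.food.values():
        if eaten == units_needed:
            break
        if units:
            units.pop(0)  # eat the oldest unit of this food
            eaten += 1
    return units_needed - eaten


def spoil(larder: Larder) -> int:
    """Spoilage: age every unit and discard those past their shelf life."""
    rotted = 0
    for units in larder.food.values():
        aged = [age + 1 for age in units]
        kept = [age for age in aged if age <= SHELF_LIFE]
        rotted += len(aged) - len(kept)
        units[:] = kept  # update the list in place
    return rotted


def fuel_demand(turn: int) -> int:
    """Winter crisis: firewood burned per turn rises over time."""
    return BASE_FUEL + turn // FUEL_GROWTH_EVERY


def burn_firewood(larder: Larder, turn: int) -> int:
    """Burn this turn's firewood and return the unmet demand."""
    need = fuel_demand(turn)
    burned = min(need, larder.firewood)
    larder.firewood -= burned
    return need - burned
```

A creature holding three acorns and nothing else can eat only one of them per meal, so `eat_meal` reports a shortfall that only buying other foods can cover.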
Valid JSON, Weak Judgment
With scarcity in place, the honest small-model lesson surfaced. The 3B model produced valid JSON on 100% of calls, but its economic judgment was poor. A creature that produced acorns would post an order to buy acorns, the one good it already had in surplus.
The fix was not a larger model but a sharper prompt. Each agent's prompt now states which good it produces and must never buy, lists the exact goods it is short on (computed in code rather than left to the model), and includes one worked example. Decision quality jumped, and the creatures began trading according to their roles. The whole loop is wrapped in a tolerant JSON parse-and-repair layer, so a malformed response degrades to a no-op instead of crashing the simulation.
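A sketch of both halves of that fix, under stated assumptions: the good names beyond acorns, honey, and firewood, the action schema (`action`, `good`, `qty`, `price`), and the worked example are all hypothetical. `build_prompt` computes the shortage list in code and states the never-buy rule; `parse_actions` extracts the JSON list, repairs trailing commas, drops actions that fail a simple schema check, and returns an empty list (a no-op) when nothing usable remains.

```python
import json
import re
from typing import Dict, List

# Hypothetical names for the five goods (only acorns, honey, and firewood
# appear in the article).
GOODS = ["acorns", "berries", "honey", "mushrooms", "firewood"]

# Hypothetical worked example in an assumed action schema.
WORKED_EXAMPLE = (
    "Example: you make honey and are short on acorns and firewood -> "
    '[{"action": "sell", "good": "honey", "qty": 2, "price": 6}, '
    '{"action": "buy", "good": "firewood", "qty": 1, "price": 5}]'
)


def build_prompt(name: str, produces: str,
                 inventory: Dict[str, int], needs: Dict[str, int]) -> str:
    # Work out the shortfall in code instead of asking the model to infer it.
    short = [g for g in GOODS
             if g != produces and inventory.get(g, 0) < needs.get(g, 0)]
    return "\n".join([
        f"You are {name}. You produce {produces}. Never buy {produces}.",
        f"You are short on: {', '.join(short) or 'nothing'}.",
        f"Inventory: {json.dumps(inventory)}",
        "Reply with a JSON list of actions only.",
        WORKED_EXAMPLE,
    ])


def parse_actions(raw: str) -> List[dict]:
    """Tolerant parse: repair common slips, degrade to a no-op ([]) on failure."""
    # Keep only the outermost [...] span, dropping chatter or code fences.
    match = re.search(r"\[.*\]", raw, re.DOTALL)
    if not match:
        return []
    # Repair trailing commas such as {...},] which json.loads rejects.
    text = re.sub(r",\s*([\]}])", r"\1", match.group(0))
    try:
        actions = json.loads(text)
    except json.JSONDecodeError:
        return []
    if not isinstance(actions, list):
        return []
    # Keep only actions that match the assumed schema.
    return [a for a in actions
            if isinstance(a, dict)
            and a.get("action") in ("buy", "sell")
            and a.get("good") in GOODS]
```

Because every failure path returns `[]`, a creature that rambles or emits broken JSON simply sits out the turn.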
A second lesson came from wellbeing. The developer first modeled it as an accumulator, so any chronic shortfall ground every creature down to zero over a run. The result was a death spiral that was no fun to watch and punished imperfect optimization. The fix was to reframe wellbeing as a mean-reverting mood that recovers when a creature is fed and warm and never hits zero. Stakes belong in pebbles, prices, and status, not starvation.
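The difference between the two designs shows up in a short simulation. This is a hedged sketch with hypothetical rates and targets, since the article does not give the actual formula. The mood update moves a fraction `rate` of the way toward a target set by current needs, `m_next = m + rate * (target - m)`. Because the lowest target is 0.4, a chronically deprived creature settles at a low but nonzero mood instead of spiraling to zero.

```python
def accumulator_step(wellbeing: float, fed: bool, warm: bool,
                     loss: float = 0.1) -> float:
    """First design: each unmet need subtracts, and nothing pays it back."""
    penalty = loss * ((not fed) + (not warm))
    return max(0.0, wellbeing - penalty)


def mood_step(mood: float, fed: bool, warm: bool,
              rate: float = 0.3, floor: float = 0.1) -> float:
    """Mean-reverting mood: move part of the way toward a needs-based target."""
    # Hypothetical target: 1.0 when fed and warm, never below 0.4.
    target = 0.4 + 0.3 * fed + 0.3 * warm
    mood = mood + rate * (target - mood)
    return max(floor, min(1.0, mood))  # clamp as a safety net
```

Under a chronic food shortfall the accumulator hits zero within about a dozen turns and stays there, while the mood settles near 0.7 and climbs again as soon as the creature is fed.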
Then It Started Telling Stories
The feature the developer is most pleased with ties the project to market history. The player can draw a Wood Legend: a famous episode reskinned as woodland folklore. Tulip Mania becomes the Great Acorn Mania. The South Sea Bubble becomes the Hollow Log Trading Company. The 1929 bank runs become the Run on Oona's Hoard.
These are not flavor text. Each legend fires real shocks, and the agents react. In one run the developer drew the Run on Oona's Hoard: the rumor that the owl's vault was empty. Oona began liquidating her honey to raise pebbles, and the flood of supply crashed the honey price from 10 to 3 over the following turns. A reskinned bank run made an agent dump assets and moved a market price. None of it was scripted.
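One way to keep legends unscripted is to make them pure data that only change the agents' inputs. The sketch below is hypothetical: `Shock`, `Legend`, `World`, and `fire_legend` are illustrative names, and the rumor text paraphrases the article. Firing the Run on Oona's Hoard appends a rumor to what every agent sees, and a supply shock rescales production. Nothing tells Oona to sell, so any liquidation has to come from the agents' own decisions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Shock:
    rumor: str = ""  # text shown to every agent on later turns
    # Multiplier applied to the production rate of each listed good.
    supply_change: Dict[str, float] = field(default_factory=dict)


@dataclass
class Legend:
    name: str
    historical_source: str
    shocks: List[Shock]


RUN_ON_OONAS_HOARD = Legend(
    name="Run on Oona's Hoard",
    historical_source="1929 bank runs",
    shocks=[Shock(rumor="Word in the wood: the owl's vault is empty.")],
)


@dataclass
class World:
    production_rate: Dict[str, float]
    rumors: List[str] = field(default_factory=list)


def fire_legend(world: World, legend: Legend) -> None:
    """Apply a legend's shocks to the world; the agents decide how to react."""
    for shock in legend.shocks:
        if shock.rumor:
            world.rumors.append(shock.rumor)
        for good, factor in shock.supply_change.items():
            world.production_rate[good] = world.production_rate.get(good, 0.0) * factor
```

Keeping legends as data also makes it cheap to add new episodes: a new legend is a name, a source, and a list of shocks.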
For that to be visible, prices had to move. At first they were frozen, because the agents simply quoted back the reference price shown to them. The fix was to let the market reference price drift with residual supply and demand after each round: heavy unfilled buying pushes a price up, and a glut pushes it down. Prices now trend during scarcity and stay calm in balanced trade.
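A minimal sketch of that drift rule, assuming a multiplicative update driven by the normalized imbalance between unfilled buy volume B and unfilled sell volume S: `p_next = max(p_floor, p * (1 + k * (B - S) / (B + S)))`. The sensitivity `k = 0.15`, the floor, and the function name are hypothetical; the article describes only the direction of the effect.

```python
def drift_price(price: float, unfilled_buy: int, unfilled_sell: int,
                sensitivity: float = 0.15, floor: float = 1.0) -> float:
    """Nudge a good's reference price by its residual (unmatched) order flow."""
    total = unfilled_buy + unfilled_sell
    if total == 0:
        return price  # everything matched: balanced trade leaves the price alone
    imbalance = (unfilled_buy - unfilled_sell) / total  # in [-1, 1]
    return max(floor, price * (1 + sensitivity * imbalance))
```

Normalizing by total residual volume caps each round's move at plus or minus 15%, so a single panicked turn nudges the price rather than teleporting it, while a sustained shortage compounds into a visible trend.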
What Actually Happened
A representative fifteen-turn run, with a drought and a winter rumor injected partway through, produced the following results:
- Valid JSON actions: 100% (75 of 75 calls, five creatures over fifteen turns)
- Trades per turn: sustained 3 to 9, never silent
- Honey price: crashed from 10 to 3 during the bank run legend
- Firewood price: rose from 4 to 7 as winter scarcity bit
- Wealth gap (Gini): widened from 0.14 to 0.38
- Outcome: the woodcutter ended richest, and the hoarder went broke
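For reference, a Gini coefficient like the one reported above can be computed as the mean absolute difference over all pairs, divided by twice the mean. This is the standard formula, not necessarily the project's code. Which wealth measure it was applied to (pebbles alone, or the value of all holdings) is not stated, so the input here is an assumption.

```python
from typing import Sequence


def gini(wealth: Sequence[float]) -> float:
    """Gini coefficient: sum of |x_i - x_j| over all ordered pairs / (2 * n^2 * mean)."""
    n = len(wealth)
    if n == 0:
        return 0.0
    mean = sum(wealth) / n
    if mean == 0:
        return 0.0  # everyone has nothing: perfectly equal
    diff_sum = sum(abs(a - b) for a in wealth for b in wealth)
    return diff_sum / (2 * n * n * mean)
```

With all five creatures equally rich it returns 0.0; if one creature held everything, it would return 0.8, the maximum for five agents.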
The reasoning behind every move is available in the open traces dataset. Each row contains a creature's full prompt, raw response, parsed actions, and private thought.
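A trace row of that shape maps naturally onto JSON Lines, with one object per decision. The field names below, and the `turn` and `creature` columns, are hypothetical; the article lists only the prompt, raw response, parsed actions, and private thought.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class TraceRow:
    turn: int                   # hypothetical column
    creature: str               # hypothetical column
    prompt: str
    raw_response: str
    parsed_actions: List[dict]
    thought: str


def to_jsonl(rows: List[TraceRow]) -> str:
    """Serialize rows as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(asdict(r), ensure_ascii=False) + "\n" for r in rows)


def append_traces(path: str, rows: List[TraceRow]) -> None:
    """Append rows to a .jsonl file so a long run can be logged turn by turn."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(to_jsonl(rows))
```

Storing the raw response next to the parsed actions is what makes the parse-and-repair layer auditable: any row where they disagree shows exactly what the repair step changed or dropped.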
Takeaways for Building with Small Models
Most of the engineering went into closing the gap between a small model's reliable formatting and its unreliable reasoning, using structure and prompting rather than scale. Emergent systems need designed scarcity; abundance is boring. And the most compelling small-model demos do not need invented drama: three centuries of market history had it ready, and a council of 3B agents was enough to play it out.
Small models can lead to big adventures.