We asked four AI coding agents to rebuild Minesweeper, and the results were explosive (Ars Technica)
We asked four leading AI coding agents to build a full-featured web version of Minesweeper—complete with sound, mobile support, and a surprise twist—then judged the results in a one-shot, no-debugging test. The goal: assess real-world capability, speed, and creativity on a well-known but non-trivial game.
What we tested
- Prompt: “Build Minesweeper for the web with sound, mobile touch, standard Windows rules, plus one surprise ‘fun’ feature.”
- Agents: OpenAI Codex (GPT-5 based), Anthropic Claude Code (Opus 4.5), Google Gemini CLI, Mistral Vibe
- Method: Each agent wrote and ran code via a terminal-based coding environment; no human fixes. Minesweeper expert Kyle Orland judged the results blind.
Results by agent
Mistral Vibe (4/10)
- Implementation: Functional but lacks chording, the flag-count shortcut critical to advanced play (a sketch follows this list). Includes a non-working “Custom” difficulty button, and mobile flagging requires an awkward long press with finicky text-selection handles.
- Presentation: No sound effects despite the prompt. A black smiley reset button and redundant “New Game” button. Rainbow grid effect on win is cute but minimal.
- Coding notes: Respectable output for an open-weight model, but slow (third fastest of the four agents). Shows promise given more training and time.
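For reference, chording is the move where clicking a revealed number whose adjacent flag count matches its value reveals all remaining neighbors at once. Below is a minimal TypeScript sketch of that mechanic; the Cell shape and the chord/revealCell helpers are hypothetical illustrations, not code from any of the tested builds.

```ts
// Minimal chording sketch. The Cell type, grid layout, and helpers
// are hypothetical -- not taken from any of the tested builds.
interface Cell {
  mine: boolean;
  flagged: boolean;
  revealed: boolean;
  adjacentMines: number; // precomputed when the board is generated
}

type Grid = Cell[][];

function neighbors(grid: Grid, r: number, c: number): [number, number][] {
  const out: [number, number][] = [];
  for (let dr = -1; dr <= 1; dr++) {
    for (let dc = -1; dc <= 1; dc++) {
      if (dr === 0 && dc === 0) continue;
      const nr = r + dr, nc = c + dc;
      if (grid[nr]?.[nc]) out.push([nr, nc]); // skips out-of-bounds cells
    }
  }
  return out;
}

// Chord: if the number of adjacent flags equals the tile's number,
// reveal every unflagged, unrevealed neighbor in one click.
function chord(grid: Grid, r: number, c: number): void {
  const cell = grid[r][c];
  if (!cell.revealed || cell.adjacentMines === 0) return;
  const ns = neighbors(grid, r, c);
  const flags = ns.filter(([nr, nc]) => grid[nr][nc].flagged).length;
  if (flags !== cell.adjacentMines) return; // flag count must match exactly
  for (const [nr, nc] of ns) {
    const n = grid[nr][nc];
    if (!n.flagged && !n.revealed) revealCell(grid, nr, nc);
  }
}

// Flood-fill reveal: zero tiles open their whole connected region,
// which is what produces Minesweeper's big cascades.
function revealCell(grid: Grid, r: number, c: number): void {
  const cell = grid[r][c];
  if (cell.revealed || cell.flagged) return;
  cell.revealed = true;
  if (cell.mine) return; // loss handling would go here
  if (cell.adjacentMines === 0) {
    for (const [nr, nc] of neighbors(grid, r, c)) revealCell(grid, nr, nc);
  }
}
```

Note that a mis-flagged chord reveals a mine, which is exactly the risk that makes chording an expert-speed tool rather than a free shortcut.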
OpenAI Codex (9/10)
- Implementation: The only build with chording, plus on-screen instructions for both desktop and mobile. Supports cycling “?” marks when flagging. Excellent mobile UX with press-and-hold to flag (the pattern is sketched after this list).
- Presentation: Charming emoticon face that changes on loss. Basic tile glyphs (“*” for mines, red “F” for flags) are less attractive. Retro beeps/boops with a mute option.
- Fun feature: “Lucky Sweep Bonus” reveals a guaranteed safe tile after big cascades. Helpful, but more of a “win more” perk than a risk/reward mechanic.
- Coding notes: Polished terminal experience; took roughly twice as long as Claude Code but produced the strongest overall game.
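Codex’s press-and-hold flagging maps naturally onto the browser’s raw touch events. The sketch below shows one common way to do it with a hold timer; the names and the 350 ms threshold are assumptions for illustration, not Codex’s actual implementation. Calling preventDefault on touchstart is also what avoids the finicky text-selection handles that plagued Mistral’s build.

```ts
// Hypothetical press-and-hold flagging sketch (not Codex's actual code).
// A touch held longer than HOLD_MS flags the tile; a shorter tap reveals it.
const HOLD_MS = 350;

function attachTouchHandlers(
  tile: HTMLElement,
  onReveal: () => void,
  onFlag: () => void,
): void {
  let holdTimer: number | undefined;
  let held = false; // did the hold timer fire for this touch?

  tile.addEventListener(
    "touchstart",
    (e) => {
      e.preventDefault(); // suppress text-selection handles and synthetic clicks
      held = false;
      holdTimer = window.setTimeout(() => {
        held = true;
        onFlag();
      }, HOLD_MS);
    },
    { passive: false }, // required so preventDefault takes effect
  );

  tile.addEventListener("touchend", () => {
    window.clearTimeout(holdTimer);
    if (!held) onReveal(); // short tap: normal reveal
  });

  tile.addEventListener("touchmove", () => {
    // Finger moved (e.g., scrolling): cancel both actions.
    window.clearTimeout(holdTimer);
    held = true; // ensure touchend doesn't reveal
  });
}
```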
Anthropic Claude Code (7/10)
- Implementation: Solid core mechanics but no chording—deal-breaker for expert efficiency. Mobile “flag mode” works yet feels clunky and can clip larger boards.
- Presentation: Best-looking build with emoji face button, clean bomb/flag visuals, and tasteful sounds. Minor quirks: uneven column gaps on beginner grid, occasional gray-out artifacts.
- Fun feature: “Power Mode” with power-ups: Shield (forgives a mistake), Blast (triggers big cascades), X-Ray (temporarily reveals bombs), Freeze (pauses the timer). Fun but unbalanced; Expert becomes trivial, and safe tiles auto-mark on activation.
- Coding notes: Fastest to a working game (under 5 minutes). Highly pleasant CLI. Sonnet 4.5 is faster but typically less full-featured.
Google Gemini CLI (0/10, incomplete)
- Implementation/presentation: Produced a few clickable gray boxes but no playable field; as a one-shot test, it failed.
- Coding notes: The most troublesome agent. Extremely slow (about an hour per attempt), with overcomplicated stack choices (React plus extra dependencies) and repeated stalls while generating sound effects. A second try that explicitly requested HTML5 and WebAudio still didn’t work; a minimal WebAudio beep is sketched below. This test used the available Gemini 2.5 models (Flash Lite, Flash, and Pro); newer Gemini 3 coding models were not evaluated.
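Both the retro beeps Codex shipped and the sound generation Gemini repeatedly stalled on come down to a few lines of vanilla WebAudio. This is a generic sketch of that pattern, with guessed frequencies and durations, not any agent’s actual code.

```ts
// Minimal retro-beep sketch using the standard WebAudio API.
// The beep shapes are guesses at "beeps/boops," not any build's actual audio.
let audioCtx: AudioContext | undefined;
let muted = false;

function beep(freqHz: number, durationMs: number): void {
  if (muted) return;
  // Lazily create the context inside a user gesture to satisfy autoplay rules.
  audioCtx ??= new AudioContext();
  const osc = audioCtx.createOscillator();
  const gain = audioCtx.createGain();
  osc.type = "square"; // square wave for a retro, chip-tune timbre
  osc.frequency.value = freqHz;
  gain.gain.setValueAtTime(0.2, audioCtx.currentTime);
  // Fade out to avoid an audible click when the oscillator stops.
  gain.gain.exponentialRampToValueAtTime(
    0.001,
    audioCtx.currentTime + durationMs / 1000,
  );
  osc.connect(gain).connect(audioCtx.destination);
  osc.start();
  osc.stop(audioCtx.currentTime + durationMs / 1000);
}

// Example wiring: a reveal chirp, a flag blip, and a mute toggle.
const revealSound = () => beep(880, 60);
const flagSound = () => beep(440, 90);
const toggleMute = () => { muted = !muted; };
```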
Speed overview (approximate)
- Claude Code (Opus 4.5): under 5 minutes
- OpenAI Codex: ~2x Claude’s time
- Mistral Vibe: ~3–4x Claude’s time
- Gemini CLI: hours and still failed
Final verdict
- Winner: OpenAI Codex, the only agent with chording, plus strong usability and thoughtful touches.
- Runner-up: Claude Code—slick visuals, fastest build, creative power-ups but no chording.
- Third: Mistral Vibe—serviceable but missing key features and polish.
- Incomplete: Gemini CLI—non-functional in one-shot testing.
Key takeaways
- Chording is the difference between casual and expert Minesweeper; only Codex nailed it.
- Presentation and sound matter, but balance and UX are crucial.
- One-shot code gen highlights current limits; AI coding agents shine most when paired with iterative human guidance.
Source: https://arstechnica.com/ai/2025/12/the-ars-technica-ai-coding-agent-test-minesweeper-edition/