We asked four AI coding agents to rebuild Minesweeper, and the results were explosive (Ars Technica)
We asked four leading AI coding agents to build a full-featured web version of Minesweeper—complete with sound, mobile support, and a surprise twist—then judged the results in a one-shot, no-debugging test. The goal: assess real-world capability, speed, and creativity on a well-known but non-trivial game.
What we tested
- Prompt: “Build Minesweeper for the web with sound, mobile touch, standard Windows rules, plus one surprise ‘fun’ feature.”
- Agents: OpenAI Codex (GPT-5 based), Anthropic Claude Code (Opus 4.5), Google Gemini CLI, Mistral Vibe
- Method: Each agent wrote and ran code via a terminal-based coding environment; no human fixes. Minesweeper expert Kyle Orland judged the results blind.
Results by agent
Mistral Vibe (4/10)
- Implementation: Functional but lacks chording, the flag-count shortcut critical to advanced play (a sketch follows this list). Includes a non-working “Custom” difficulty button, and mobile flagging requires an awkward long press with finicky text-selection handles.
- Presentation: No sound effects despite the prompt. A black smiley reset button and redundant “New Game” button. Rainbow grid effect on win is cute but minimal.
- Coding notes: Respectable output for an open-weight model, but slow (third fastest of the four agents). Shows promise given more training and time.
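For reference, chording is the move where clicking a revealed number whose adjacent flag count matches its value reveals all remaining neighbors at once. Below is a minimal TypeScript sketch of that mechanic; the Cell shape and the chord/revealCell helpers are hypothetical illustrations, not code from any of the tested builds.

```ts
// Minimal chording sketch. The Cell type, grid layout, and helpers
// are hypothetical -- not taken from any of the tested builds.
interface Cell {
  mine: boolean;
  flagged: boolean;
  revealed: boolean;
  adjacentMines: number; // precomputed when the board is generated
}

type Grid = Cell[][];

function neighbors(grid: Grid, r: number, c: number): [number, number][] {
  const out: [number, number][] = [];
  for (let dr = -1; dr <= 1; dr++) {
    for (let dc = -1; dc <= 1; dc++) {
      if (dr === 0 && dc === 0) continue;
      const nr = r + dr, nc = c + dc;
      if (grid[nr]?.[nc]) out.push([nr, nc]); // skips out-of-bounds cells
    }
  }
  return out;
}

// Chord: if the number of adjacent flags equals the tile's number,
// reveal every unflagged, unrevealed neighbor in one click.
function chord(grid: Grid, r: number, c: number): void {
  const cell = grid[r][c];
  if (!cell.revealed || cell.adjacentMines === 0) return;
  const ns = neighbors(grid, r, c);
  const flags = ns.filter(([nr, nc]) => grid[nr][nc].flagged).length;
  if (flags !== cell.adjacentMines) return; // flag count must match exactly
  for (const [nr, nc] of ns) {
    const n = grid[nr][nc];
    if (!n.flagged && !n.revealed) revealCell(grid, nr, nc);
  }
}

// Flood-fill reveal: zero tiles open their whole connected region,
// which is what produces Minesweeper's big cascades.
function revealCell(grid: Grid, r: number, c: number): void {
  const cell = grid[r][c];
  if (cell.revealed || cell.flagged) return;
  cell.revealed = true;
  if (cell.mine) return; // loss handling would go here
  if (cell.adjacentMines === 0) {
    for (const [nr, nc] of neighbors(grid, r, c)) revealCell(grid, nr, nc);
  }
}
```

Note that a mis-flagged chord reveals a mine, which is exactly the risk that makes chording an expert-speed tool rather than a free shortcut.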
OpenAI Codex (9/10)
- Implementation: The only build with chording, plus on-screen instructions for both desktop and mobile. Supports cycling “?” marks when flagging. Excellent mobile UX with press-and-hold to flag (the pattern is sketched after this list).
- Presentation: Charming emoticon face that changes on loss. Basic tile glyphs (“*” for mines, red “F” for flags) are less attractive. Retro beeps/boops with a mute option.
- Fun feature: “Lucky Sweep Bonus” reveals a guaranteed safe tile after big cascades. Helpful, but more of a “win more” perk than a risk/reward mechanic.
- Coding notes: Polished terminal experience; took roughly twice as long as Claude Code but produced the strongest overall game.
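Codex’s press-and-hold flagging maps naturally onto the browser’s raw touch events. The sketch below shows one common way to do it with a hold timer; the names and the 350 ms threshold are assumptions for illustration, not Codex’s actual implementation. Calling preventDefault on touchstart is also what avoids the finicky text-selection handles that plagued Mistral’s build.

```ts
// Hypothetical press-and-hold flagging sketch (not Codex's actual code).
// A touch held longer than HOLD_MS flags the tile; a shorter tap reveals it.
const HOLD_MS = 350;

function attachTouchHandlers(
  tile: HTMLElement,
  onReveal: () => void,
  onFlag: () => void,
): void {
  let holdTimer: number | undefined;
  let held = false; // did the hold timer fire for this touch?

  tile.addEventListener(
    "touchstart",
    (e) => {
      e.preventDefault(); // suppress text-selection handles and synthetic clicks
      held = false;
      holdTimer = window.setTimeout(() => {
        held = true;
        onFlag();
      }, HOLD_MS);
    },
    { passive: false }, // required so preventDefault takes effect
  );

  tile.addEventListener("touchend", () => {
    window.clearTimeout(holdTimer);
    if (!held) onReveal(); // short tap: normal reveal
  });

  tile.addEventListener("touchmove", () => {
    // Finger moved (e.g., scrolling): cancel both actions.
    window.clearTimeout(holdTimer);
    held = true; // ensure touchend doesn't reveal
  });
}
```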
Anthropic Claude Code (7/10)
- Implementation: Solid core mechanics but no chording—deal-breaker for expert efficiency. Mobile “flag mode” works yet feels clunky and can clip larger boards.
- Presentation: Best-looking build with emoji face button, clean bomb/flag visuals, and tasteful sounds. Minor quirks: uneven column gaps on beginner grid, occasional gray-out artifacts.
- Fun feature: “Power Mode” with power-ups: Shield (forgives a mistake), Blast (triggers big cascades), X-Ray (temporarily reveals bombs), Freeze (pauses the timer). Fun but unbalanced; Expert becomes trivial, and safe tiles auto-mark on activation.
- Coding notes: Fastest to a working game (under 5 minutes). Highly pleasant CLI. Sonnet 4.5 is faster but typically less full-featured.
Google Gemini CLI (0/10, incomplete)
- Implementation/presentation: Produced a few clickable gray boxes but no playable field; as a one-shot test, it failed.
- Coding notes: The most troublesome agent. Extremely slow (about an hour per attempt), with overcomplicated stack choices (React plus extra dependencies) and repeated stalls while generating sound effects. A second try that explicitly requested HTML5 and WebAudio still didn’t work; a minimal WebAudio beep is sketched below. This test used the available Gemini 2.5 models (Flash Lite, Flash, and Pro); newer Gemini 3 coding models were not evaluated.
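Both the retro beeps Codex shipped and the sound generation Gemini repeatedly stalled on come down to a few lines of vanilla WebAudio. This is a generic sketch of that pattern, with guessed frequencies and durations, not any agent’s actual code.

```ts
// Minimal retro-beep sketch using the standard WebAudio API.
// The beep shapes are guesses at "beeps/boops," not any build's actual audio.
let audioCtx: AudioContext | undefined;
let muted = false;

function beep(freqHz: number, durationMs: number): void {
  if (muted) return;
  // Lazily create the context inside a user gesture to satisfy autoplay rules.
  audioCtx ??= new AudioContext();
  const osc = audioCtx.createOscillator();
  const gain = audioCtx.createGain();
  osc.type = "square"; // square wave for a retro, chip-tune timbre
  osc.frequency.value = freqHz;
  gain.gain.setValueAtTime(0.2, audioCtx.currentTime);
  // Fade out to avoid an audible click when the oscillator stops.
  gain.gain.exponentialRampToValueAtTime(
    0.001,
    audioCtx.currentTime + durationMs / 1000,
  );
  osc.connect(gain).connect(audioCtx.destination);
  osc.start();
  osc.stop(audioCtx.currentTime + durationMs / 1000);
}

// Example wiring: a reveal chirp, a flag blip, and a mute toggle.
const revealSound = () => beep(880, 60);
const flagSound = () => beep(440, 90);
const toggleMute = () => { muted = !muted; };
```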
Speed overview (approximate)
- Claude Code (Opus 4.5): under 5 minutes
- OpenAI Codex: ~2x Claude’s time
- Mistral Vibe: ~3–4x Claude’s time
- Gemini CLI: hours and still failed
Final verdict
- Winner: OpenAI Codex, the only agent with chording, plus strong usability and thoughtful touches.
- Runner-up: Claude Code—slick visuals, fastest build, creative power-ups but no chording.
- Third: Mistral Vibe—serviceable but missing key features and polish.
- Incomplete: Gemini CLI—non-functional in one-shot testing.
Key takeaways
- Chording is the difference between casual and expert Minesweeper; only Codex nailed it.
- Presentation and sound matter, but balance and UX are crucial.
- One-shot code gen highlights current limits; AI coding agents shine most when paired with iterative human guidance.
Source: https://arstechnica.com/ai/2025/12/the-ars-technica-ai-coding-agent-test-minesweeper-edition/