Yesterday, a friend sent me a little "car wash" meme: the car wash is 50 meters from your home. If you want to wash your car, should you drive there or walk?
Reportedly, only Gemini answered this simple little question correctly.
I found it a bit unbelievable, so I tried it myself. The result: Claude-4.6-Opus and GPT-5.2 both "failed," while Gemini-3-Flash got it right and gave a very reasonable explanation.


I still maintain that Claude-4.6-Opus is currently the best programming model (especially when paired with Claude Code); I held the same view back in the 4.5 era. But the fact is that since Claude-4, it has rarely been my first choice for "Vibe Coding." One reason is cost; another is that AI Studio's Build is simply too convenient.
A more important reason, though, is the same one this example shows: compared to Gemini, it is unreliable.
No task is "pure programming" that requires no judgment about the world. Since Claude-3.7 added Thinking mode, the "Vibe Coding" experience has indeed "taken off," but so has a great deal of overworking of details and even outright "hallucinations." This has not improved significantly even with the release of 4.6.
Overthinking easily leads to hallucinations, and even Gemini-3 is not immune.
By comparison, though, Gemini-3 puts more emphasis on keeping its capabilities balanced.
Perhaps if you need a general to lead the charge on a difficult task, Claude is a good choice; if you need a partner for stable, long-term development, Gemini might be more trustworthy, even though it has a bit of a temper.
Perhaps an AI partner is exactly what I need.