Evalatro: an open benchmark where LLMs play real Balatro

Evalatro is an open benchmark in which LLMs play the actual game Balatro. Models receive the game state as text, make decisions independently, and compete to reach Ante 12. Results so far show limited progress: mimo-v2.5-pro reached Ante 5, and deepseek-v4-pro failed to beat Ante 8.