JAMER introduces JamSet and JamBench, the first project-level game code dataset and benchmark built on a professional game engine. Constructed from 8,133 verified Game Jam projects, they enable deterministic evaluation and reveal a capability cliff in AI models: as project scale increases, runtime pass rates drop from 80.4% to 5.7%.
JAMER: Project-Level Code Framework Dataset and Benchmark