Benchmarks that rely on fixed-shape checks miss real bugs in LLM-generated GPU kernels. On a controlled corpus of 24 kernels, including 9 buggy variants with transcription errors, an op-schema-aware oracle detects every failure and passes every correct control, and the results are identical across five GPU architectures.
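
The contrast between the two kinds of check can be illustrated with a minimal Python sketch. It is not the harness used for the corpus, and the source does not specify how its oracle is built. Everything here is an assumption made for illustration: the names `buggy_sum`, `fixed_shape_check`, `schema_aware_check`, and `SCHEMA` are hypothetical, the "kernel" is a plain-Python reduction that simulates a dropped-tail bug, and the oracle is reduced to drawing input lengths from a declared range.

```python
import random

BLOCK = 4  # hypothetical tile width used by the simulated kernel

# Hypothetical op schema: the range of input lengths the op must support.
SCHEMA = {"min_len": 1, "max_len": 33}


def reference_sum(xs):
    """Ground-truth reduction used as the expected result."""
    return sum(xs)


def buggy_sum(xs):
    """Simulated kernel that drops the tail when len(xs) is not a multiple of BLOCK."""
    full = len(xs) - len(xs) % BLOCK
    total = 0
    for start in range(0, full, BLOCK):
        total += sum(xs[start:start + BLOCK])
    return total


def fixed_shape_check(kernel, n=16):
    """Check one hard-coded length, as a fixed-shape check would (assumed form)."""
    xs = list(range(1, n + 1))
    return kernel(xs) == reference_sum(xs)


def schema_aware_check(kernel, schema, trials=50, seed=0):
    """Check lengths drawn from the schema's range, always including both bounds."""
    rng = random.Random(seed)
    lengths = {schema["min_len"], schema["max_len"]}
    lengths.update(
        rng.randint(schema["min_len"], schema["max_len"]) for _ in range(trials)
    )
    for n in sorted(lengths):
        # Strictly positive values, so a dropped tail always changes the sum.
        xs = [rng.randint(1, 9) for _ in range(n)]
        if kernel(xs) != reference_sum(xs):
            return False
    return True


if __name__ == "__main__":
    print("fixed-shape check, buggy kernel passes: ", fixed_shape_check(buggy_sum))
    print("schema-aware check, buggy kernel passes:", schema_aware_check(buggy_sum, SCHEMA))
    print("schema-aware check, reference passes:   ", schema_aware_check(reference_sum, SCHEMA))
```

In this sketch the fixed length 16 is a multiple of the tile width, so the dropped-tail bug never fires and the buggy kernel passes. The schema-driven check includes the lower bound of the declared range, a length of 1, where the simulated kernel returns 0 and is rejected, while the correct reference still passes.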