A self-hosted Qwen3.6-27B model, run with an identical prompt on identical hardware, generated four different HTML/JavaScript solar system simulations when driven by four different agents. The agent scaffolding significantly influenced the output: opencode produced clean, stable code with accurate physics; pi showed robustness and coordinate consistency; hermes produced visually appealing but physically flawed results; qwen code generated minimal, crude code. The results highlight how agent design shapes code quality, correctness, and stability even when the model and prompt are shared.
Same model, same prompt, 4 different agents produce varied code quality