A local LLM run on 8-16 MI50 GPUs achieves up to 19 tokens per second (TPS) peak throughput for the Minimax M3 model. Performance is limited by long reasoning outputs and code quality, with speculative decoding showing 50% acceptance rate and high latency, indicating usability challenges for agentic coding tasks.