media r/LocalLLaMA · 8d ago · open_models

Minimax M3 (4-bit MLX) Initial Benchmark on Mac Studio M3 with 512GB


Minimax M3 (4-bit MLX) was benchmarked on a Mac Studio M3 with 512GB of unified memory. The results report token throughput and latency across prompt sizes: throughput peaked at 269.1 tok/s for 8192-token prompts and fell to 172.8 tok/s for a 65k-token prompt, with peak memory use of 228GB.
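To put the quoted figures in wall-clock terms, here is a minimal sketch of the arithmetic. It rests on assumptions the summary does not confirm: that the tok/s figures are prompt-processing rates, that "65k" means 65,536 tokens, and that the 512GB figure is total memory. The function and variable names are illustrative only.

```python
# Rough arithmetic on the figures quoted in the summary.
# ASSUMPTIONS (not stated in the source): the tok/s values are
# prompt-processing throughput, "65k" means 65,536 tokens, and
# 512GB is the machine's total memory.

def seconds_to_process(prompt_tokens: int, tokens_per_second: float) -> float:
    """Time to ingest a prompt at a constant throughput."""
    return prompt_tokens / tokens_per_second


def memory_fraction(peak_gb: float, total_gb: float) -> float:
    """Share of total memory used at peak."""
    return peak_gb / total_gb


# Prompt size (tokens) -> reported throughput (tok/s)
REPORTED = {8192: 269.1, 65536: 172.8}

if __name__ == "__main__":
    for tokens, rate in REPORTED.items():
        secs = seconds_to_process(tokens, rate)
        print(f"{tokens:>6} tokens at {rate} tok/s: about {secs:.0f} s")
    share = memory_fraction(228, 512)
    print(f"Peak memory share: {share:.1%}")
```

Under those assumptions, the long prompt takes on the order of minutes rather than seconds, and peak memory stays under half of the machine's capacity.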

Importance 1/3 r/LocalLLaMA Benchmark results Code generation Inference efficiency
