A new fork of ik_llama.cpp adds a --numa mirror mode that duplicates the model weights and KV cache on each CPU socket, so every socket reads from its own local memory. This cuts remote memory access penalties and improves inference throughput by up to 1.6x on the models tested, at the cost of twice the RAM.
I forked ik_llama.cpp and added --numa mirror mode
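
To make the RAM cost concrete, here is a rough back-of-the-envelope sketch in Python. It assumes that mirroring keeps one full copy of the weights and KV cache per NUMA node, which is consistent with the "twice the RAM" figure on a two-socket machine. The function name and the sizes are hypothetical and are not taken from the fork.

```python
def mirrored_footprint_gib(weights_gib: float, kv_cache_gib: float, numa_nodes: int) -> float:
    """Estimate RAM needed when every NUMA node holds its own full copy.

    Assumption: one complete copy of weights + KV cache per node.
    """
    if numa_nodes < 1:
        raise ValueError("numa_nodes must be at least 1")
    return (weights_gib + kv_cache_gib) * numa_nodes


# Hypothetical sizes, for illustration only.
single = mirrored_footprint_gib(40.0, 8.0, numa_nodes=1)
dual = mirrored_footprint_gib(40.0, 8.0, numa_nodes=2)
print(f"one copy: {single:.0f} GiB, mirrored on 2 sockets: {dual:.0f} GiB")
```

If the mirrored total does not fit in physical RAM, the trade-off presumably stops paying off, so it is worth running this kind of estimate before trying the mode.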