A test on an RTX 5060 Ti showed that shrinking a local AI voice assistant's model from 9B to 0.8B parameters sharply degrades its capability. The 9B model handles tool orchestration well, and failures mount as the model gets smaller: the 4B model skips tool calls and guesses at facts, the 2B model suffers semantic drift, and the 0.8B model fails to operate agent functions, calling the wrong APIs or getting stuck in infinite loops.
Watching a Local AI Voice Assistant Get Dumber