Has anyone else found vLLM outputs worse than llama.cpp?
A user reports noticing less reliable outputs from vLLM compared to llama.cpp, including formatting errors, context forgetting, and lower code quality. They ask whether such differences stem from quantization, chat templates, parser issues, or configuration errors, and seek confirmation if others have observed similar quality discrepancies between inference backends.