media r/LocalLLaMA · 12d ago

Help Running Local Hermes Agent with llama-cpp

A user reports issues running a local Hermes AI agent on a high-end rig using self-compiled llama-cpp. The setup reprocesses the KV cache every 5 messages and reasons slowly, and the agent repeatedly pauses to report progress instead of continuing autonomously. The user asks whether their llama-cpp parameters are incorrect and which adjustments would improve agent performance and sustain reasoning without interruptions.

media r/LocalLLaMA · 12d ago

Fixing Long-Context Decode Cliff on Radeon R9700 with vLLM 0.22.1

A long-context decode performance cliff on AMD Radeon AI PRO R9700 (RDNA4) was resolved by enabling AITER Unified Attention in vLLM 0.22.1. The fix involves relaxing a CDNA gate to include RDNA4, disabling other attention backends, and using bf16 KV cache, resulting in significant speedups across all context lengths. FP8 KV is ineffective on this hardware, and the model's native 262K context is fully achievable with bf16, offering ~2.9× concurrency without needing FP8.

media Don't Worry About the Vase · 12d ago

Claude Fable 5 and Mythos 5: Capabilities

Anthropic launched Claude Fable 5, a Mythos-class model for which it claims state-of-the-art performance across software engineering, scientific research, and knowledge work. It was quickly taken down by the U.S. government after a jailbreak was reported, though Anthropic asserts it is now available again. Fable 5 shows exceptional capabilities and a more nuanced, thoughtful reasoning style than prior models.

media r/LocalLLaMA · 12d ago

Adding a Second GPU to X670E Motherboard for Local LLMs

A user wants to add a second 16GB VRAM GPU (5060 Ti or 5070 Ti) to their MSI X670E Tomahawk WiFi motherboard for running large local LLMs like Qwen 3.6 27B. The current setup lacks space for a second GPU because the primary 5070 Ti occupies the second PCIe slot, leaving only the third slot partially available. The user seeks advice on feasible options, such as using the fourth PCIe slot or a riser, while considering cooling, stability, and physical fit, especially with a horizontal GPU mount like the Lian Li VG4v4.