How I'm using local models for real-world coding
The author shares a practical setup for using local large language models on modest hardware, specifically a laptop with 32GB of RAM and an NVIDIA RTX 4070 with 8GB VRAM. The core strategy involves running the Qwen3.6-35B-A3B model locally as a 'small coding agent' while offloading complex planning to a cloud-based GLM 5.2 instance.
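The split between a local coding agent and a cloud planner can be illustrated with a small routing sketch. This is only one way such a split might look, not the author's actual configuration: the endpoint URLs, the keyword list, the length threshold, and the `pick_endpoint` helper are all hypothetical, and only the two model names come from the summary above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    name: str
    base_url: str  # hypothetical URL, not taken from the source
    model: str


# Assumed endpoints: a local server for the small coding agent and a
# placeholder cloud address for the planner.
LOCAL = Endpoint("local", "http://localhost:8080/v1", "Qwen3.6-35B-A3B")
CLOUD = Endpoint("cloud", "https://cloud.example.invalid/v1", "GLM 5.2")

# Hypothetical whole-word hints that a task is about planning rather
# than a small, contained edit.
PLANNING_WORDS = {"plan", "design", "architecture", "roadmap"}


def pick_endpoint(task: str, max_local_chars: int = 2000) -> Endpoint:
    """Send long or planning-style tasks to the cloud, the rest locally."""
    if len(task) > max_local_chars:
        return CLOUD
    words = set(task.lower().split())
    if words & PLANNING_WORDS:
        return CLOUD
    return LOCAL


if __name__ == "__main__":
    for task in ("Rename variable foo to bar in utils.py",
                 "Plan the migration to the new API"):
        chosen = pick_endpoint(task)
        print(f"{chosen.name}: {chosen.model} <- {task}")
```

A keyword heuristic like this is deliberately crude. In practice the choice might just as well be made by hand for each task, which keeps the local model on the short, well-scoped edits that fit within modest hardware.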