User seeks advice on maximizing context window for local LLM coding

A Reddit user asks for recommendations on making better use of the context window and improving computational efficiency when running a large language model locally. The poster is using a Qwen 3.6 27B model quantized to Q4 on an NVIDIA RTX 3090 with 24GB of VRAM.

The user reports a total context window of approximately 34,000 tokens.
A custom memory system that uses HDBSCAN clustering and a diary routine consumes about 24,000 tokens at startup.
Extending the context window into system RAM makes inference significantly slower.
The user's primary goal is local coding assistance, but the limited hardware constrains what is possible.
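Taking the post's approximate figures at face value, the startup memory load leaves only a small share of the window for the coding session itself. A minimal sketch of that arithmetic follows; the constant names and the `remaining_budget` helper are illustrative, not part of the user's setup.

```python
# Approximate figures reported in the post.
TOTAL_CONTEXT_TOKENS = 34_000
MEMORY_STARTUP_TOKENS = 24_000  # HDBSCAN-based memory plus diary routine

def remaining_budget(total: int, reserved: int) -> int:
    """Tokens left for the coding conversation after startup content is loaded."""
    if reserved > total:
        raise ValueError("startup content exceeds the context window")
    return total - reserved

left = remaining_budget(TOTAL_CONTEXT_TOKENS, MEMORY_STARTUP_TOKENS)
share_used = MEMORY_STARTUP_TOKENS / TOTAL_CONTEXT_TOKENS

print(f"tokens left for coding: {left}")
print(f"share of window used by memory at startup: {share_used:.0%}")
```

On these numbers, roughly 10,000 tokens (under 30% of the window) remain for prompts, source files, and model output, which is why trimming the startup memory payload may matter as much as raw context length.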