A Reddit user asks for recommendations on optimizing context space and computational efficiency for running a local large language model. The poster is using a Qwen 3.6 27B model quantized to Q4 on an NVIDIA RTX 3090 with 24GB of VRAM.
- The user reports a total context window of approximately 34,000 tokens.
- A custom memory system utilizing HDBSCAN and a diary routine consumes about 24,000 tokens upon startup.
- Expanding the context window by spilling into system RAM makes performance significantly slower.
- The user's primary goal is local coding assistance, but they are constrained by limited hardware resources.
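
The figures above imply a tight working budget, which a few lines of arithmetic make concrete. This is only a sketch using the approximate numbers from the post (about 34,000 tokens total, about 24,000 consumed at startup); the function name `remaining_context` is hypothetical and not something the poster uses.

```python
def remaining_context(total_tokens: int, startup_tokens: int) -> tuple[int, float]:
    """Return (tokens left for actual work, fraction of the window used at startup)."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    if not 0 <= startup_tokens <= total_tokens:
        raise ValueError("startup_tokens must be between 0 and total_tokens")
    remaining = total_tokens - startup_tokens
    return remaining, startup_tokens / total_tokens


# Approximate figures reported in the post.
remaining, used = remaining_context(34_000, 24_000)
print(f"{remaining} tokens left; {used:.0%} used at startup")
# prints "10000 tokens left; 71% used at startup"
```

Roughly 10,000 tokens remain for code, prompts, and model output, so under these numbers the memory system's startup footprint is the largest single consumer of context.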