The author analyzes Microsoft's FastContext paper and presents an alternative approach using offline semantic search to reduce token usage in coding agents. By indexing repositories beforehand and providing file-range hints to Claude Code, the method achieved a 43.8% drop in total tokens while maintaining equivalent solution quality on SWE-QA.
- FastContext separates repository exploration from solving by delegating exploration to a trained subagent, whereas this experiment uses Attemory for offline indexing and retrieval.
- The test covered 720 paired samples across 15 repositories, comparing a baseline of Claude Code with DeepSeek v4 against the same setup with retrieval hints.
- Total token usage decreased by 43.8%, with reductions in both main-agent and subagent tokens, while the GPT-5.4 judge score remained essentially unchanged (83.39 vs 83.17).
- Unlike FastContext's online agent loop, Attemory uses prefill-only retrieval, avoiding token-by-token decoding for the exploration phase.
The result suggests that simple offline retrieval hints can substantially lower context costs, here 43.8% fewer total tokens on SWE-QA, without requiring a specially trained explorer, which makes this a lighter-weight alternative for coding agents.
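The idea of indexing a repository ahead of time and handing the agent file-range hints can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the names `build_index`, `retrieve_hints`, and `format_hints` are hypothetical, the bag-of-words cosine score is a simple stand-in for real semantic embeddings, and none of it reflects Attemory's actual API or how the hints were passed to Claude Code in the experiment.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    path: str
    start: int       # 1-based first line of the range
    end: int         # 1-based last line of the range (inclusive)
    vector: Counter  # bag-of-words stand-in for an embedding (assumption)


def tokenize(text: str) -> list[str]:
    # Lowercase alphanumeric runs; underscores split identifiers into words.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(files: dict[str, str], window: int = 20) -> list[Chunk]:
    """Offline step: split each file into fixed line windows and vectorize them."""
    index = []
    for path, source in files.items():
        lines = source.splitlines()
        for i in range(0, len(lines), window):
            body = "\n".join(lines[i:i + window])
            end = min(i + window, len(lines))
            index.append(Chunk(path, i + 1, end, Counter(tokenize(body))))
    return index


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve_hints(index: list[Chunk], question: str, k: int = 3) -> list[Chunk]:
    """Query step: rank chunks by similarity; no text is generated here."""
    q = Counter(tokenize(question))
    scored = [(cosine(q, c.vector), c) for c in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored[:k] if score > 0]


def format_hints(chunks: list[Chunk]) -> str:
    """Render file-range hints as text that could be prepended to a prompt."""
    lines = ["Possibly relevant locations (hints, verify before relying on them):"]
    lines += [f"- {c.path}:{c.start}-{c.end}" for c in chunks]
    return "\n".join(lines)


# Toy two-file "repository" (hypothetical contents).
repo = {
    "auth/session.py": "def refresh_token(session):\n    return session.renew()\n",
    "billing/invoice.py": "def total(invoice):\n    return sum(invoice.items)\n",
}
index = build_index(repo)
hints = retrieve_hints(index, "Where is the session token refreshed?", k=1)
print(format_hints(hints))
```

In this toy setup the agent receives a short list of `path:start-end` ranges instead of discovering them through repeated search and file-read tool calls. That is the kind of exploration cost the retrieval hints are meant to cut.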