Notes on Microsoft's FastContext, and a small SWE-QA experiment with retrieval hints

The author analyzes Microsoft's FastContext paper and presents an alternative that uses offline semantic search to reduce token usage in coding agents. With repositories indexed beforehand and file-range hints passed to Claude Code, total tokens on SWE-QA dropped by 43.8% while the judged answer quality stayed essentially unchanged.

FastContext hands repository exploration to a trained subagent, keeping it separate from solving, whereas this experiment uses Attemory for offline indexing and retrieval.
The test covered 720 paired samples across 15 repositories, comparing a baseline of Claude Code with DeepSeek v4 against the same setup with retrieval hints.
Total token usage decreased by 43.8%, with reductions in both main-agent and subagent tokens, while the GPT-5.4 judge score remained essentially unchanged (83.39 vs 83.17).
Unlike FastContext's online agent loop, Attemory uses prefill-only retrieval, avoiding token-by-token decoding for the exploration phase.
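
To make the idea of file-range hints concrete, here is a minimal sketch of how retrieved locations might be rendered into a prompt for the agent. It does not use Attemory's actual API or output format, which the source does not describe; RangeHint, format_hints, build_prompt, the score field, and the hint wording are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeHint:
    """One retrieval hit (hypothetical shape): a file path plus an inclusive line range."""
    path: str
    start_line: int
    end_line: int
    score: float  # assumed relevance score from the offline index


def format_hints(hints, max_hints=5):
    """Render the top-scoring hints as a plain-text block for the agent."""
    top = sorted(hints, key=lambda h: h.score, reverse=True)[:max_hints]
    lines = ["Possibly relevant locations (retrieved offline, may be incomplete):"]
    for h in top:
        lines.append(f"- {h.path}:{h.start_line}-{h.end_line}")
    return "\n".join(lines)


def build_prompt(question, hints):
    """Prepend the hint block to the question; the agent remains free to explore further."""
    return f"{format_hints(hints)}\n\nQuestion: {question}"
```

Because the hints are computed before the agent runs, the only cost they add at question time in this sketch is the handful of prompt tokens in the hint block.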

In this experiment, simple offline retrieval hints lowered context costs substantially without requiring a specially trained explorer, which makes them a lighter-weight alternative for coding agents.
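
For readers who want to reproduce a token-reduction figure of this kind on their own paired runs, the arithmetic can be sketched as follows. The source does not say whether the 43.8% was pooled over all samples or averaged per sample, so both variants are shown; the function names are hypothetical and the numbers in the usage note are illustrative, not data from the experiment.

```python
def percent_reduction(baseline_tokens, hinted_tokens):
    """Percentage by which hinted_tokens is lower than baseline_tokens."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 100.0 * (baseline_tokens - hinted_tokens) / baseline_tokens


def pooled_reduction(pairs):
    """Reduction computed on token totals summed over all (baseline, hinted) pairs."""
    pairs = list(pairs)
    baseline_total = sum(b for b, _ in pairs)
    hinted_total = sum(h for _, h in pairs)
    return percent_reduction(baseline_total, hinted_total)


def mean_per_sample_reduction(pairs):
    """Average of the per-sample reductions; weights every sample equally."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("pairs must not be empty")
    return sum(percent_reduction(b, h) for b, h in pairs) / len(pairs)
```

With made-up pairs such as [(1000, 600), (2000, 1000), (500, 400)], the two functions give different answers, because the pooled figure is dominated by the samples that consume the most tokens. Reporting which aggregation was used makes a headline percentage easier to compare across experiments.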