A user details a fully local web research pipeline for AI agents that avoids all cloud API calls. The architecture layers self-hosted SearXNG for search, the Hister cache/index layer, rnet (now wreq) for TLS-fingerprinted HTTP fetches, and Camoufox as a headless browser fallback.
- SearXNG handles initial search queries locally.
- Hister stores every fetched page to ensure repeat lookups are instant and preserve content even if pages change or disappear.
- rnet (now wreq) bypasses basic anti-bot measures using TLS fingerprinting.
- Camoufox renders JS-heavy pages that require full browser interaction.
- A local qwen3-reranker-4b scores relevance, with all components communicating via an MCP server.
The cache layer is highlighted as the most valuable component for maintaining access to original content. The entire stack runs on a single box alongside inference models without external dependencies.