A developer has released a free, simple Retrieval-Augmented Generation (RAG) API built on Wikipedia's medical articles to give local large language models accurate factual information. The service aims for sub-second responses and currently runs on a single ARM VPS using approximately 2 GB of RAM.

  • The API supports the Model Context Protocol (MCP) for easy integration with AI agents.
  • It allows users to instruct their LLMs to fetch medical facts directly from the source rather than relying solely on model weights.
  • A demonstration shows a small model (qwen3.5-0.8B) hallucinating cardiac details about the Lhermitte sign without RAG, but correctly identifying it as a neurological symptom when using the API.
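
As a rough illustration of what the MCP integration mentioned above involves, the sketch below builds the JSON-RPC 2.0 message an MCP client sends to invoke a tool. The `tools/call` method and the `name`/`arguments` parameters come from the MCP specification; the tool name `search_medical_wiki` and its `query` argument are hypothetical placeholders, since the summary does not say what tools this service actually exposes.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> dict:
    """Build an MCP "tools/call" request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool name and argument; the real service may use different ones.
request = build_tool_call(1, "search_medical_wiki", {"query": "Lhermitte sign"})
print(json.dumps(request, indent=2))
```

In practice an agent framework or MCP client library would build and send this message itself; the sketch only shows the shape of the request.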

This tool helps mitigate hallucinations in local LLMs by grounding their answers in text retrieved from Wikipedia's medical articles, covering facts that a small model may not have memorized.
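
The grounding step can be sketched as follows. This is a minimal illustration, not the service's implementation: `fake_retrieve` is a stand-in for the real API (a toy keyword matcher over two placeholder passages written for this example), and `generate` is any callable that sends a prompt to a local model.

```python
import re
from typing import Callable, List

# Placeholder passages written for this example; not actual API output.
CORPUS = [
    "Lhermitte's sign is an electric-shock-like sensation that runs down the "
    "back into the limbs, usually triggered by bending the neck forward. "
    "It is a neurological symptom.",
    "Murphy's sign is a test for gallbladder inflammation performed during "
    "an abdominal examination.",
]


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def fake_retrieve(query: str, top_k: int) -> List[str]:
    """Stand-in retriever: rank passages by word overlap with the query."""
    words = tokenize(query)
    ranked = sorted(CORPUS, key=lambda p: -len(words & tokenize(p)))
    return ranked[:top_k]


def build_grounded_prompt(question: str, passages: List[str]) -> str:
    """Put retrieved passages ahead of the question so the model can cite them."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def answer_with_rag(
    question: str,
    retrieve: Callable[[str, int], List[str]],
    generate: Callable[[str], str],
    top_k: int = 1,
) -> str:
    """Retrieve passages, build a grounded prompt, and pass it to the model."""
    passages = retrieve(question, top_k)
    return generate(build_grounded_prompt(question, passages))


# Echo the prompt instead of calling a model, to show what the LLM would see.
print(answer_with_rag("What is the Lhermitte sign?", fake_retrieve, lambda p: p))
```

Swapping `fake_retrieve` for a call to the real API and `generate` for a local model call would give the with-RAG path from the demonstration; calling `generate` on the bare question gives the without-RAG path.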