Retrieval & RAG — korshunov.ai

LLM Medical Scribing Benchmark: Omissions Outnumber Hallucinations

A benchmark of 8 LLMs on 300 synthetic doctor-patient dialogues found 12 high-impact hallucinations but 520 clinically relevant omissions, so omissions were far more common. DeepSeek excelled in prose quality and cost but missed many safety-relevant facts, while Claude Opus had the fewest omissions but weaker prose.

media r/LocalLLaMA · 2d ago

Comparing Docling, Liteparse, MinerU, and Unstructured for On-Prem Document Processing

A university needs on-premises document processing for academic workflows. Strict data-governance policies ban cloud APIs, so only local parsers are allowed. The poster evaluates Docling, Liteparse, MinerU, and Unstructured. Docling excels on complex layouts and is Apache 2.0 licensed but is slower. Liteparse performs well on printed documents using Tesseract OCR. MinerU uses PaddleOCR and handles French documents well despite a longer setup. Unstructured supports many formats, including DOCX and PPTX. The solution must support recurring, stable parsing of evolving PDFs whose formatting changes only minimally between versions.

media r/LocalLLaMA · 2d ago

Why is Gemma 4 26b not mentioned more?

Users note how little discussion there is of Gemma 4 26b, despite its apparent suitability for personal-assistant and RAG tasks on a single 3090. The model is considered a strong candidate for all-in-one local AI applications, yet it gets less attention than Qwen3.6 or Gemma 4 31b.

lab Mistral AI News · 2d ago

Mistral Releases OCR 4 with Multilingual Support and Structured Output

Mistral OCR 4 introduces bounding boxes, block classification, and inline confidence scores, and supports 170 languages across 10 language groups. It outperforms leading OCR systems in human preference evaluations with a 72% win rate and achieves the top score on OlmOCRBench (85.20). It can be self-hosted as a single container and targets enterprise use cases such as RAG and document ingestion.

arxiv arXiv cs.CL · 2d ago

ViRGo: Adaptive Routing for Visual Retrieval and Global Perception

ViRGo introduces a lightweight framework that adapts visual retrieval based on object scale. It uses intrinsic localization and semantic confidence to route between global perception, patch-based retrieval, and attention-based retrieval, improving accuracy-efficiency trade-offs without extra computation.

arxiv arXiv cs.CL · 2d ago

π-RAG: Oblivious Retrieval via Semantic Quantization and Transcendental Addressing

π-RAG decouples LLMs from sensitive data by using the digits of π as an immutable source of entropy. It introduces a semantic quantization layer that maps user inputs to canonical intent centroids, then uses a cryptographic salt to generate deterministic offsets that point to standardized payloads, ensuring oblivious retrieval and mathematical guarantees of data privacy.
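
The two steps named above (quantize to a centroid, then derive a salted deterministic offset) can be illustrated with a toy sketch. It assumes HMAC-SHA256 as the salting construction, made-up two-dimensional intent centroids, and a hardcoded 50-digit prefix of π in place of whatever digit source and payload format the paper actually uses. Only one property is meant to carry over: paraphrases that collapse to the same centroid produce the same lookup, so the raw input never reaches the offset.

```python
# Toy sketch of semantic quantization plus a salted, deterministic offset.
# ASSUMPTIONS: HMAC-SHA256 as the salting construction, hypothetical intent
# centroids, and a 50-digit prefix of pi standing in for the paper's digit source.
import hashlib
import hmac
import math

PI_DECIMALS = "14159265358979323846264338327950288419716939937510"  # first 50 decimals of pi


def nearest_centroid(embedding: list[float], centroids: dict[str, list[float]]) -> str:
    """Semantic quantization: snap an input embedding to the closest intent centroid."""
    return min(centroids, key=lambda name: math.dist(embedding, centroids[name]))


def salted_offset(intent: str, salt: bytes, modulus: int) -> int:
    """Deterministic offset: the same (salt, intent) pair always yields the same value."""
    digest = hmac.new(salt, intent.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % modulus


def lookup(embedding: list[float], centroids: dict[str, list[float]], salt: bytes, width: int = 8) -> str:
    """Read `width` digits at the offset derived from the quantized intent.
    Only the centroid, never the raw embedding, influences the offset."""
    intent = nearest_centroid(embedding, centroids)
    start = salted_offset(intent, salt, len(PI_DECIMALS) - width + 1)
    return PI_DECIMALS[start:start + width]


CENTROIDS = {"reset_password": [1.0, 0.0], "billing_question": [0.0, 1.0]}  # hypothetical
SALT = b"example-salt"  # hypothetical
print(lookup([0.9, 0.2], CENTROIDS, SALT) == lookup([0.8, 0.1], CENTROIDS, SALT))  # True
```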

arxiv arXiv cs.CL · 2d ago

Topic-to-Timestamp Alignment by Constrained Evidence Selection

A new method improves topic-to-timestamp alignment in meeting transcripts by selecting timestamped evidence instead of generating timecodes. On 420 queries from municipal meeting transcripts, it boosts Recall@5 to 50.0%, reduces MAE to 761.0 seconds, and increases parseable outputs from 373 to 419, showing that retrieval quality and output design are critical.
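
Selecting rather than generating a timestamp, and the two reported metrics, can be sketched as follows. The segment layout and the reading of Recall@5 as "at least one gold segment among the top five" are assumptions; the summary does not spell out the paper's exact definitions.

```python
# Hypothetical sketch: pick a timestamp from retrieved evidence instead of
# generating one, and score Recall@k and MAE. The Segment layout and the
# Recall@k definition (any gold segment in the top k) are assumptions.
from dataclasses import dataclass


@dataclass
class Segment:
    start_s: float  # segment start time, in seconds
    text: str


def select_timestamp(ranked: list[Segment]) -> float:
    """Constrained output: reuse the start time of the top-ranked evidence
    segment, so the answer is always an existing, parseable timestamp."""
    return ranked[0].start_s


def recall_at_k(ranked_ids: list[list[int]], gold_ids: list[set[int]], k: int = 5) -> float:
    """Share of queries with at least one gold segment among the top k retrieved."""
    hits = sum(1 for ranked, gold in zip(ranked_ids, gold_ids) if gold & set(ranked[:k]))
    return hits / len(ranked_ids)


def mae_seconds(predicted: list[float], gold: list[float]) -> float:
    """Mean absolute error between predicted and gold timestamps, in seconds."""
    return sum(abs(p - g) for p, g in zip(predicted, gold)) / len(gold)


segments = [Segment(125.0, "vote on the budget amendment"), Segment(30.0, "roll call")]
print(select_timestamp(segments))                                       # 125.0
print(recall_at_k([[3, 7, 1, 9, 4, 2], [5, 6, 8, 0, 2]], [{2}, {8}]))  # 0.5
print(mae_seconds([100.0, 400.0], [160.0, 300.0]))                      # 80.0
```

Because every answer is copied from an existing segment, a malformed timecode cannot occur, which is consistent with the jump in parseable outputs.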

arxiv arXiv cs.CL · 2d ago

PeerCheck: Improving LLM-Generated Academic Reviews

PeerCheck analyzes how LLM-generated and human academic reviews differ, finding that LLMs focus on theory while humans prioritize methodology and experiments. The framework applies prompting strategies such as Chain-of-Thought (CoT) as well as retrieval-augmented generation (RAG). CoT significantly improves review quality, whereas RAG shows an unexpected "paradox" in which it sometimes reduces quality.

arxiv arXiv cs.CL · 2d ago

The Token Tax of Epistemic Accuracy in Document-Grounded AI

A study compares retrieval-augmented generation (RAG) with long-context prompting for document-grounded AI. Long-context prompting achieves higher epistemic accuracy (73.1% vs. 65.4% for RAG) but at 26 times the per-query token cost, a substantial token tax for broader access to the evidence.
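
The reported figures support a quick back-of-the-envelope calculation. Since cost per correct answer is cost per query divided by accuracy, the 26x multiplier shrinks only slightly once the accuracy gain is factored in. The sketch assumes the 26x figure applies uniformly across queries.

```python
# Back-of-the-envelope "token tax" from the reported figures.
# ASSUMPTION: the 26x per-query multiplier holds uniformly across queries.
RAG_ACC = 0.654             # RAG epistemic accuracy
LC_ACC = 0.731              # long-context epistemic accuracy
LC_COST_MULTIPLIER = 26.0   # long-context tokens per query, relative to RAG


def cost_per_correct_ratio(acc_a: float, acc_b: float, cost_multiplier_b: float) -> float:
    """Tokens per correct answer for method B relative to method A
    (cost per correct answer = cost per query / accuracy)."""
    return cost_multiplier_b * acc_a / acc_b


accuracy_gain_points = (LC_ACC - RAG_ACC) * 100
ratio = cost_per_correct_ratio(RAG_ACC, LC_ACC, LC_COST_MULTIPLIER)
# Extra tokens per query (in RAG-query units) divided by extra correct answers per query.
marginal = (LC_COST_MULTIPLIER - 1) / (LC_ACC - RAG_ACC)

print(f"accuracy gain: {accuracy_gain_points:.1f} points")    # accuracy gain: 7.7 points
print(f"tokens per correct answer: {ratio:.1f}x RAG")           # tokens per correct answer: 23.3x RAG
print(f"RAG queries' worth of tokens per extra correct answer: {marginal:.0f}")  # ... 325
```

The marginal figure is the sharper way to read the tax: under these assumptions, each additional correct answer bought by switching to long context costs roughly as many tokens as 325 RAG queries.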

arxiv arXiv cs.CL · 2d ago

Ablation Study of Agentic RAG Components with Local 7B Model

A controlled ablation study evaluates agentic RAG components using a local 7B model on HotpotQA. Fixed hybrid retrieval outperforms adaptive routing by 1.8 EM and 1.9 F1 points, and two retrieval iterations capture 95% of the gains from five. Query decomposition and cross-encoder reranking yield statistically significant but smaller improvements.
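
One common way to implement "fixed" hybrid retrieval is to normalize BM25 and dense scores and blend them with the same weight for every query, rather than routing each query to a single retriever. The min-max normalization and `alpha = 0.5` below are assumptions; the summary does not state how the study fused the two signals.

```python
# Hypothetical "fixed" hybrid retrieval: every query gets the same blend of
# normalized BM25 and dense scores. Min-max scaling and alpha=0.5 are assumptions.


def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, map them to 0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(bm25: dict[str, float], dense: dict[str, float], alpha: float = 0.5) -> list[str]:
    """Rank by alpha * dense + (1 - alpha) * BM25 after normalization; a document
    missing from one retriever contributes 0 for that component."""
    b, d = min_max(bm25), min_max(dense)
    fused = {doc: alpha * d.get(doc, 0.0) + (1 - alpha) * b.get(doc, 0.0) for doc in set(b) | set(d)}
    # Sort by fused score, breaking ties by document id for a stable order.
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


bm25 = {"a": 12.0, "b": 3.0, "c": 7.5, "d": 1.0}
dense = {"a": 0.20, "b": 0.90, "c": 0.85, "d": 0.10}
print(hybrid_rank(bm25, dense))  # ['c', 'b', 'a', 'd']
```

An adaptive router would instead pick BM25 or dense retrieval per query; document `c`, strong on both signals, shows what a fixed blend keeps that a single-retriever route can miss.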

media r/LocalLLaMA · 4d ago

semantic-memory: local-first knowledge base with typed graph edges

semantic-memory is a local-first knowledge base written in Rust on top of SQLite. It combines BM25 and vector search, merged with reciprocal rank fusion, and adds typed graph edges for causal, temporal, and semantic relationships, provenance tracking, bitemporal storage, and adaptive query routing. It exposes 18 MCP tools for AI agents, and every component runs locally with no cloud dependencies, API keys, or telemetry.
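
Reciprocal rank fusion merges ranked lists using only rank positions, which is why it can combine BM25 and vector results whose raw scores live on different scales. This sketch uses the constant `k = 60` from the original RRF paper; whether semantic-memory uses the same constant is an assumption, and the note IDs are made up.

```python
# Reciprocal rank fusion: each document scores sum(1 / (k + rank)) over the
# input rankings, with 1-based ranks. k=60 follows the original RRF paper.
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


bm25_hits = ["note-3", "note-1", "note-7"]    # hypothetical keyword results
vector_hits = ["note-1", "note-9", "note-3"]  # hypothetical embedding results
for doc_id, score in reciprocal_rank_fusion([bm25_hits, vector_hits]):
    print(doc_id, round(score, 4))
```

Here `note-1` overtakes `note-3`: both appear in both lists, but `note-1` sits at ranks 2 and 1 while `note-3` sits at ranks 1 and 3.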

media r/LocalLLaMA · 5d ago

Help with a Local Document RAG System (Storage + Ingestion + Query + Highlighting)

A user is designing a local, offline document retrieval and LLM pipeline with storage, ingestion, query, and highlighting features. They seek advice on vector databases (e.g., pgvector in Postgres vs Qdrant), GraphRAG feasibility offline, and open-source tools for document highlighting with citations.

media r/LocalLLaMA · 6d ago

How to Set Up Search with AI Models

A user asks how to add search capabilities to Gemma 4 12B in a self-hosted setup. They tried Open WebUI, which has problems with search engines such as DuckDuckGo (DDG), and want alternatives that don't require Brave or Google API keys.

arxiv arXiv cs.AI · 6d ago

AI Economist Agent: Model-Grounded Economic Analysis Framework

The AI Economist Agent uses RAG, knowledge graphs, and LLMs to generate economic narratives grounded in theory and data. It enables model-based analysis, evidence retrieval, and report generation, ensuring economic coherence and traceability through explicit model computations.

arxiv arXiv cs.LG · 6d ago

Train, Retrieve, or Both? Head-to-Head on Statutory Citation for Ontario RTA

A four-arm comparison shows that retrieval is essential for accurate statutory citation under Ontario's Residential Tenancies Act (RTA). The SFT+RAG hybrid achieves 0.481 exact match with zero hallucinations, outperforming the base and SFT-only models, and matches a pipeline built on larger, specialized models without requiring more training data. The results are preliminary and rest on a small, human-verified, real-world evaluation set.
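
A minimal scoring sketch for this kind of evaluation: exact match on normalized section citations, with a hallucination counted whenever a predicted section is not in a set of valid sections. The normalization rule, the section numbers, and this operationalization of "hallucination" are illustrative assumptions, not the paper's protocol.

```python
# Hypothetical scoring for statutory-citation outputs.
# ASSUMPTIONS: citations are compared after a simple normalization, and a
# "hallucination" is any predicted section absent from a list of valid sections.
import re


def normalize(citation: str) -> str:
    """Lowercase, drop a leading 's.' or 'section', remove whitespace: 'S. 82 (1)' -> '82(1)'."""
    text = citation.lower().strip()
    text = re.sub(r"^(section|s\.)\s*", "", text)
    return re.sub(r"\s+", "", text)


def score(predictions: list[str], gold: list[str], valid_sections: set[str]) -> tuple[float, int]:
    """Return (exact-match rate, number of hallucinated citations)."""
    pred = [normalize(p) for p in predictions]
    ref = [normalize(g) for g in gold]
    exact_match = sum(p == g for p, g in zip(pred, ref)) / len(ref)
    hallucinated = sum(p not in valid_sections for p in pred)
    return exact_match, hallucinated


valid = {"30", "31", "82(1)", "83"}  # illustrative section numbers only
print(score(["S. 82 (1)", "section 30", "s. 999"], ["s. 82(1)", "s. 31", "s. 83"], valid))
```

Keeping the two numbers separate matters: a wrong but real section lowers exact match without counting as a hallucination, which is how a system can score 0.481 exact match with zero hallucinations.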

arxiv arXiv cs.CL · 6d ago

Multi-Agent Transactive Memory Framework

Multi-Agent Transactive Memory (MATM) enables population-level storage and retrieval of agent-generated trajectories. It allows producer agents to share procedural knowledge with consumer agents, improving task performance and reducing interaction steps in interactive environments like ALFWorld and WebArena without coordination or joint training.
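
A population-level memory of this kind can be pictured as a store that producer agents append to and consumer agents query by task similarity. The sketch is hypothetical: the record format and the Jaccard word-overlap retrieval are stand-ins, since the summary does not describe MATM's actual storage or retrieval method.

```python
# Hypothetical shared trajectory store: producers write, consumers read.
# ASSUMPTION: retrieval by Jaccard overlap of task-description words.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SharedTrajectoryMemory:
    # Each record: (set of task words, list of actions that solved the task).
    records: list[tuple[set[str], list[str]]] = field(default_factory=list)

    def write(self, task: str, trajectory: list[str]) -> None:
        """Producer side: store a successful trajectory under its task description."""
        self.records.append((set(task.lower().split()), trajectory))

    def read(self, task: str) -> Optional[list[str]]:
        """Consumer side: return the trajectory of the most similar stored task,
        or None if no stored task shares any word with the query."""
        query = set(task.lower().split())
        best: Optional[list[str]] = None
        best_sim = 0.0
        for words, trajectory in self.records:
            sim = len(query & words) / len(query | words)
            if sim > best_sim:
                best, best_sim = trajectory, sim
        return best


memory = SharedTrajectoryMemory()
memory.write("put a clean mug on the shelf", ["go to sink", "clean mug", "go to shelf", "put mug on shelf"])
memory.write("heat an apple", ["take apple", "go to microwave", "heat apple"])
print(memory.read("put a clean cup on the shelf"))
```

No coordination step appears anywhere: producers and consumers only touch the shared store, mirroring the "without coordination or joint training" claim at the level of this toy.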

arxiv arXiv cs.CL · 6d ago

Tool-Intent Stabilization in Streaming RAG

A study measures tool-intent stabilization in Streaming RAG, defining when speculative tool queries converge to correct answers. On the CRAG benchmark, 73.9% of queries allow substantial latency hiding, with early stabilization observed in questions with verbatim retrievable evidence. Question type significantly predicts early versus late stabilization, informing when speculative triggers are effective.
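
One way to operationalize the stabilization point is to record the tool query the system would issue after each streamed chunk of input, then find the earliest chunk after which that query stops changing. Treating the query issued on the complete input as the reference, and the 0.5 threshold for "early", are hypothetical choices for illustration, not the study's definitions.

```python
# Hypothetical stabilization measurement for speculative tool queries.
# speculative[i] = tool query issued after streaming chunk i; the last entry is
# the query issued on the complete input (assumed to be the reference).
from typing import Optional


def stabilization_index(speculative: list[str]) -> Optional[int]:
    """Earliest index from which every speculative query equals the final one."""
    if not speculative:
        return None
    final = speculative[-1]
    index = len(speculative) - 1
    # Walk backwards while earlier queries already match the final query.
    while index > 0 and speculative[index - 1] == final:
        index -= 1
    return index


def stabilizes_early(speculative: list[str], threshold: float = 0.5) -> bool:
    """True if the query settles within the first `threshold` share of chunks,
    leaving the rest of the stream to hide the tool call's latency."""
    index = stabilization_index(speculative)
    return index is not None and index < threshold * len(speculative)


stream = ["weather", "weather paris", "weather paris", "weather paris"]
print(stabilization_index(stream), stabilizes_early(stream))  # 1 True
```

Note that `["a", "b", "a"]` stabilizes at index 2, not 0: a query that matches the final one early but then changes does not count, since firing the tool at that point would have been wasted.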

arxiv arXiv cs.CL · 6d ago

CATCH-ME if you RAG: Multilingual Counterspeech Dataset for Hate and Misinformation

CATCH-ME introduces the first large-scale, multilingual dataset of contextually annotated, multi-turn counterspeech dialogues targeting hate and misinformation. The dataset covers five languages and focuses on seven marginalized groups, with dialogues grounded in verified fact-checking sources and including document- and chunk-level span annotations for RAG systems.

media r/LocalLLaMA · 7d ago

LFM2.5-Embedding-350M and LFM2.5-ColBERT-350M Released

LFM2.5-Embedding-350M is a dense bi-encoder that stores one vector per document for fast multilingual retrieval, with best-in-class accuracy for its size and inference speed comparable to smaller models. LFM2.5-ColBERT-350M is a late-interaction retriever that stores one vector per token, offering best-in-class multilingual accuracy and high-precision cross-lingual retrieval. Both models are designed as drop-in replacements in existing RAG pipelines.
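
The difference between the two retrievers comes down to how a query is scored against a document. The sketch contrasts a single-vector dot product with ColBERT-style MaxSim late interaction; the tiny 2-D vectors are placeholders for real model outputs, and nothing here calls the LFM2.5 models.

```python
# Contrast of the two scoring schemes with toy 2-D vectors (placeholders for
# real embeddings; typically L2-normalized in practice so dot = cosine).
Vector = list[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def bi_encoder_score(query_vec: Vector, doc_vec: Vector) -> float:
    """Dense bi-encoder: one vector for the query, one for the whole document."""
    return dot(query_vec, doc_vec)


def maxsim_score(query_tokens: list[Vector], doc_tokens: list[Vector]) -> float:
    """Late interaction (ColBERT MaxSim): each query token vector is matched to
    its most similar document token vector, and those maxima are summed."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_tokens = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
print(round(maxsim_score(query_tokens, doc_tokens), 3))  # 1.7
print(bi_encoder_score([0.6, 0.8], [0.8, 0.6]))
```

The storage trade-off follows directly: the bi-encoder keeps one vector per document, while late interaction keeps one per token, so its index grows with document length in exchange for finer-grained matching.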