Retrieval & RAG — korshunov.ai


Unlimited-OCR is now on ModelScope

Unlimited-OCR, a 3.3B multilingual OCR model, is available on ModelScope. It supports one-shot parsing for single images, multi-page documents, and PDFs, with full-document parsing and up to 32K output length. The model includes base and gundam image modes for diverse document layouts and supports Transformers inference with OpenAI-compatible streaming.

arxiv arXiv cs.CL · 1d ago

MMed-Bench-IR: A Multilingual Medical Retrieval Benchmark

MMed-Bench-IR introduces a heterogeneous benchmark for multilingual medical information retrieval across six languages. It evaluates cross-lingual alignment, concept discrimination, and evidence retrieval through three distinct tasks with no overlapping concepts or queries. Evaluation shows sharp cross-lingual performance drops: English biomedical encoders fall from 0.818 to 0.056 nDCG@10 when moving to Japanese, a limitation that English-only benchmarks do not reveal.
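For readers unfamiliar with the metric quoted above, here is a minimal sketch of nDCG@10 with linear gain. The function names and the toy relevance grades are illustrative assumptions; the benchmark's own evaluation code may differ (for example, in gain function or tie handling).

```python
import math

def dcg_at_k(relevances, k):
    # Discounted cumulative gain: the grade at rank i (1-based) is divided by log2(i + 1).
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_rels, judged_rels, k=10):
    # ranked_rels: relevance grades of the retrieved documents, in ranked order.
    # judged_rels: grades of ALL judged documents for the query (defines the ideal ranking).
    ideal = dcg_at_k(sorted(judged_rels, reverse=True), k)
    return dcg_at_k(ranked_rels, k) / ideal if ideal > 0 else 0.0

# Toy example (hypothetical data): the single relevant document is retrieved at rank 2.
score = ndcg_at_k(ranked_rels=[0, 1, 0], judged_rels=[1])
print(round(score, 4))
```

A score of 1.0 means the relevant documents appear at the top in ideal order, so a drop from 0.818 to 0.056 indicates that relevant documents mostly fall out of the top ten or rank far down within it.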

arxiv arXiv cs.CL · 2d ago

Privacy-Preserving RAG via Multi-Agent Semantic Rewriting

A multi-agent framework sanitizes retrieved content by removing sensitive identifiers through semantic rewriting, reducing privacy leakage in targeted attacks. It maintains strong contextual fidelity with a BLEU-1 score of 0.122, outperforming SAGE's 0.117, and operates as an asynchronous preprocessing step with no added latency to online inference.

media Hugging Face Forums · 2d ago

Spaces tokens stop working after update

Users report that Spaces tokens no longer function after a recent update. No generated files are being saved, disrupting workflow and model execution.

media r/LocalLLaMA · 2d ago

LLM Medical Scribing Benchmark: Omissions Outnumber Hallucinations

A benchmark of 8 LLMs on 300 synthetic doctor-patient dialogues found 12 high-impact hallucinations against 520 clinically relevant omissions. DeepSeek led on prose quality and cost but missed many safety facts, while Claude Opus had the fewest omissions but weaker prose.

media r/LocalLLaMA · 2d ago

Comparing Docling, Liteparse, MinerU, and Unstructured for On-Prem Document Processing

A university needs on-premises document processing for academic workflows because strict data governance policies ban cloud APIs. The user compares four local parsers: Docling handles complex layouts well and is Apache 2.0 licensed but is slower; Liteparse performs well on printed documents using Tesseract OCR; MinerU uses PaddleOCR and handles French documents well despite a longer setup; Unstructured supports multiple formats, including DOCX and PPTX. The solution must give stable results when repeatedly parsing PDFs that evolve over time with only minimal formatting changes.

media r/LocalLLaMA · 2d ago

Why is Gemma 4 26b not mentioned more?

Users note a lack of discussion around Gemma 4 26b despite its potential suitability for personal assistant and RAG tasks on a single 3090. The model is considered a strong candidate for all-in-one local AI applications, though it receives less attention than Qwen3.6 or Gemma 4 31b.

lab Mistral AI News · 2d ago

Mistral Releases OCR 4 with Multilingual Support and Structured Output

Mistral OCR 4 introduces bounding boxes, block classification, and inline confidence scores for 170 languages across 10 language groups. It outperforms leading OCR systems in human preference evaluations with a 72% win rate and achieves the top score on OlmOCRBench (85.20), while offering self-hosted deployment in a single container and supporting enterprise use cases like RAG and document ingestion.

arxiv arXiv cs.CL · 2d ago

ViRGo: Adaptive Routing for Visual Retrieval and Global Perception

ViRGo introduces a lightweight framework that adapts visual retrieval based on object scale. It uses intrinsic localization and semantic confidence to route between global perception, patch-based retrieval, and attention-based retrieval, improving accuracy-efficiency trade-offs without extra computation.

arxiv arXiv cs.CL · 2d ago

π-RAG: Oblivious Retrieval via Semantic Quantization and Transcendental Addressing

π-RAG decouples LLMs from sensitive data by using π's digits as an immutable, uneditable source of entropy. It introduces a semantic quantization layer that maps user inputs to canonical intent centroids, then uses cryptographic salt to generate deterministic offsets pointing to standardized payloads, ensuring oblivious retrieval and mathematical guarantees of data privacy.

arxiv arXiv cs.CL · 2d ago

Topic-to-Timestamp Alignment by Constrained Evidence Selection

A new method improves topic-to-timestamp alignment in meeting transcripts by selecting timestamped evidence instead of generating timecodes. On 420 queries from municipal meeting transcripts, it boosts Recall@5 to 50.0%, reduces MAE to 761.0 seconds, and increases parseable outputs from 373 to 419, showing that retrieval quality and output design are critical.

arxiv arXiv cs.CL · 2d ago

PeerCheck: Improving LLM-Generated Academic Reviews

PeerCheck analyzes differences between LLM and human academic reviews, finding that LLMs focus on theory while humans prioritize methodology and experiments. The framework applies prompting techniques such as Chain-of-Thought (CoT) and retrieval-augmented generation (RAG). CoT significantly improves review quality, while RAG shows an unexpected 'paradox' in which it sometimes reduces quality.

arxiv arXiv cs.CL · 2d ago

The Token Tax of Epistemic Accuracy in Document-Grounded AI

A study compares retrieval-augmented generation (RAG) and long-context prompting in document-grounded AI. Long-context prompting achieves higher epistemic accuracy (73.1% vs. 65.4%) but at 26 times the per-query token cost, a significant token tax for broader evidentiary access.
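The trade-off can be made concrete with a little arithmetic. The sketch below uses only the three figures quoted in the summary; the "points per extra RAG-query-equivalent" framing is an illustrative assumption, not a metric taken from the study.

```python
# Figures quoted in the summary above.
rag_acc = 65.4        # epistemic accuracy of RAG, in percent
long_ctx_acc = 73.1   # epistemic accuracy of long-context prompting, in percent
cost_ratio = 26.0     # long-context token cost per query, relative to RAG

gain = long_ctx_acc - rag_acc   # accuracy gained, in percentage points
extra_cost = cost_ratio - 1.0   # additional tokens, in units of one RAG query
marginal = gain / extra_cost    # points gained per extra RAG-query-equivalent

print(f"{gain:.1f} points for {extra_cost:.0f} extra RAG-query-equivalents "
      f"-> {marginal:.3f} points each")
```

Under this framing, long-context prompting buys about 7.7 accuracy points at roughly 0.31 points per additional RAG-query-equivalent of tokens.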

arxiv arXiv cs.CL · 3d ago

Ablation Study of Agentic RAG Components with Local 7B Model

A controlled ablation study evaluates agentic RAG components using a local 7B model on HotpotQA. Fixed hybrid retrieval outperforms adaptive routing by 1.8 EM and 1.9 F1, while two retrieval iterations capture 95% of the gains from five. Query decomposition and cross-encoder reranking show statistically significant but smaller improvements.
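The EM and F1 figures above are answer-level metrics. A minimal sketch of how they are commonly computed for extractive QA follows, assuming SQuAD-style answer normalization; the summary does not state the study's exact evaluation script, so the function names and normalization steps here are assumptions.

```python
import re
import string
from collections import Counter

def normalize(text):
    # Lowercase, strip punctuation, drop English articles, collapse whitespace.
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(pred, gold):
    # 1.0 if the normalized strings are identical, else 0.0.
    return float(normalize(pred) == normalize(gold))

def token_f1(pred, gold):
    # Harmonic mean of token-level precision and recall after normalization.
    p, g = normalize(pred).split(), normalize(gold).split()
    common = sum((Counter(p) & Counter(g)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 2 * precision * recall / (precision + recall)

# Hypothetical prediction/gold pairs for illustration.
em = exact_match("The Eiffel Tower", "eiffel tower!")
f1 = token_f1("Eiffel Tower in Paris", "Eiffel Tower")
```

Both metrics are averaged over the evaluation set and usually reported on a 0-100 scale, which is the scale on which differences such as 1.8 EM are typically expressed.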

media r/LocalLLaMA · 5d ago

semantic-memory: local-first knowledge base with typed graph edges

semantic-memory is a local-first knowledge base written in Rust that combines BM25 and vector search, merged via reciprocal rank fusion, on top of SQLite. It features typed graph edges for causal, temporal, and semantic relationships, provenance tracking, bitemporal storage, and adaptive query routing, and exposes 18 MCP tools for AI agents. All components run locally without cloud dependencies, API keys, or telemetry.
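The fusion step named above can be sketched in a few lines. This is a generic illustration of reciprocal rank fusion in Python, not the project's Rust implementation; the constant k=60 is a commonly used default, and the document IDs are hypothetical.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    # Each ranking is a list of document IDs, best first.
    # A document scores 1 / (k + rank) in every list where it appears.
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical result lists from a lexical and a vector retriever.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Because the method uses only ranks, it needs no calibration between BM25 scores and vector similarities, which is one reason it is a common choice for hybrid search.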

media r/LocalLLaMA · 5d ago

Help with a Local Document RAG System (Storage + Ingestion + Query + Highlighting)

A user is designing a local, offline document retrieval and LLM pipeline with storage, ingestion, query, and highlighting features. They seek advice on vector databases (e.g., pgvector in Postgres vs Qdrant), GraphRAG feasibility offline, and open-source tools for document highlighting with citations.

media r/LocalLLaMA · 6d ago

How to Set Up Search with AI Models

A user asks how to add search capabilities to a self-hosted Gemma 4 12B setup. They have tried Open WebUI, which has issues with search engines such as DuckDuckGo (DDG), and are looking for alternatives that do not require Brave or Google API keys.

arxiv arXiv cs.AI · 6d ago

AI Economist Agent: Model-Grounded Economic Analysis Framework

The AI Economist Agent uses RAG, knowledge graphs, and LLMs to generate economic narratives grounded in theory and data. It enables model-based analysis, evidence retrieval, and report generation, ensuring economic coherence and traceability through explicit model computations.

arxiv arXiv cs.LG · 6d ago

Train, Retrieve, or Both? Head-to-Head on Statutory Citation for Ontario RTA

A four-arm comparison shows that retrieval is essential for accurate statutory citation under the Ontario Residential Tenancies Act. The SFT+RAG hybrid achieves 0.481 exact-match with zero hallucinations, outperforming the base and SFT-only models, and matches a pipeline built on larger, specialized models without requiring more training data. The results come from a small, human-verified real-world evaluation set and are preliminary.