Retrieval & RAG — korshunov.ai


Framework Evaluates When GraphRAG and Agentic RAG Are Needed

The authors introduce a framework for evaluating and comparing regular, GraphRAG, Modular, and Agentic Retrieval-Augmented Generation (RAG) on semi-structured knowledge bases. They implement nine standardized scenarios spanning simple document retrieval to complex hybrid text-graph integration and agentic multi-step planning. A novel context engineering method is presented to address memory overflow issues in advanced RAG variants through new representations and agentic loop design. This optimization achieves a 19% to 53% reduction in token usage while efficiently managing retrievals. Further analysis reveals a retrieval-generation gap where expanded retrieval does not proportionally improve generation quality. The study suggests that current retrieval-oriented metrics may overstate the benefits of advanced retrieval techniques. These data-driven insights aim to guide the development of production-ready intelligent RAG systems.

arxiv arXiv cs.CL · just now

TRACE: Lightweight Detection of Corpus Poisoning in RAG via Token Influence Attribution

Retrieval-Augmented Generation systems face significant risks from corpus poisoning attacks that manipulate outputs through malicious documents. Existing detection methods often require auxiliary classifiers or additional LLM verification, which introduces substantial computational overhead. To address this, researchers introduced TRACE, a lightweight framework that identifies poisoning by tracing answer-related tokens via influence attribution. The system first discovers recurrent high-influence keywords across retrieved documents to flag potential threats. It then performs secondary verification to confirm the specific influence of these tokens on model predictions. Experiments conducted on three QA benchmarks and six LLMs demonstrate strong detection performance for the framework. Additionally, TRACE successfully uncovers attacker-specified target answers during the verification process.

arxiv arXiv cs.CL · 2h ago

How Large Language Models Source Brand Reputation Across Languages and Markets

This study analyzes the citation sources used by large language models when answering questions about brands, focusing on the underlying web references rather than just the generated text. The researchers merged three Rankfor.AI datasets to examine 167,551 URL-grounded citations across 128 brands in 12 home markets and 13 languages. The analysis reveals that LLMs ground brand answers overwhelmingly in third-party sources, with 85.7% of citations pointing to sites the brand does not own compared to only 14.3% for owned domains. The source base is highly concentrated and follows Zipf's law, with 80% of citations originating from approximately 18% of domains. Wikipedia emerges as the dominant reference site, being the most-cited domain in 11 of the 12 languages studied. The only exception is Lithuanian, where the business daily vz.lt slightly edges out Wikipedia with a 4.38% share. Additionally, the source mix shows market-specific variations: for Polish national brands, YouTube is the top cited domain and HR portals supply more citations than Polish Wikipedia.
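The concentration figure above (roughly 18% of domains supplying 80% of citations) can be reproduced on any citation log with a few lines. The following is a minimal sketch, not the study's code: the function name `domain_share_for_coverage` and the toy data are invented for illustration.

```python
from collections import Counter


def domain_share_for_coverage(citation_domains, coverage=0.80):
    """Fraction of distinct domains needed to reach `coverage` of all citations.

    `citation_domains` is a list with one domain string per citation.
    Domains are taken from most cited to least cited.
    """
    counts = Counter(citation_domains)
    total = sum(counts.values())
    covered = 0
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered / total >= coverage:
            return rank / len(counts)
    return 1.0  # empty input: nothing to rank


# Toy data, invented for illustration (not taken from the study).
toy = ["wikipedia.org"] * 6 + ["youtube.com"] * 2 + ["a.example", "b.example"]
print(domain_share_for_coverage(toy))  # 0.5: two of four domains cover 80%
```

On the study's data this function would be expected to return a value near 0.18, assuming citations are attributed to domains the same way the authors did.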

media Hugging Face Forums · 5h ago

Ontological Inversion: Flipping LLM Emotional Concepts via Negative Gain

The author introduces 'ontological inversion,' a technique intended to move Large Language Models beyond one-directional inference. The method lets models capture nuanced, multifaceted concepts, such as memories that evoke sorrow and joy at the same time. It was developed by applying a negative gain factor during sweeps in the Niodoo steering architecture. It targets a common limitation: when prompted with personal experiences, LLMs tend to overfit to a single emotional label. By inverting concepts, in a way the author likens to an involution in physics, the technique lets models flip emotional states, for example turning sorrowful memories into joyful ones. The work is shared in a GitHub repository titled 'ontological-inversion' by user Ruffian-L.

arxiv arXiv cs.AI · 12h ago

Hi-Seg: Human-AI Collaboration for Pulmonary Nodule Segmentation

Hi-Seg, a human-in-the-loop framework built on SAM, achieves a mean Dice score of almost 85% in pulmonary nodule segmentation. It outperforms five state-of-the-art deep learning models and 13 SAM variants, with non-medical annotators matching junior medical student performance, reducing clinician workload and enabling scalable annotation.
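For readers unfamiliar with the metric, the Dice score compares a predicted mask with a reference mask as twice their overlap divided by their combined size. A minimal sketch on flat binary masks follows; the function name and the toy masks are illustrative and are not from the Hi-Seg paper.

```python
def dice_score(pred, truth):
    """Dice coefficient for two binary masks given as equal-length 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    intersection = sum(p * t for p, t in zip(pred, truth))
    size = sum(pred) + sum(truth)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if size == 0 else 2 * intersection / size


# Toy masks, invented for illustration.
pred = [1, 1, 1, 0, 0, 0]
truth = [0, 1, 1, 1, 0, 0]
print(round(dice_score(pred, truth), 3))  # 0.667
```

A mean Dice of almost 85% means this value, averaged over the evaluated nodules, is just under 0.85.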

arxiv arXiv cs.AI · 13h ago

Prompt-Side Preprocessing Enhances Edge AI Accuracy

A structured prompt framework improves local LLM accuracy in environmental monitoring by transforming raw sensor data into enriched textual representations. Evaluations on indoor and outdoor datasets show that enriched prompts raise local model accuracy from 50.9% to 81.7% indoors and from 63.7% to 79.3% outdoors, while keeping latency low at about 0.22 seconds in no-chain-of-thought mode.

arxiv arXiv cs.AI · 14h ago

Deep Learning Pipeline for Sign Language Recognition and Translation to Indian Vernaculars

A two-stage deep learning pipeline classifies Indian Sign Language video clips into English words using a fine-tuned VideoMAE model, then translates them into Hindi, Telugu, and Bengali with the NLLB-200 multilingual model. The system reaches 99% training accuracy and 78% validation accuracy on a 13-class, 197-clip dataset of uniform 16-frame clips at 224×224 resolution, and includes a Streamlit demo for user-uploaded videos with per-class analysis and failure mode identification.

media r/LocalLLaMA · 20h ago

Unlimited-OCR is now on ModelScope

Unlimited-OCR, a 3.3B multilingual OCR model, is available on ModelScope. It supports one-shot parsing for single images, multi-page documents, and PDFs, with full-document parsing and up to 32K output length. The model includes base and gundam image modes for diverse document layouts and supports Transformers inference with OpenAI-compatible streaming.

arxiv arXiv cs.CL · 23h ago

MMed-Bench-IR: A Multilingual Medical Retrieval Benchmark

MMed-Bench-IR introduces a heterogeneous benchmark for multilingual medical information retrieval across six languages. It evaluates cross-lingual alignment, concept discrimination, and evidence retrieval through three distinct tasks with no overlapping concepts or queries. Evaluation shows significant cross-lingual performance drops, with English biomedical encoders falling from 0.818 to 0.056 nDCG@10 when transitioning to Japanese, highlighting limitations undetected by English-only benchmarks.
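The nDCG@10 numbers above measure how well relevant documents are placed in the top ten results, with 1.0 being an ideal ranking. A minimal sketch of the standard linear-gain formulation follows; the benchmark may use a different gain function or library, so treat the function names and toy inputs as assumptions.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k relevance grades."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, all_judged_relevances, k=10):
    """nDCG@k: DCG of the system ranking divided by DCG of the ideal ranking.

    ranked_relevances: grades of the returned documents, in ranked order.
    all_judged_relevances: grades of every judged document for the query.
    """
    ideal = dcg_at_k(sorted(all_judged_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0


# Toy query with a single relevant document, invented for illustration.
print(ndcg_at_k([1, 0, 0], [1]))            # 1.0: relevant document ranked first
print(round(ndcg_at_k([0, 1, 0], [1]), 3))  # 0.631: relevant document ranked second
```

A score of 0.056 therefore indicates that relevant documents rarely appear in the top ten at all.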

arxiv arXiv cs.CL · 1d ago

Privacy-Preserving RAG via Multi-Agent Semantic Rewriting

A multi-agent framework sanitizes retrieved content by removing sensitive identifiers through semantic rewriting, reducing privacy leakage in targeted attacks. It maintains strong contextual fidelity with a BLEU-1 score of 0.122, outperforming SAGE's 0.117, and operates as an asynchronous preprocessing step with no added latency to online inference.

media Hugging Face Forums · 1d ago

Spaces tokens stop working after update

Users report that Spaces tokens no longer function after a recent update. No generated files are being saved, disrupting workflow and model execution.

media r/LocalLLaMA · 1d ago

LLM Medical Scribing Benchmark: Omissions Outnumber Hallucinations

A benchmark of 8 LLMs on 300 synthetic doctor-patient dialogues found 12 high-impact hallucinations and 520 clinically relevant omissions. Omissions were far more common than hallucinations: DeepSeek excelled in prose and cost but missed many safety facts, while Claude Opus had the fewest omissions but poorer prose quality.

media r/LocalLLaMA · 1d ago

Comparing Docling, Liteparse, MinerU, and Unstructured for On-Prem Document Processing

A university seeking on-premises document processing for academic workflows must use local parsers due to strict data governance policies banning cloud APIs. The user evaluates Docling, Liteparse, MinerU, and Unstructured, noting Docling excels in complex layouts with Apache 2.0 licensing but is slower; Liteparse offers good printed document performance with Tesseract OCR; MinerU uses PaddleOCR and handles French documents well despite longer setup; Unstructured supports multiple formats including DOCX and PPTX. The solution must support recurring, stable parsing of evolving PDFs with minimal formatting changes.

media r/LocalLLaMA · 2d ago

Why is Gemma 4 26b not mentioned more?

Users note a lack of discussion around Gemma 4 26b despite its potential suitability for personal assistant and RAG tasks on a single RTX 3090. The model is considered a strong candidate for all-in-one local AI applications, though it receives less attention than Qwen3.6 or Gemma 4 31b.

lab Mistral AI News · 2d ago

Mistral Releases OCR 4 with Multilingual Support and Structured Output

Mistral OCR 4 introduces bounding boxes, block classification, and inline confidence scores for 170 languages across 10 language groups. It outperforms leading OCR systems in human preference evaluations with a 72% win rate and achieves the top score on OlmOCRBench (85.20), while offering self-hosted deployment in a single container and supporting enterprise use cases like RAG and document ingestion.

arxiv arXiv cs.CL · 2d ago

ViRGo: Adaptive Routing for Visual Retrieval and Global Perception

ViRGo introduces a lightweight framework that adapts visual retrieval based on object scale. It uses intrinsic localization and semantic confidence to route between global perception, patch-based retrieval, and attention-based retrieval, improving accuracy-efficiency trade-offs without extra computation.

arxiv arXiv cs.CL · 2d ago

π-RAG: Oblivious Retrieval via Semantic Quantization and Transcendental Addressing

π-RAG decouples LLMs from sensitive data by using π's digits as an immutable, uneditable source of entropy. It introduces a semantic quantization layer that maps user inputs to canonical intent centroids, then uses cryptographic salt to generate deterministic offsets pointing to standardized payloads, ensuring oblivious retrieval and mathematical guarantees of data privacy.

arxiv arXiv cs.CL · 2d ago

Topic-to-Timestamp Alignment by Constrained Evidence Selection

A new method improves topic-to-timestamp alignment in meeting transcripts by selecting timestamped evidence instead of generating timecodes. On 420 queries from municipal meeting transcripts, it boosts Recall@5 to 50.0%, reduces MAE to 761.0 seconds, and increases parseable outputs from 373 to 419, showing that retrieval quality and output design are critical.

arxiv arXiv cs.CL · 2d ago

PeerCheck: Improving LLM-Generated Academic Reviews

PeerCheck analyzes differences between LLM and human academic reviews, finding LLMs focus on theory while humans prioritize methodology and experiments. The framework uses prompt engineering like Chain-of-Thought and retrieval-augmented generation, with CoT significantly improving review quality, though RAG introduces an unexpected 'paradox' that sometimes reduces quality.

arxiv arXiv cs.CL · 2d ago

The Token Tax of Epistemic Accuracy in Document-Grounded AI

A study compares retrieval-augmented generation (RAG) and long-context prompting in document-grounded AI. Long-context prompting achieves higher epistemic accuracy—73.1% vs. 65.4%—but at 26 times the per-query token cost, highlighting a significant token tax for broader evidentiary access.
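The trade-off can be made concrete with simple arithmetic on the three figures quoted above. This is a back-of-the-envelope sketch using only those numbers; the function name and the cost-per-point measure are illustrative and are not defined in the study.

```python
def token_tax(rag_acc, long_acc, token_multiple):
    """Summarize the accuracy gain of long-context prompting against its token cost.

    Accuracies are percentages; token_multiple is long-context tokens per query
    divided by RAG tokens per query.
    """
    gain_pp = long_acc - rag_acc                      # gain in percentage points
    relative_gain = gain_pp / rag_acc                 # gain relative to the RAG baseline
    extra_cost_per_pp = (token_multiple - 1) / gain_pp  # extra RAG-sized queries per point
    return gain_pp, relative_gain, extra_cost_per_pp


gain_pp, relative_gain, extra_cost_per_pp = token_tax(65.4, 73.1, 26)
print(round(gain_pp, 1))            # 7.7 percentage points
print(round(relative_gain * 100, 1))  # 11.8 percent relative improvement
print(round(extra_cost_per_pp, 2))  # 3.25 extra RAG-query budgets per point gained
```

In other words, each additional percentage point of accuracy costs a little over three extra RAG-sized token budgets per query.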