All articles — korshunov.ai


IHDec: Divergence-Steered Contrastive Decoding for Securing Multi-Turn Instruction Hierarchies

IHDec addresses the failure of large language models to maintain instruction hierarchies across multi-turn conversations. It uses Jensen-Shannon divergence to detect role-influence inversions, in which a subordinate role overrides a superior role's directives, and then corrects them. This training-free contrastive decoding method dynamically suppresses the offending subordinate role's influence during token generation.
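This summary does not give IHDec's exact scoring rule. Below is a minimal sketch of the underlying measurement. It assumes access to a model's next-token logits with and without a given role's messages in the context, and uses Jensen-Shannon divergence to quantify how much each role shifts the next-token distribution. The ablation setup and the inversion rule (flag when the user turn moves the distribution more than the system prompt does) are hypothetical illustrations, not the paper's method.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Convert raw logits into a probability distribution (numerically stable)."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()


def kl_divergence(p: np.ndarray, q: np.ndarray) -> float:
    """KL(p || q) in nats; entries where p == 0 contribute nothing."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def js_divergence(p: np.ndarray, q: np.ndarray) -> float:
    """Jensen-Shannon divergence: symmetric, bounded above by ln 2 (nats)."""
    m = 0.5 * (p + q)  # mixture is > 0 wherever p or q is > 0, so KL is finite
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def influence_inversion(
    logits_full: np.ndarray,
    logits_no_system: np.ndarray,
    logits_no_user: np.ndarray,
) -> bool:
    """HYPOTHETICAL rule: a role's influence is the JSD between the next-token
    distribution with the full context and with that role's messages removed.
    Flag an inversion when the subordinate (user) role outweighs the system role."""
    p_full = softmax(logits_full)
    system_influence = js_divergence(p_full, softmax(logits_no_system))
    user_influence = js_divergence(p_full, softmax(logits_no_user))
    return user_influence > system_influence
```

JSD is used here rather than plain KL because it is symmetric and always finite, which makes influence scores from different roles directly comparable.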

arXiv cs.CL · 8h ago

Are We Measuring Strategy or Phrasing? The Gap Between Surface- and Approach-Level Diversity in LLM Math Reasoning

This study introduces approach-level diversity to separate surface-level variation in phrasing from genuine differences in solution strategy in LLM mathematical reasoning. It shows that prior diversity metrics fail to capture true methodological diversity and that, as a result, approach-level diversity declines during diversity-aware RLVR (reinforcement learning with verifiable rewards) training.

arXiv cs.CL · 8h ago

VISTA: A Proprioceptive Dashboard for LLM Context Management

The article introduces VISTA, a training-free layer that addresses the context window limits of long-horizon tool-using agents by exposing their internal state. It argues that frontier models are blind to their own context usage and proposes an interface that surfaces details of their working memory instead of relying on learned compression policies.

arXiv cs.CL · 8h ago

Node-to-Neighborhood Semantic Consistency: Text-Topology Alignment for TAGs Anomaly Detection

This paper addresses graph anomaly detection on text-attributed graphs (TAGs) by formalizing it as a node-to-neighborhood semantic consistency problem, in which anomalies show up as mismatches between a node's textual semantics and its topological relationships. The authors propose N2NSC, a framework with two complementary fusion paths that align graph topology with textual semantics, enabling large language models to use both structural and textual neighborhood information.

arXiv cs.CL · 8h ago

SHOVIR: A Benchmark for Evaluating Vision Shortcut Learning in Radiology Report Generation

The SHOVIR benchmark evaluates vision shortcut learning in radiology report generation by extending MIMIC-CXR and PadChest-GR with per-box CheXpert labels. It uses image-level and disease-level occlusion experiments to isolate direct and contextual shortcuts, cases in which models rely on spurious correlations rather than actual visual evidence.

GitHub llama.cpp · 8h ago

llama.cpp b9844 release adds NVFP4 support and new binaries

The llama.cpp project has released version b9844, which introduces ggml-webgpu support for the NVFP4 quantization format. This update also provides pre-built binaries for macOS, iOS, Linux, Android, Windows, and openEuler across various hardware backends.

arXiv cs.CL · 9h ago

Not-quite-human tastes: the stylized omnivorousness of LLM survey surrogates

This study evaluates how well large language models approximate human cultural tastes by generating "silicon" survey respondents modeled on the Survey of Public Participation in the Arts (SPPA). Using models from OpenAI, Anthropic, and DeepSeek, the authors analyze 277,470 synthetic respondents to determine whether LLMs can faithfully replicate real-world survey data.

arXiv cs.CL · 9h ago

Efficient Retrieval-Augmented Generation via Token Co-occurrence Graphs

Researchers propose TIGRAG (Token-Induced GraphRAG), a framework that builds scalable knowledge graphs from token co-occurrence statistics for efficient retrieval-augmented generation. The graph structure targets standard RAG's weakness in multi-hop reasoning, while deriving the graph from co-occurrence statistics avoids expensive LLM-based extraction pipelines.
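The summary does not describe how TIGRAG builds or queries its graph. As a sketch of the generic technique it names, the code below builds a token co-occurrence graph with a sliding window and adds a naive one-hop expansion retriever. The tokenizer, window size, `neighbor_weight`, and scoring rule are all hypothetical choices, not TIGRAG's. The toy example shows how graph neighbors can surface a document that shares no token with the query, which is the basic idea behind multi-hop retrieval.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_graph(docs: list[str], window: int = 3):
    """Return (adjacency, postings).

    adjacency[a][b] counts how often tokens a and b appear within `window`
    consecutive tokens of each other; postings[t] is the set of doc ids containing t.
    """
    adj: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    postings: dict[str, set[int]] = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        tokens = tokenize(doc)
        for i, tok in enumerate(tokens):
            postings[tok].add(doc_id)
            # Pair each token with the next (window - 1) tokens.
            for other in tokens[i + 1 : i + window]:
                if other != tok:
                    adj[tok][other] += 1
                    adj[other][tok] += 1
    return adj, postings


def retrieve(query, adj, postings, top_n=3, neighbor_weight=0.5):
    """HYPOTHETICAL scoring: query tokens weigh 1.0, their one-hop graph
    neighbors weigh `neighbor_weight`; a doc scores the sum of weights of the
    tokens it contains. Ties are broken by doc id."""
    weights = {tok: 1.0 for tok in tokenize(query)}
    for tok in list(weights):
        for nb in adj.get(tok, {}):
            weights.setdefault(nb, neighbor_weight)  # never downgrade query tokens
    scores: dict[int, float] = defaultdict(float)
    for tok, w in weights.items():
        for doc_id in postings.get(tok, ()):
            scores[doc_id] += w
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return ranked[:top_n]


docs = [
    "marie curie born warsaw",
    "warsaw capital poland",
    "poland joined european union",
]
adj, postings = build_graph(docs)
print(retrieve("curie", adj, postings))
```

Here `retrieve("curie", ...)` returns `[0, 1]`: document 1 never mentions "curie" but is reached through the shared neighbor "warsaw". Document 2 would need a second hop, which this one-hop sketch does not take.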

arXiv cs.CL · 9h ago

Information Dynamics of Language Communication

Researchers introduce an information-theoretic framework that quantifies the directed flow of semantic content between interlocutors and decomposes contributions from multiple sources into redundant, unique, and synergistic components.

arXiv cs.CL · 9h ago

Does Verbose Chain-of-Thought Really Help? In-Distribution Evidence that Content, Not Length, Matters

This study investigates whether verbose chain-of-thought prompting improves large language model reasoning through extra computation or through useful semantic content. Using in-distribution sampling and controlled interventions, the authors find that content, not length, drives the performance gains.

arXiv cs.CL · 9h ago

DNA Language Models: An Assessment of Pre-Training for Fine-Tuning Tasks

This study asks whether transformer-based DNA language models such as DNABERT2 outperform conventional approaches such as ConvNova by enough to justify the high computational cost of pre-training. It also analyzes how Byte Pair Encoding (BPE) tokenization affects performance on genomic tasks.

arXiv cs.CL · 9h ago

Estimating Grammatical Gender Directions in Contextual Embeddings under Controlled and Natural Contexts

This study addresses the conflation of grammatical gender with social semantic bias in contextual language models for gendered languages such as Spanish, and proposes a framework to disentangle the two. The authors build balanced datasets from controlled templates and natural Wikipedia contexts to estimate gender directions while suppressing contamination between these dimensions.
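The summary does not specify the authors' estimator. One common way to estimate a direction in embedding space, assumed here and not necessarily the paper's method, is the difference of class means over embeddings of matched masculine and feminine contexts. Balanced, template-matched pairs help with this estimator because context shared by both members of a pair cancels in the subtraction. The input arrays stand in for real contextual embeddings and are hypothetical.

```python
import numpy as np


def estimate_direction(masc: np.ndarray, fem: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Difference-of-means direction (unit norm) and the midpoint between class means.

    masc, fem: arrays of shape (n_examples, dim) holding embeddings of the masculine
    and feminine members of matched pairs (HYPOTHETICAL inputs for illustration).
    """
    mu_m = masc.mean(axis=0)
    mu_f = fem.mean(axis=0)
    diff = mu_m - mu_f
    norm = np.linalg.norm(diff)
    if norm == 0:
        raise ValueError("class means coincide; no direction can be estimated")
    return diff / norm, (mu_m + mu_f) / 2


def gender_score(x: np.ndarray, direction: np.ndarray, midpoint: np.ndarray) -> np.ndarray:
    """Signed projection of x (one vector or a batch of rows) onto the direction,
    centered at the midpoint: > 0 leans masculine, < 0 leans feminine."""
    return (x - midpoint) @ direction


# Toy 2-D embeddings: gender varies along axis 0, context along axis 1.
masc = np.array([[1.0, 0.0], [1.0, 1.0]])
fem = np.array([[-1.0, 0.0], [-1.0, 1.0]])
direction, midpoint = estimate_direction(masc, fem)
print(direction, gender_score(np.array([0.3, 5.0]), direction, midpoint))
```

In the toy data the context axis (axis 1) is identical across classes, so it drops out of the estimated direction entirely. With real embeddings, imbalanced data would instead leak contextual or social signal into the direction, which is the contamination the balanced datasets aim to limit.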

arXiv cs.CL · 9h ago

CORTEX: High-Quality Cross-Domain Organization of Web-Scale Corpora through Ontological Corpus Graph

The authors introduce Cortex, a framework that moves web-scale corpus construction from flat document filtering to structured knowledge organization through an Ontological Corpus Graph (OCG). This three-layer structure combines quality-refined content, a lightweight hierarchical ontology, and cross-domain alignment to meet the growing data requirements of large language models.

arXiv cs.CL · 9h ago

DAIN: Dynamic Agent-Based Interaction Network for Efficient and Collaborative Multimodal Reasoning

Researchers introduce the Dynamic Agent-based Interaction Network (DAIN), a framework that recasts multimodal fusion as a dynamic, collaborative multi-agent process instead of a static architecture. A context-aware Meta-Controller schedules the sparse activation of specialized agents and orchestrates compressed communication among them to build consensus.

arXiv cs.CL · 9h ago

Forewarned is Forearmed: When Non-Sequential Embedding Turns Into an Anomaly Detector

This paper analyzes non-sequential multimodal sentence-level embeddings, focusing on the SONAR model, and shows that specific embedding dimensions are sensitive to perturbations and can signal decoding anomalies. By exploiting the consistency between successive encoding and decoding passes, the authors build an accurate anomaly detector.

arXiv cs.CL · 10h ago

Before Thinking, Learn to Decide: Proactive Routing for Efficient Visual Reasoning

The authors propose PRP, a Proactive Routing Paradigm that accelerates inference in large multimodal models by making routing decisions early, based on a joint evaluation of draft-model and target-model competence. It tackles a key bottleneck in multimodal settings, obtaining reliable query-difficulty signals, without relying on data-sensitive supervised fine-tuning or post-hoc token probabilities.

arXiv cs.CL · 10h ago

EvalSafetyGap: A Hybrid Survey and Conceptual Framework for LLM Evaluation-Safety Failures

This paper addresses the shared measurement problem in LLM evaluation and AI safety, where benchmark scores often improve while latent safety properties remain difficult to verify. It introduces EvalSafetyGap, a hybrid survey and conceptual framework combining systematic evidence synthesis with a structured audit of ten models.

arXiv cs.CL · 10h ago

CaresAI at CT-DEB26: Detecting Dosing Errors In Clinical Trials Using Domain-Specific Transformer Embeddings and Classification Models

This study evaluates domain-specific transformer embeddings combined with classical machine learning classifiers for detecting dosing errors in clinical trial protocols. The goal is to improve patient safety and trial integrity by flagging preventable medication errors early, directly from the protocol text.

arXiv cs.CL · 10h ago

Comparing Human and Automatic Recognition of Dutch Dysarthric Continuous Speech: A Case Study

This case study compares how well human listeners and three state-of-the-art, off-the-shelf ASR systems (Whisper-large-V3, Google Chirp 3, and Omnilingual) recognize Dutch continuous speech, both read and spontaneous, produced by a single speaker with severe dysarthria.
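Human-versus-ASR comparisons are commonly scored with word error rate (WER). The summary does not say which metric this study uses, so the sketch below simply assumes WER. WER is the word-level Levenshtein distance (substitutions + deletions + insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The same function can score a human listener's transcription and an ASR output on equal terms.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] = minimum edits to turn the first i-1 reference words into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # turning ref[:i] into an empty hyp takes i deletions
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[len(hyp)] / len(ref)


# One substitution (zit -> zat) and one deletion (de): 2 errors over 6 reference words.
print(word_error_rate("de kat zit op de mat", "de kat zat op mat"))
```

Because insertions count as errors, WER can exceed 1.0 when a hypothesis is much longer than the reference. This can matter for dysarthric speech, where systems sometimes hallucinate extra words.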

arXiv cs.CL · 10h ago

Grounding LLM Reasoning under Incomplete Graph Evidence

This article presents a theoretical framework for grounding large language model reasoning trajectories in knowledge graph evidence that is incomplete, rather than assuming access to complete truth states.