All articles — korshunov.ai

llama.cpp b9844 release adds NVFP4 support and new binaries

The llama.cpp project has released version b9844, which introduces ggml-webgpu support for the NVFP4 quantization format. This update also provides pre-built binaries for macOS, iOS, Linux, Android, Windows, and openEuler across various hardware backends.

arxiv arXiv cs.CL · 7h ago

Not-quite-human tastes: the stylized omnivorousness of LLM survey surrogates

This study evaluates how well large language models approximate human cultural tastes by generating silicon surrogates from the Survey of Public Participation in the Arts. Using models from OpenAI, Anthropic, and DeepSeek, the authors analyze 277,470 synthetic respondents to determine whether LLMs can faithfully replicate real-world survey data.

arxiv arXiv cs.CL · 7h ago

Efficient Retrieval-Augmented Generation via Token Co-occurrence Graphs

Researchers propose TIGRAG (Token-Induced GraphRAG), a framework that uses token co-occurrence statistics to build scalable knowledge graphs for efficient retrieval-augmented generation. This approach addresses the limitations of standard RAG in multi-hop reasoning by avoiding expensive LLM-based extraction pipelines.
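As a rough illustration of the idea behind a token co-occurrence graph, the sketch below counts how often two tokens appear near each other and uses those counts as edge weights. It is not TIGRAG's actual pipeline: the whitespace tokenizer, the `window` parameter, and the helper names `cooccurrence_graph` and `neighbors` are all assumptions made for this example.

```python
from collections import Counter


def cooccurrence_graph(docs, window=2):
    """Build weighted edges between tokens at most `window` positions apart.

    Hypothetical helper: whitespace tokenization and a fixed window are
    simplifying assumptions, not details taken from the paper.
    """
    edges = Counter()
    for doc in docs:
        tokens = doc.lower().split()
        for i, tok in enumerate(tokens):
            # Look only forward so each pair is counted once per occurrence.
            for other in tokens[i + 1 : i + 1 + window]:
                if other != tok:
                    edges[tuple(sorted((tok, other)))] += 1
    return edges


def neighbors(edges, token, k=3):
    """Return up to k tokens linked to `token`, strongest edges first."""
    scored = []
    for (a, b), weight in edges.items():
        if token == a:
            scored.append((weight, b))
        elif token == b:
            scored.append((weight, a))
    # Sort by descending weight, then alphabetically for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [tok for _, tok in scored[:k]]


docs = ["paris is the capital of france", "france borders spain"]
graph = cooccurrence_graph(docs, window=2)
print(neighbors(graph, "france", k=10))
```

No LLM call is involved in building the graph, which is the cost advantage the summary points to. A retriever could then walk from query tokens to neighboring tokens to reach passages several hops away.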

arxiv arXiv cs.CL · 7h ago

Information Dynamics of Language Communication

Researchers introduce an information-theoretic framework to quantify the directed flow of semantic content between interlocutors and decompose multi-source contributions into redundant, unique, and synergistic components.

arxiv arXiv cs.CL · 7h ago

Does Verbose Chain-of-Thought Really Help? In-Distribution Evidence that Content, Not Length, Matters

This study investigates whether verbose chain-of-thought prompting improves large language model reasoning through increased computation or by providing useful semantic content. The authors present evidence from in-distribution sampling and controlled interventions to determine the specific factors driving performance gains.

arxiv arXiv cs.CL · 7h ago

DNA Language Models: An Assessment of Pre-Training for Fine-Tuning Tasks

This study evaluates the performance gains of transformer-based DNA language models like DNABERT2 compared to conventional approaches such as ConvNova, specifically addressing the high cost of pre-training. It investigates whether these improvements justify the computational overhead and analyzes the impact of Byte Pair Encoding (BPE) tokenization on genomic tasks.

arxiv arXiv cs.CL · 7h ago

Estimating Grammatical Gender Directions in Contextual Embeddings under Controlled and Natural Contexts

This study addresses the conflation of grammatical gender and social semantic bias in contextual language models for gendered languages like Spanish, proposing a framework to disentangle these dimensions. The authors construct balanced datasets using controlled templates and natural Wikipedia contexts to estimate gender directions while suppressing contamination.

arxiv arXiv cs.CL · 7h ago

CORTEX: High-Quality Cross-Domain Organization of Web-Scale Corpora through Ontological Corpus Graph

The authors introduce CORTEX, a framework that transforms web-scale corpus construction from flat document filtering into structured knowledge organization using an Ontological Corpus Graph (OCG). This three-layer structure unifies quality-refined content, hierarchical lightweight ontology, and cross-domain alignment to address the escalating data requirements of large language models.

arxiv arXiv cs.CL · 7h ago

DAIN: Dynamic Agent-Based Interaction Network for Efficient and Collaborative Multimodal Reasoning

Researchers introduce the Dynamic Agent-based Interaction Network (DAIN), a framework that reconceptualizes multimodal fusion as a dynamic, multi-agent collaborative process rather than relying on static architectures. DAIN utilizes a context-aware Meta-Controller to dynamically schedule sparse activation of specialized agents and orchestrates compressed communication for consensus-building.

arxiv arXiv cs.CL · 7h ago

Forewarned is Forearmed: When Non-Sequential Embedding Turns Into an Anomaly Detector

This paper analyzes non-sequential multimodal sentence-level embeddings, focusing on the SONAR model, to demonstrate that specific embedding dimensions are sensitive to perturbations and can indicate decoding anomalies. By leveraging consistency between successive encoding and decoding, the authors successfully build an accurate anomaly detector.

arxiv arXiv cs.CL · 8h ago

Before Thinking, Learn to Decide: Proactive Routing for Efficient Visual Reasoning

The authors propose PRP, a Proactive Routing Paradigm that accelerates inference in large multimodal models by enabling early decision-making through joint evaluation of draft and target model competence. This approach addresses the bottleneck of establishing reliable query difficulty signals in multimodal settings without relying on data-sensitive supervised fine-tuning or post-hoc token probabilities.

arxiv arXiv cs.CL · 8h ago

EvalSafetyGap: A Hybrid Survey and Conceptual Framework for LLM Evaluation-Safety Failures

This paper addresses the shared measurement problem in LLM evaluation and AI safety, where benchmark scores often improve while latent safety properties remain difficult to verify. It introduces EvalSafetyGap, a hybrid survey and conceptual framework combining systematic evidence synthesis with a structured audit of ten models.

arxiv arXiv cs.CL · 8h ago

CaresAI at CT-DEB26: Detecting Dosing Errors In Clinical Trials Using Domain-Specific Transformer Embeddings and Classification Models

This study evaluates the use of domain-specific transformer embeddings combined with classical machine learning models to detect dosing errors in clinical trial protocols. The research aims to improve patient safety and trial integrity by identifying preventable medication errors early through text representation analysis.

arxiv arXiv cs.CL · 8h ago

Comparing Human and Automatic Recognition of Dutch Dysarthric Continuous Speech: A Case Study

This study compared the recognition performance of human listeners against three state-of-the-art off-the-shelf ASR systems (Whisper-large-V3, Google Chirp 3, and Omnilingual) on Dutch continuous read and spontaneous speech from a single speaker with severe dysarthria.

arxiv arXiv cs.CL · 8h ago

Grounding LLM Reasoning under Incomplete Graph Evidence

This article presents a theoretical framework for grounding large language model reasoning trajectories when relying on incomplete knowledge graph evidence rather than complete truth states.

arxiv arXiv cs.CL · 8h ago

Multi-Agentic System Leveraging Open-Source LLMs to Mitigate Disinformation Threats

This article proposes a multi-agent system that emulates the decision-making process of human annotators to detect and debunk disinformation. The authors report that it outperforms individual large language models such as GPT-4 and GPT-3.5.

arxiv arXiv cs.CL · 8h ago

When Is a Draft Accepted? A Theory of Acceptance in Speculative Decoding

This article develops a theory for speculative decoding regimes that use greedy decoding, relaxed acceptance rules, or tree-based candidate sets, rather than the stochastic distribution-preserving settings studied in existing literature. The authors characterize rejection regions as lower level sets of the target distribution to derive exact KL divergence requirements and sharp margin-based bounds for various acceptance criteria.
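To make the acceptance rules concrete, here is a minimal sketch of two non-stochastic criteria. Greedy acceptance keeps a draft token only if it is the target model's top choice. The threshold rule is an assumed example of a relaxed criterion (the paper's exact rules are not given in the summary), under which the rejected tokens are those whose target probability falls below `tau`. All function names and the toy probabilities are hypothetical.

```python
def greedy_accept(target_probs, draft_token):
    """Accept only if the draft token is the target model's argmax."""
    best = max(range(len(target_probs)), key=target_probs.__getitem__)
    return draft_token == best


def threshold_accept(target_probs, draft_token, tau):
    """Assumed relaxed rule: accept when the target probability is >= tau.

    The rejected tokens {x : p(x) < tau} form a lower level set of the
    target distribution.
    """
    return target_probs[draft_token] >= tau


def accepted_prefix_length(target_steps, draft_tokens, accept):
    """Count leading draft tokens accepted before the first rejection."""
    count = 0
    for probs, token in zip(target_steps, draft_tokens):
        if not accept(probs, token):
            break
        count += 1
    return count


# Toy target distributions over a 3-token vocabulary, one per draft position.
target_steps = [[0.7, 0.2, 0.1], [0.4, 0.35, 0.25], [0.1, 0.1, 0.8]]
draft_tokens = [0, 1, 2]

greedy_len = accepted_prefix_length(target_steps, draft_tokens, greedy_accept)
relaxed_len = accepted_prefix_length(
    target_steps, draft_tokens, lambda p, t: threshold_accept(p, t, tau=0.3)
)
print(greedy_len, relaxed_len)
```

In this toy case the greedy rule stops at the second token, where the draft picked the runner-up, while the threshold rule accepts all three. That is the trade-off between acceptance rate and fidelity to the target model that such a theory has to quantify.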

arxiv arXiv cs.CL · 8h ago

DialogPII: A multilingual dataset of synthetic dialog transcripts to detect personal information

Researchers present DialogPII, a multilingual dataset of synthetic dialog transcripts designed to support the development and evaluation of automatic systems for detecting personally identifiable information. This resource addresses privacy concerns in sensitive domains by providing annotated data across 11 languages and eight interaction scenarios.

arxiv arXiv cs.CL · 8h ago

Improving Large-Scale Weakly Supervised ASR by Filtering and Selection

The authors propose a novel training approach for end-to-end automatic speech recognition (ASR) that addresses noisy labels and lack of domain specificity in large-scale weakly supervised datasets. The method involves pretraining on the full dataset, continued pretraining on a filtered subset based on character error rate, and fine-tuning on acoustically similar samples from that subset.
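The filtering step can be sketched as follows. The summary does not say which two transcripts the character error rate compares, so this example assumes it is the weak label versus a hypothesis produced by the pretrained model. The field names `label` and `hypothesis`, the `max_cer` threshold, and the function names are likewise assumptions for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free on a match)
            ))
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        # Undefined for an empty reference; treat any hypothesis as a miss.
        return 0.0 if not hyp else float("inf")
    return edit_distance(ref, hyp) / len(ref)


def filter_by_cer(samples, max_cer=0.2):
    """Keep samples whose weak label and model hypothesis roughly agree."""
    return [s for s in samples if cer(s["label"], s["hypothesis"]) <= max_cer]


samples = [
    {"label": "hello world", "hypothesis": "hello world"},
    {"label": "hello world", "hypothesis": "goodbye"},
]
print(len(filter_by_cer(samples)))
```

A low CER suggests the weak label is probably accurate, so the retained subset is cleaner than the full dataset, at the cost of discarding hard but correctly labeled samples.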

media r/LocalLLaMA · 9h ago

Qwen3.6-27B with 3-Critic Harness Matches Frontier Quality

A user tested Qwen3.6-27B (8-bit) alongside GLM5.2 using a coding harness that validates output quality with three critics: code review, test review, and Playwright end-to-end (e2e) tests.