All articles — korshunov.ai

Does Verbose Chain-of-Thought Really Help? In-Distribution Evidence that Content, Not Length, Matters

This study investigates whether verbose chain-of-thought prompting improves large language model reasoning through increased computation or by providing useful semantic content. The authors present evidence from in-distribution sampling and controlled interventions to determine the specific factors driving performance gains.

arxiv arXiv cs.CL · 9h ago

DNA Language Models: An Assessment of Pre-Training for Fine-Tuning Tasks

This study evaluates the performance gains of transformer-based DNA language models like DNABERT2 compared to conventional approaches such as ConvNova, specifically addressing the high cost of pre-training. It investigates whether these improvements justify the computational overhead and analyzes the impact of Byte Pair Encoding (BPE) tokenization on genomic tasks.

arxiv arXiv cs.CL · 9h ago

Estimating Grammatical Gender Directions in Contextual Embeddings under Controlled and Natural Contexts

This study addresses the conflation of grammatical gender and social semantic bias in contextual language models for gendered languages like Spanish, proposing a framework to disentangle these dimensions. The authors construct balanced datasets using controlled templates and natural Wikipedia contexts to estimate gender directions while suppressing contamination.

arxiv arXiv cs.CL · 9h ago

CORTEX: High-Quality Cross-Domain Organization of Web-Scale Corpora through Ontological Corpus Graph

The authors introduce Cortex, a framework that transforms web-scale corpus construction from flat document filtering into structured knowledge organization using an Ontological Corpus Graph (OCG). This three-layer structure unifies quality-refined content, hierarchical lightweight ontology, and cross-domain alignment to address the escalating data requirements of large language models.

arxiv arXiv cs.CL · 9h ago

DAIN: Dynamic Agent-Based Interaction Network for Efficient and Collaborative Multimodal Reasoning

Researchers introduce the Dynamic Agent-based Interaction Network (DAIN), a framework that reconceptualizes multimodal fusion as a dynamic, multi-agent collaborative process rather than relying on static architectures. DAIN utilizes a context-aware Meta-Controller to dynamically schedule sparse activation of specialized agents and orchestrates compressed communication for consensus-building.

arxiv arXiv cs.CL · 9h ago

Forewarned is Forearmed: When Non-Sequential Embedding Turns Into an Anomaly Detector

This paper analyzes non-sequential multimodal sentence-level embeddings, focusing on the SONAR model, to demonstrate that specific embedding dimensions are sensitive to perturbations and can indicate decoding anomalies. By leveraging consistency between successive encoding and decoding, the authors successfully build an accurate anomaly detector.

arxiv arXiv cs.CL · 10h ago

Before Thinking, Learn to Decide: Proactive Routing for Efficient Visual Reasoning

The authors propose PRP, a Proactive Routing Paradigm that accelerates inference in large multimodal models by enabling early decision-making through joint evaluation of draft and target model competence. This approach addresses the bottleneck of establishing reliable query difficulty signals in multimodal settings without relying on data-sensitive supervised fine-tuning or post-hoc token probabilities.

arxiv arXiv cs.CL · 10h ago

EvalSafetyGap: A Hybrid Survey and Conceptual Framework for LLM Evaluation-Safety Failures

This paper addresses the shared measurement problem in LLM evaluation and AI safety, where benchmark scores often improve while latent safety properties remain difficult to verify. It introduces EvalSafetyGap, a hybrid survey and conceptual framework combining systematic evidence synthesis with a structured audit of ten models.

arxiv arXiv cs.CL · 10h ago

CaresAI at CT-DEB26: Detecting Dosing Errors In Clinical Trials Using Domain-Specific Transformer Embeddings and Classification Models

This study evaluates the use of domain-specific transformer embeddings combined with classical machine learning models to detect dosing errors in clinical trial protocols. The research aims to improve patient safety and trial integrity by identifying preventable medication errors early through text representation analysis.

arxiv arXiv cs.CL · 10h ago

Comparing Human and Automatic Recognition of Dutch Dysarthric Continuous Speech: A Case Study

This study compared the recognition performance of human listeners against three state-of-the-art off-the-shelf ASR systems (Whisper-large-V3, Google Chirp 3, and Omnilingual) on Dutch continuous read and spontaneous speech from a single speaker with severe dysarthria.

arxiv arXiv cs.CL · 10h ago

Grounding LLM Reasoning under Incomplete Graph Evidence

This article presents a theoretical framework for grounding large language model reasoning trajectories when relying on incomplete knowledge graph evidence rather than complete truth states.

arxiv arXiv cs.CL · 10h ago

Multi-Agentic System Leveraging Open-Source LLMs to Mitigate Disinformation Threats

This article proposes a novel multi-agent system that emulates human annotator decision-making processes to detect and debunk disinformation, achieving superior results compared to individual Large Language Models like GPT-4 and GPT-3.5.

arxiv arXiv cs.CL · 10h ago

When Is a Draft Accepted? A Theory of Acceptance in Speculative Decoding

This article develops a theory for speculative decoding regimes that use greedy decoding, relaxed acceptance rules, or tree-based candidate sets, rather than the stochastic distribution-preserving settings studied in existing literature. The authors characterize rejection regions as lower level sets of the target distribution to derive exact KL divergence requirements and sharp margin-based bounds for various acceptance criteria.

arxiv arXiv cs.CL · 10h ago

DialogPII: A multilingual dataset of synthetic dialog transcripts to detect personal information

Researchers present DialogPII, a multilingual dataset of synthetic dialog transcripts designed to support the development and evaluation of automatic systems for detecting personally identifiable information. This resource addresses privacy concerns in sensitive domains by providing annotated data across 11 languages and 8 interaction scenarios.

arxiv arXiv cs.CL · 10h ago

Improving Large-Scale Weakly Supervised ASR by Filtering and Selection

The authors propose a novel training approach for end-to-end automatic speech recognition (ASR) that addresses noisy labels and lack of domain specificity in large-scale weakly supervised datasets. The method involves pretraining on the full dataset, continued pretraining on a filtered subset based on character error rate, and fine-tuning on acoustically similar samples from that subset.

media r/LocalLLaMA · 11h ago

Qwen3.6-27B with 3-Critic Harness Matches Frontier Quality

A user tested Qwen3.6-27B (8-bit) alongside GLM5.2 using a coding harness that employs three critics (code review, test review, and Playwright e2e) to validate output quality.

arxiv arXiv cs.CL · 11h ago

DriftGuard: Safety-Aware Multi-Monitor Detection and Selective Adaptation for Evolving Toxicity Moderation

This paper introduces DriftGuard, a framework that combines multi-monitor drift detection with selective model updating to address evolving toxicity in automated moderation systems. The system tracks specific safety-relevant shifts, such as identity-harm and toxic-risk drift, rather than relying solely on global distributional changes.

arxiv arXiv cs.CL · 11h ago

5ting at SemEval-2026 Task 8: Strong End-to-End Multi-Turn RAG via LLM-Based Reranking and Faithfulness Control

The authors introduce 5ting, a system designed for SemEval-2026 Task 8 (MTRAGEval), which evaluates multi-turn Retrieval-Augmented Generation (RAG) systems. The system addresses challenges such as context drift, underspecification, and hallucination risk by combining dense retrieval with LLM-based reranking and faithfulness control.

arxiv arXiv cs.CL · 11h ago

Majority Vote Silences Minority Values: Annotator Disagreement at the Hate/Offensive Boundary in HateXplain

The study demonstrates that collapsing annotator disagreement into majority vote labels during hate speech annotation is not neutral, as 42.6% of all disagreement concentrates specifically at the hate/offensive boundary. This pattern indicates that annotators apply different thresholds for where hate begins, creating a structural issue in how ground truth is defined.

arxiv arXiv cs.CL · 11h ago

Structure-Preserving Document Translation via Multi-Stage LLM Pipeline: A Case Study in Marathi

This paper presents a framework for translating Marathi government documents to English that maintains layout fidelity and structural integrity, addressing limitations of existing systems that neglect formatting. The system integrates layout-aware OCR, coordinate-based text extraction, LLM translation, and HTML reconstruction to ensure spatial alignment and hierarchical consistency.