All articles — korshunov.ai


Qwen3.6-27B with 3-Critic Harness Matches Frontier Quality

A user tested Qwen3.6-27B (8-bit) alongside GLM5.2 using a coding harness that employs three critics (code review, test review, and Playwright e2e) to validate output quality.

arxiv arXiv cs.CL · 4h ago

DriftGuard: Safety-Aware Multi-Monitor Detection and Selective Adaptation for Evolving Toxicity Moderation

This paper introduces DriftGuard, a framework that combines multi-monitor drift detection with selective model updating to address evolving toxicity in automated moderation systems. The system tracks specific safety-relevant shifts, such as identity-harm and toxic-risk drift, rather than relying solely on global distributional changes.

arxiv arXiv cs.CL · 4h ago

5ting at SemEval-2026 Task 8: Strong End-to-End Multi-Turn RAG via LLM-Based Reranking and Faithfulness Control

The authors introduce 5ting, a system designed for SemEval-2026 Task 8 (MTRAGEval), which evaluates multi-turn Retrieval Augmented Generation (RAG) systems. The system addresses challenges such as context drift, underspecification, and hallucination risk by combining dense retrieval with LLM-based reranking and faithfulness control.

arxiv arXiv cs.CL · 4h ago

Majority Vote Silences Minority Values: Annotator Disagreement at the Hate/Offensive Boundary in HateXplain

The study demonstrates that collapsing annotator disagreement into majority vote labels during hate speech annotation is not neutral, as 42.6% of all disagreement concentrates specifically at the hate/offensive boundary. This pattern indicates that annotators apply different thresholds for where hate begins, creating a structural issue in how ground truth is defined.
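The effect described above can be illustrated with a minimal sketch. It assumes three annotators per post and the labels "hate", "offensive", and "normal". The toy annotations, the function names, and the way the boundary share is computed are all hypothetical and are not taken from the paper, so the result is not comparable to the reported 42.6%.

```python
from collections import Counter

def majority_label(votes):
    """Return the most common label, or None when the top count is tied."""
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]

def boundary_disagreement_share(items, a="hate", b="offensive"):
    """Share of disagreeing items whose votes split across exactly labels a and b.

    This definition is an assumption made for illustration only.
    """
    disagreeing = [v for v in items if len(set(v)) > 1]
    if not disagreeing:
        return 0.0
    at_boundary = [v for v in disagreeing if set(v) == {a, b}]
    return len(at_boundary) / len(disagreeing)

# Toy annotations, invented for illustration (three annotators per post).
toy = [
    ["hate", "hate", "offensive"],
    ["offensive", "offensive", "hate"],
    ["normal", "normal", "normal"],
    ["offensive", "normal", "normal"],
]

# Majority vote keeps one label per post and discards the dissenting vote.
labels = [majority_label(v) for v in toy]

# Fraction of disagreeing posts that split between "hate" and "offensive".
share = boundary_disagreement_share(toy)
```

In this toy set, the first two posts receive clean "hate" and "offensive" labels after majority voting, even though each had one annotator who placed the post on the other side of the boundary.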

arxiv arXiv cs.CL · 4h ago

Structure-Preserving Document Translation via Multi-Stage LLM Pipeline: A Case Study in Marathi

This paper presents a framework for translating Marathi government documents to English that maintains layout fidelity and structural integrity, addressing limitations of existing systems that neglect formatting. The system integrates layout-aware OCR, coordinate-based text extraction, LLM translation, and HTML reconstruction to ensure spatial alignment and hierarchical consistency.

arxiv arXiv cs.CL · 4h ago

Categorizing Mathematical Concepts with LLM Voting Ensembles in Mathswitch

The open-source project Mathswitch imports mathematical concept records from sources like Wikidata and Wikipedia, linking records that refer to the same concept without reorganizing the original content. To address noise in the imported data, such as non-mathematical or ambiguous items, the authors test whether a voting ensemble of LLM judges can effectively filter this noise.

arxiv arXiv cs.CL · 4h ago

Labeling Training Data for Entity Matching Using Large Language Models

This paper investigates using large language models as teacher models in knowledge-distillation workflows to automatically label training data for smaller student models in entity matching tasks. The study evaluates various pair-selection strategies, teacher and student models, and post-processing methods across five standard benchmarks.

media Hugging Face Forums · 4h ago

AgentSeal: A Corpus-Availability Audit of SWE-bench Pro

The AgentSeal v5 audit tool evaluated the public availability of artifacts in the SWE-bench Pro benchmark to assess potential contamination risks. The study found that while 12 instances showed deterministic content overlap and 76 repositories were probable corpus members, most evidence consisted of date-unknown public replication rather than proven pre-cutoff contamination.

lab Google — The Keyword (AI) · 4h ago

Unlocking Britain’s next era of productivity: Building a nation of AI trailblazers

Google UK has released its latest Economic Impact Report, detailing strategies to help more people in the UK unlock the benefits of AI-powered technologies.

arxiv arXiv cs.CL · 5h ago

LAMP: Lean-based Agentic framework with MCP and Proof Repair

Researchers introduce LAMP, a multi-agent framework that synthesizes kernel-verified Lean 4 proofs for Combinatorics on Words by providing structured domain knowledge via an ontology. This approach addresses the lack of specialized lemmas in existing provers trained primarily on Mathlib data.

arxiv arXiv cs.CL · 5h ago

The Heterogeneous Safety Impacts of Benign Multilingual Fine-Tuning

A comprehensive empirical study reveals that fine-tuning large language models with benign multilingual data significantly increases their tendency to comply with unsafe adversarial prompts, a phenomenon termed multilingual safety drift. The research demonstrates that safety outcomes are highly sensitive to both the language used for fine-tuning and the language of evaluation, with compliance rates increasing four-fold in certain settings.

arxiv arXiv cs.CL · 5h ago

wav2VOT: Automatic estimation of voice onset time, closure duration, and burst realisation with wav2vec2

The article introduces wav2VOT, a tool for the automatic estimation of voice onset time, closure duration, and burst realisation that leverages the wav2vec2 model. It addresses the need for accurate speech annotation tools in phonetic research by demonstrating how large speech models can be applied to these specific tasks.

arxiv arXiv cs.CL · 5h ago

License Compatibility Analysis of Corpora for Low-Resource African Languages

This paper audits the license provenance of over twenty corpus families used in African NLP, revealing that while Creative Commons licenses dominate releases, their compatibility rules are rarely applied. The authors construct a six-tier compatibility matrix and apply it to three case-study languages: Kituba/Munukutuba, Zarma, and Moore.

arxiv arXiv cs.CL · 5h ago

Memory-Managed Long-Context Attention: A Preliminary Study of Editable Request-Local Memory

This study investigates memory-managed long-context attention by separating a fast recurrent or sparse backbone from explicit editable request-local memory slots and query-time sparse fallback. The research aims to address the limitations of existing linear, recurrent, and sparse attention methods in managing when facts should be written, overwritten, protected, or discarded.

arxiv arXiv cs.CL · 5h ago

PASTA: A Paraphrasing And Self-Training Approach for Knowledge Updating in LLMs

This paper introduces PASTA, a framework designed to integrate detailed factual information from news articles into Large Language Models (LLMs) to address the challenge of knowledge updating. The approach combines data augmentation, question-answering generation, and a novel self-learning Direct Preference Optimization (DPO) process to enable knowledge overwriting and hallucination suppression.

arxiv arXiv cs.CL · 5h ago

MedEvoEval: Evaluating Continual Evolution of Doctor Agents through Simulated Clinical Episodes

The authors introduce MedEvoEval, an executable longitudinal evaluation framework designed to assess the continual evolution of doctor agents through simulated outpatient clinical episodes. This system moves beyond static benchmarks by tracking how agents acquire evidence, utilize resources, and refine their decision-making across multiple interactions.

arxiv arXiv cs.CL · 5h ago

Latent Bridges for Multi-Table Question Answering

The authors introduce GRAB, a constructor-encoder-bridge pipeline designed for table question answering that lifts relational data into a heterogeneous graph and encodes it via message passing. The method transfers signals to a frozen large language model through a small set of query-conditioned latent tokens, providing a compact structural representation while preserving the LLM's general reasoning capabilities.

arxiv arXiv cs.CL · 6h ago

FinInvest-GTCN: Explainable Graph-Temporal-Causal Modeling for Risk-Aware Investment Decision Optimization

Researchers introduce FinInvest-GTCN, a Graph-Temporal-Causal Network designed to optimize venture capital investment decisions by addressing challenges like heterogeneous data and non-stationary time series. The model redefines the task from content recommendation to quantitative risk-return assessment, utilizing a relational graph encoder, multi-scale temporal fusion, and a causal decision head to generate interpretable predictions.

arxiv arXiv cs.CL · 6h ago

EVLA: An Electro-Aware Multimodal Assistant for Physically-Grounded Driving Reasoning and Control

The authors introduce the Electro-Visual-Language Assistant (EVLA), a framework that integrates multi-modal scene understanding with real-time perception of an electrified powertrain's electro-mechanical state to improve driving decisions. This approach addresses the limitation of existing vision-language models that treat vehicle dynamics as a black box by incorporating physical constraints and optimization objectives.

arxiv arXiv cs.CL · 6h ago

A3M: Adaptive, Adversarial and Multi-Objective Learning for Strategic Bidding in Repeated Auctions

The A3M framework addresses the challenges of learning to bid in repeated multi-unit auctions by integrating adaptive deep reinforcement learning, adversarial reasoning, and multi-objective reward design. It utilizes an actor-critic backbone and opponent modeling to optimize strategy against non-stationary adversaries while balancing utility, revenue, and fairness.