All articles — korshunov.ai

Multi-Agentic System Leveraging Open-Source LLMs to Mitigate Disinformation Threats

This article proposes a novel multi-agent system that emulates human annotator decision-making processes to detect and debunk disinformation, achieving superior results compared to individual Large Language Models like GPT-4 and GPT-3.5.

arxiv arXiv cs.CL · 4h ago

When Is a Draft Accepted? A Theory of Acceptance in Speculative Decoding

This article develops a theory for speculative decoding regimes that use greedy decoding, relaxed acceptance rules, or tree-based candidate sets, rather than the stochastic distribution-preserving settings studied in existing literature. The authors characterize rejection regions as lower level sets of the target distribution to derive exact KL divergence requirements and sharp margin-based bounds for various acceptance criteria.

arxiv arXiv cs.CL · 4h ago

DialogPII: A multilingual dataset of synthetic dialog transcripts to detect personal information

Researchers present DialogPII, a multilingual dataset of synthetic dialog transcripts designed to support the development and evaluation of automatic systems for detecting personally identifiable information. This resource addresses privacy concerns in sensitive domains by providing annotated data across 11 languages and eight interaction scenarios.

arxiv arXiv cs.CL · 4h ago

Improving Large-Scale Weakly Supervised ASR by Filtering and Selection

The authors propose a novel training approach for end-to-end automatic speech recognition (ASR) that addresses noisy labels and lack of domain specificity in large-scale weakly supervised datasets. The method involves pretraining on the full dataset, continued pretraining on a filtered subset based on character error rate, and fine-tuning on acoustically similar samples from that subset.

media r/LocalLLaMA · 5h ago

Qwen3.6-27B with 3-Critic Harness Matches Frontier Quality

A user tested Qwen3.6-27B (8-bit) alongside GLM5.2 using a coding harness that employs three critics—code review, test review, and Playwright e2e—to validate output quality.

arxiv arXiv cs.CL · 5h ago

DriftGuard: Safety-Aware Multi-Monitor Detection and Selective Adaptation for Evolving Toxicity Moderation

This paper introduces DriftGuard, a framework that combines multi-monitor drift detection with selective model updating to address evolving toxicity in automated moderation systems. The system tracks specific safety-relevant shifts, such as identity-harm and toxic-risk drift, rather than relying solely on global distributional changes.

arxiv arXiv cs.CL · 5h ago

5ting at SemEval-2026 Task 8: Strong End-to-End Multi-Turn RAG via LLM-Based Reranking and Faithfulness Control

The authors introduce 5ting, a system built for SemEval-2026 Task 8 (MTRAGEval), which evaluates multi-turn Retrieval-Augmented Generation (RAG) systems. The system addresses challenges such as context drift, underspecification, and hallucination risk by combining dense retrieval with LLM-based reranking and faithfulness control.

arxiv arXiv cs.CL · 5h ago

Majority Vote Silences Minority Values: Annotator Disagreement at the Hate/Offensive Boundary in HateXplain

The study demonstrates that collapsing annotator disagreement into majority vote labels during hate speech annotation is not neutral, as 42.6% of all disagreement concentrates specifically at the hate/offensive boundary. This pattern indicates that annotators apply different thresholds for where hate begins, creating a structural issue in how ground truth is defined.

arxiv arXiv cs.CL · 5h ago

Structure-Preserving Document Translation via Multi-Stage LLM Pipeline: A Case Study in Marathi

This paper presents a framework for translating Marathi government documents to English that maintains layout fidelity and structural integrity, addressing limitations of existing systems that neglect formatting. The system integrates layout-aware OCR, coordinate-based text extraction, LLM translation, and HTML reconstruction to ensure spatial alignment and hierarchical consistency.

arxiv arXiv cs.CL · 5h ago

Categorizing Mathematical Concepts with LLM Voting Ensembles in Mathswitch

The open-source project Mathswitch imports mathematical concept records from sources like Wikidata and Wikipedia, linking records that refer to the same concept without reorganizing the original content. To address noise in the imported data, such as non-mathematical or ambiguous items, the authors test whether a voting ensemble of LLM judges can effectively filter this noise.

arxiv arXiv cs.CL · 5h ago

Labeling Training Data for Entity Matching Using Large Language Models

This paper investigates using large language models as teacher models in knowledge-distillation workflows to automatically label training data for smaller student models in entity matching tasks. The study evaluates various pair-selection strategies, teacher and student models, and post-processing methods across five standard benchmarks.

media Hugging Face Forums · 5h ago

AgentSeal: A Corpus-Availability Audit of SWE-bench Pro

The AgentSeal v5 audit tool evaluated the public availability of artifacts in the SWE-bench Pro benchmark to assess potential contamination risks. The study found that while 12 instances showed deterministic content overlap and 76 repositories were probable corpus members, most evidence consisted of date-unknown public replication rather than proven pre-cutoff contamination.

lab Google — The Keyword (AI) · 5h ago

Unlocking Britain’s next era of productivity: Building a nation of AI trailblazers

Google UK has released its latest Economic Impact Report, which details strategies to help more people in the UK benefit from AI-powered technologies.

arxiv arXiv cs.CL · 6h ago

LAMP: Lean-based Agentic framework with MCP and Proof Repair

Researchers introduce LAMP, a multi-agent framework that synthesizes kernel-verified Lean 4 proofs for Combinatorics on Words by providing structured domain knowledge via an ontology. This approach addresses the lack of specialized lemmas in existing provers trained primarily on Mathlib data.

arxiv arXiv cs.CL · 6h ago

The Heterogeneous Safety Impacts of Benign Multilingual Fine-Tuning

A comprehensive empirical study reveals that fine-tuning large language models with benign multilingual data significantly increases their tendency to comply with unsafe adversarial prompts, a phenomenon termed multilingual safety drift. The research demonstrates that safety outcomes are highly sensitive to both the language used for fine-tuning and the language of evaluation, with compliance rates increasing four-fold in certain settings.

arxiv arXiv cs.CL · 6h ago

wav2VOT: Automatic estimation of voice onset time, closure duration, and burst realisation with wav2vec2

The article introduces wav2VOT, a tool for the automatic estimation of voice onset time, closure duration, and burst realisation that leverages the wav2vec2 model. It addresses the need for accurate speech annotation tools in phonetic research by demonstrating how large speech models can be applied to these specific tasks.

arxiv arXiv cs.CL · 6h ago

License Compatibility Analysis of Corpora for Low-Resource African Languages

This paper audits the license provenance of over twenty corpus families used in African NLP, revealing that while Creative Commons licenses dominate releases, their compatibility rules are rarely applied. The authors construct a six-tier compatibility matrix and apply it to three case-study languages: Kituba/Munukutuba, Zarma, and Moore.

arxiv arXiv cs.CL · 6h ago

Memory-Managed Long-Context Attention: A Preliminary Study of Editable Request-Local Memory

This study investigates memory-managed long-context attention by separating a fast recurrent or sparse backbone from explicit editable request-local memory slots and query-time sparse fallback. The research aims to address the limitations of existing linear, recurrent, and sparse attention methods in managing when facts should be written, overwritten, protected, or discarded.

arxiv arXiv cs.CL · 6h ago

PASTA: A Paraphrasing And Self-Training Approach for Knowledge Updating in LLMs

This paper introduces PASTA, a framework designed to integrate detailed factual information from news articles into Large Language Models (LLMs) to address the challenge of knowledge updating. The approach combines data augmentation, question-answering generation, and a novel self-learning Direct Preference Optimization (DPO) process to enable knowledge overwriting and hallucination suppression.

arxiv arXiv cs.CL · 6h ago

MedEvoEval: Evaluating Continual Evolution of Doctor Agents through Simulated Clinical Episodes

The authors introduce MedEvoEval, an executable longitudinal evaluation framework designed to assess the continual evolution of doctor agents through simulated outpatient clinical episodes. This system moves beyond static benchmarks by tracking how agents acquire evidence, utilize resources, and refine their decision-making across multiple interactions.