All articles — korshunov.ai

CAT-Q: Cost-efficient and Accurate Ternary Quantization for LLMs

Researchers present CAT-Q, a post-training quantization scheme that compresses large language models into ternary precision without requiring costly quantization-aware training. The method utilizes learnable modulation and softened ternarization to achieve high accuracy using only 512 calibration samples.
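For readers unfamiliar with ternary precision, the sketch below shows the basic idea: each weight becomes one of -1, 0, or +1, multiplied by a single shared scale. It uses a plain magnitude threshold, which is a generic baseline and not CAT-Q's learnable modulation or softened ternarization. The function names and the 0.7 threshold factor are assumptions made for illustration only.

```python
def ternarize(weights, threshold_factor=0.7):
    """Map weights to {-1, 0, +1} codes plus one shared scale.

    Generic threshold baseline, NOT the CAT-Q method.
    threshold_factor is a hypothetical hyperparameter.
    """
    if not weights:
        return [], 0.0
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    threshold = threshold_factor * mean_abs
    # Small weights collapse to 0; the rest keep only their sign.
    codes = [0 if abs(w) <= threshold else (1 if w > 0 else -1) for w in weights]
    # The scale is the mean magnitude of the weights that survived.
    kept = [abs(w) for w, c in zip(weights, codes) if c != 0]
    scale = sum(kept) / len(kept) if kept else 0.0
    return codes, scale


def dequantize(codes, scale):
    """Reconstruct approximate weights from ternary codes."""
    return [c * scale for c in codes]


codes, scale = ternarize([0.9, -0.05, -1.1, 0.02])
approx = dequantize(codes, scale)
```

Here the two small weights are zeroed and the two large ones are stored as signs, so the row needs under two bits per weight plus one scale value.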

media Hugging Face Forums · 3h ago

Experience with dissimilar language ablation?

A user asks whether anyone has experience ablating Mandarin, Russian, and Arabic from a model to produce a primarily Latin-script version. The goal is to free up capacity for further training, or to safely prune parts of the model that English inputs do not activate.

arxiv arXiv cs.CL · 4h ago

SocialPersona: Benchmarking Personalized Profiling and Response with Multimodal Social-Media Context

The authors introduce SocialPersona, a benchmark designed to evaluate whether multimodal large language models (MLLMs) can recover revealed preferences from longitudinal social-media timelines and use them in dialogue. This work addresses the limitation of current evaluations that focus only on explicit memory by testing a model's ability to infer interests from natural multimodal traces.

arxiv arXiv cs.CL · 4h ago

LeanGuard: A Fast and Light Approach for Robust Moderation

This paper investigates whether safety guardrails actually require chain-of-thought reasoning by training a lightweight bidirectional encoder alongside a reasoning-based guard on the same corpus. The authors find that removing reasoning does not reduce moderation accuracy, challenging the common belief that step-by-step thinking is necessary for effective moderation.

arxiv arXiv cs.CL · 4h ago

Beyond Logical Forms: LLM-Extracted Patterns for Fallacy Classification

This study investigates whether merging abstract logical structures with context-level linguistic cues improves the automated classification of logical fallacies, which often appear in nuanced forms.

arxiv arXiv cs.CL · 4h ago

HyperDFlash: MHC-Aligned Block Speculative Decoding with Gated Residual Reduction

HyperDFlash is a block-parallel speculative decoding framework designed to address feature misalignment issues when adapting DFlash to DeepSeek-V4's multi-hyper-connection (MHC) architecture. The authors propose two key optimizations: using pre-collapse residual states for conditioning and replacing the generic linear compressor with a lightweight gated residual reducer inherited from the model's hyper-connection head.

arxiv arXiv cs.CL · 4h ago

Structure Before Collapse: Transient semantic geometry in next-token prediction

This article investigates how language models learn latent semantic structure despite being trained with one-hot labels that theoretically eliminate shared context statistics. The authors identify a tension between Neural Collapse theory and the observed ability of models to capture categorical features like object properties.

arxiv arXiv cs.CL · 4h ago

ConvMemory v3 introduces a validity context layer for conversational memory

ConvMemory v3 adds a validity context layer to detect and surface update evidence in retrieved memories through target-conditioned relation verification. This mechanism sits after the standard retrieval path and uses a dual-evidence gate to score (target, source) pairs based on specific propositions.

arxiv arXiv cs.CL · 4h ago

Evaluation Pitfalls and Challenges in Multimedia Event Extraction

This work presents the first systematic analysis of evaluation pitfalls in multimedia event extraction, identifying three major sources of issues: inconsistent data processing, inconsistent task assumptions, and overly relaxed evaluation settings.

arxiv arXiv cs.CL · 4h ago

Reproducibility Study of AlphaEdit: Null-Space Constrained Knowledge Editing

This study reproduces the results of AlphaEdit, a null-space constrained projection method for knowledge editing in language models, and extends the evaluation to newer architectures and longer sequential editing horizons. The authors confirm that AlphaEdit performs as reported within its original scope but identify significant limitations regarding generalization and scalability.

arxiv arXiv cs.CL · 4h ago

AIGP: An LLM-Based Framework for Long-Term Value Alignment in E-Commerce Pricing

Researchers propose AIGP, a framework using Large Language Models to address interpretability and long-term objective misalignment in e-commerce dynamic pricing. The system employs supervised fine-tuning and a Long-Term Value Estimator trained via offline reinforcement learning to align pricing decisions with business goals.

arxiv arXiv cs.CL · 4h ago

OPID: On-Policy Skill Distillation for Agentic Reinforcement Learning

The authors propose OPID, a framework that extracts skill supervision directly from completed on-policy trajectories to address the sparse reward problem in outcome-based reinforcement learning. By representing trajectory hindsight as hierarchical skills, OPID provides dense, distribution-matched token-level supervision without relying on external memory.

arxiv arXiv cs.CL · 5h ago

Computational Study of Lexical Transmission Across Bengali Devotional Traditions

A computational corpus study analyzes vocabulary relationships across eight layers of Bengali and Sanskrit devotional literature from the 8th to 19th centuries, quantifying the historical claim that Buddhist Vajrayana vocabulary was absorbed into the Shakta Tantra tradition. Using TF-IDF character n-gram vectorization on 75 texts, the research provides the first quantitative corroboration of this lexical transmission chain.
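TF-IDF over character n-grams can be sketched with the standard library alone. The version below is a minimal illustration, assuming trigrams and a smoothed IDF; the summary does not state the study's actual n-gram range, weighting variant, or similarity measure, and the sample strings are placeholders rather than texts from its corpus.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """All overlapping character n-grams of a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_vectors(texts, n=3):
    """Sparse TF-IDF vectors (dicts) over character n-grams."""
    counts = [Counter(char_ngrams(t, n)) for t in texts]
    # Document frequency: in how many texts each n-gram appears.
    doc_freq = Counter(g for c in counts for g in c)
    num_docs = len(texts)
    vectors = []
    for c in counts:
        # Smoothed IDF keeps n-grams shared by every text at nonzero weight.
        vectors.append({
            g: tf * (math.log((1 + num_docs) / (1 + doc_freq[g])) + 1.0)
            for g, tf in c.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(g, 0.0) for g, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Placeholder strings, chosen only to show shared vs. disjoint n-grams.
vecs = tfidf_vectors(["vajra", "vajrayana", "shakti"])
sim_related = cosine(vecs[0], vecs[1])
sim_unrelated = cosine(vecs[0], vecs[2])
```

Character n-grams are a common choice for historical or morphologically rich text because they tolerate spelling variation without requiring a tokenizer.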

arxiv arXiv cs.CL · 5h ago

KARLA: Knowledge-base Augmented Retrieval for Language Models

The authors propose KARLA, a method enabling large language models to automatically retrieve factual knowledge from an external knowledge base during token generation. This approach allows factual updates without retraining the model and ensures that outputs are traceable to the source data.

arxiv arXiv cs.CL · 5h ago

FBK's Long-form SpeechLLMs for IWSLT 2026 Instruction Following

This paper details FBK's submission to the IWSLT 2026 Instruction Following shared task, presenting SpeechLLMs designed for both short-form and long-form speech instruction following under constrained settings.

arxiv arXiv cs.CL · 5h ago

AgentX: Towards Agent-Driven Self-Iteration of Industrial Recommender Systems

AgentX is a production-deployed multi-agent system designed to automate the iteration of industrial recommender systems, addressing the bottleneck where innovation currently scales linearly with human headcount.

arxiv arXiv cs.CL · 5h ago

Cascaded Multi-Granularity Pruning for On-Device LLM Inference in Industrial IoT

This article introduces a cascaded multi-granularity pruning framework designed to deploy large language models on Industrial Internet of Things (IIoT) edge devices by removing layers, attention heads, and feed-forward channels in a coarse-to-fine order. The method utilizes lightweight low-rank recovery between stages to re-estimate component importance, addressing the collapse of existing structured pruning methods at high compression ratios.

arxiv arXiv cs.CL · 5h ago

InfoKV: Information-Aware KV Cache Compression for Long Reasoning

Researchers introduce InfoKV, an entropy-aware framework that compresses key-value caches by combining token-level predictive uncertainty with attention scores to improve long-context reasoning.
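One way to picture combining predictive uncertainty with attention is a per-position score that mixes the two signals and keeps the top entries within a cache budget. The sketch below is hypothetical: the linear mix, the `alpha` weight, the max-normalization, and the choice to favor high-entropy positions are all assumptions, not details taken from InfoKV.

```python
import math


def entropy(probs):
    """Shannon entropy (nats) of a predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_kv_positions(attn_scores, token_probs, budget, alpha=0.5):
    """Return the sorted cache positions to keep.

    Hypothetical scoring rule: alpha * normalized attention
    + (1 - alpha) * normalized entropy. Not InfoKV's actual formula.
    """
    ents = [entropy(p) for p in token_probs]
    # Normalize each signal by its maximum; fall back to 1.0 if all zero.
    max_attn = max(attn_scores) or 1.0
    max_ent = max(ents) or 1.0
    combined = [
        alpha * a / max_attn + (1 - alpha) * e / max_ent
        for a, e in zip(attn_scores, ents)
    ]
    ranked = sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)
    return sorted(ranked[:budget])


attn = [0.5, 0.1, 0.3, 0.1]
probs = [[1.0, 0.0], [0.5, 0.5], [0.9, 0.1], [1.0, 0.0]]
kept = select_kv_positions(attn, probs, budget=2)
```

In this toy input, position 0 has the highest attention but zero entropy, so a budget of two keeps positions 1 and 2, which attention alone would not have chosen.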

arxiv arXiv cs.CL · 5h ago

Heterogeneous Neural Predictivity from Language Models During Naturalistic Comprehension

This study demonstrates that frozen language models can serve as effective neural predictors for brain activity during natural speech and text comprehension, while distinguishing predictive utility from claims about shared neural organization. The analysis of MEG and ECoG data revealed widespread positive prediction gains over low-level baselines, though participant-level advantages were localized rather than uniform.

arxiv arXiv cs.CL · 5h ago

SamaVaani: Auditing and Debiasing Multilingual Clinical ASR for Indian Languages

This study audits the reliability of eight state-of-the-art Automatic Speech Recognition models on real-world psychiatric interview data in Kannada, Hindi, and Indian English. The results reveal substantial variability across models and languages, with some systems performing competitively in Indian English but failing in regional speech.