All articles — korshunov.ai

When you don't have a data center GPU

The article references the LiquidAI LFM2.5-230M model as an alternative for users without access to data center GPUs.

Ornith-1.0: Open-source LLMs for agentic coding

Ornith-1.0 is a new family of open-source large language models specialized for agentic coding tasks. The model family spans multiple parameter sizes, including 9B Dense, 35B MoE, and 397B MoE configurations.

arxiv arXiv cs.CL · 5h ago

Nemotron-TwoTower: Diffusion Language Modeling with Pretrained Autoregressive Context

NVIDIA introduces Nemotron-TwoTower, a diffusion language model that splits context representation and iterative denoising into two separate networks to overcome capacity limitations in existing approaches. Built on the open-weight Nemotron-3-Nano-30B-A3B model and trained on 2.1T tokens, it retains 98.7% of the autoregressive baseline's quality while achieving 2.42x higher wall-clock generation throughput.

arxiv arXiv cs.CL · 5h ago

Humans Disengage, Reasoning Models Persist: Separating Difficulty Registration from Deliberation Allocation

A study reveals that while large reasoning models (LRMs) and humans both spend more time on harder problems, they diverge significantly in how they allocate deliberation within specific items. When making errors, LRMs generate more tokens than when correct, whereas humans do the opposite, spending less time on trials they get wrong.

arxiv arXiv cs.CL · 5h ago

MemStrata: Eliminating Stale-Fact Errors in RAG Agents via Temporal Validity

The article introduces MemStrata, a retrieval memory system designed to eliminate stale-fact errors in AI agents by maintaining temporal validity within accumulated knowledge. Unlike standard Retrieval-Augmented Generation (RAG), which struggles to distinguish between duplicated and contradicted facts due to embedding similarity, MemStrata uses a deterministic supersession rule to retire outdated information.
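
The summary does not spell out MemStrata's actual rule, so the following is only a minimal sketch of what a deterministic supersession rule could look like. It assumes facts are keyed by a (subject, attribute) slot and arrive in timestamp order; the `Fact` and `TemporalMemory` names and all fields are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fact:
    subject: str
    attribute: str
    value: str
    valid_from: int                 # timestamp at which the fact was observed
    valid_to: Optional[int] = None  # None means the fact is still current


class TemporalMemory:
    """Hypothetical store keeping one current value per (subject, attribute)."""

    def __init__(self) -> None:
        self.facts: list[Fact] = []

    def add(self, subject: str, attribute: str, value: str, ts: int) -> None:
        # Assumes calls arrive in non-decreasing timestamp order.
        for fact in self.facts:
            same_slot = (fact.subject, fact.attribute) == (subject, attribute)
            if same_slot and fact.valid_to is None:
                if fact.value == value:
                    return          # duplicate: keep the existing entry
                fact.valid_to = ts  # contradiction: retire the older fact
        self.facts.append(Fact(subject, attribute, value, ts))

    def current(self, subject: str, attribute: str) -> Optional[str]:
        for fact in self.facts:
            same_slot = (fact.subject, fact.attribute) == (subject, attribute)
            if same_slot and fact.valid_to is None:
                return fact.value
        return None


memory = TemporalMemory()
memory.add("alice", "employer", "Acme", ts=1)
memory.add("alice", "employer", "Acme", ts=2)    # duplicate, ignored
memory.add("alice", "employer", "Globex", ts=3)  # contradiction, supersedes
print(memory.current("alice", "employer"))       # Globex
```

The point of the sketch is that duplicates and contradictions are separated by an exact comparison on the slot and value rather than by embedding similarity, and that retired facts stay in the store with a closed validity interval instead of being deleted.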

arxiv arXiv cs.CL · 5h ago

Erase-then-Delta Attention: Decoupling Erase and Write Addresses in Delta-Rule Linear Attention

The authors propose Erase-then-Delta Attention (EDA), a memory update rule for recurrent models that decouples the address used to erase stale information from the address used to write new content. This approach addresses the limitation of delta-rule linear attention, which cannot actively remove outdated data stored at different locations before writing.
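
To make the limitation concrete, here is a small pure-Python sketch. `delta_update` is the standard delta rule, S <- S - beta * (S k - v) k^T, in which the same key k addresses both the erase and the write. `decoupled_update` is only an illustrative assumption of what separating the two addresses could look like; it is not the update rule from the paper, and the function names are hypothetical.

```python
def matvec(S, x):
    """Read from memory: return S @ x for a list-of-rows matrix S."""
    return [sum(s * x_j for s, x_j in zip(row, x)) for row in S]


def rank1_add(S, u, w, scale=1.0):
    """Return S + scale * outer(u, w) without modifying S."""
    return [[s + scale * u_i * w_j for s, w_j in zip(row, w)]
            for row, u_i in zip(S, u)]


def delta_update(S, k, v, beta=1.0):
    """Standard delta rule: erase and write share the same key k."""
    err = [p - v_i for p, v_i in zip(matvec(S, k), v)]
    return rank1_add(S, err, k, scale=-beta)


def decoupled_update(S, erase_key, write_key, v, beta=1.0):
    """Illustrative only: erase along erase_key, then write v at write_key."""
    S = rank1_add(S, matvec(S, erase_key), erase_key, scale=-beta)
    return rank1_add(S, v, write_key, scale=beta)


k_old, k_new = [1, 0], [0, 1]          # orthonormal keys
S0 = delta_update([[0, 0], [0, 0]], k_old, [5, 5])

# Writing at k_new with the delta rule leaves the value at k_old in place.
S_delta = delta_update(S0, k_new, [7, 7])
print(matvec(S_delta, k_old))          # stale value is still readable

# With a separate erase address, the stale value is cleared before the write.
S_dec = decoupled_update(S0, k_old, k_new, [7, 7])
print(matvec(S_dec, k_old), matvec(S_dec, k_new))
```

With orthonormal keys, the delta rule's correction term only touches the direction of the key being written, which is why content stored under a different key survives the update.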

arxiv arXiv cs.CL · 5h ago

The Inattentional Gap: Task-Conditioned Models Omit Safety Signals

A study reveals that conditioning language and vision models on narrow tasks suppresses their ability to report co-present, safety-critical signals they can otherwise detect. This phenomenon, termed the "Inattentional Gap," demonstrates a dissociation between measured benchmark safety and real-world safety.

arxiv arXiv cs.CL · 5h ago

DiARC: Distinguishing Positive and Negative Samples Helps Improving ARC-like Reasoning Ability of Large Language Models

The paper introduces DiARC, a method that improves the abstract reasoning capabilities of large language models by incorporating negative sample supervision alongside positive examples. This approach addresses the limitations of current methods that rely heavily on data augmentation or expensive closed-source models.

arxiv arXiv cs.CL · 5h ago

Compiler-Driven Approximation Tuning for Hyperdimensional Computing

The authors introduce ApproxHDC, a framework that automates the identification and application of domain-specific approximations in Hyperdimensional Computing (HDC) workloads. This system extends the HPVM-HDC compiler infrastructure to enable retargetable compilation across diverse hardware backends, including CPUs, GPUs, and simulated ReRAM and PCM accelerators.

arxiv arXiv cs.CL · 5h ago

Adversarial Diffusion Across Modalities: A Fusion Survey of Attacks, Defenses, and Evaluation

This survey integrates four disconnected tracks of adversarial evaluation—diffusion-based attacks on text and LLMs, image classifiers, vision-language models, and input purification defenses—into a single conceptual framework. It focuses on the LLM-side slice to unify vocabulary, threat models, and benchmarks around denoising diffusion as a shared generative mechanism.

arxiv arXiv cs.CL · 5h ago

Zero-shot Tweet-Level Stance Detection Enhanced by External Knowledge and Reflective Chain-of-Thought Reasoning

Researchers propose KIRP, a zero-shot stance detection framework that addresses context sparsity and implicit target relevance in short texts by integrating external knowledge with reflective Chain-of-Thought reasoning. The study also introduces the first Japanese tweet-level dataset for stance detection to support multi-topic evaluation.

arxiv arXiv cs.CL · 5h ago

Closing the Quality Gap in Low-Resource Text-to-Speech: LoRA Fine-Tuning of VoxCPM2 for Khmer and Korean

Researchers address the quality gap in low-resource text-to-speech by fine-tuning the 2.4B-parameter VoxCPM2 model using Low-Rank Adaptation (LoRA) on a shared corpus of Khmer and Korean.

arxiv arXiv cs.CL · 5h ago

SAE-Guided Activation Regularization for LLM Continual Learning

This paper proposes a new approach to catastrophic forgetting in large language models by regularizing in activation space using pretrained Sparse Autoencoders (SAEs) as a monosemantic feature dictionary, rather than relying on traditional weight-space methods like Elastic Weight Consolidation (EWC).

arxiv arXiv cs.CL · 5h ago

CAT-Q: Cost-efficient and Accurate Ternary Quantization for LLMs

Researchers present CAT-Q, a post-training quantization scheme that compresses large language models into ternary precision without requiring costly quantization-aware training. The method utilizes learnable modulation and softened ternarization to achieve high accuracy using only 512 calibration samples.
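
For readers unfamiliar with ternary precision, the sketch below shows a plain threshold-based ternarization in the style of Ternary Weight Networks. It is a baseline illustration only and does not implement CAT-Q's learnable modulation or softened ternarization; the `ternarize` and `dequantize` names and the 0.7 threshold factor are assumptions made for this example.

```python
def ternarize(weights, delta_factor=0.7):
    """Map each weight to -1, 0, or +1 and return a shared scale.

    Weights whose magnitude is at most delta_factor * mean(|w|) become 0.
    The scale is the mean magnitude of the weights that were kept.
    """
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    threshold = delta_factor * mean_abs
    codes = [0 if abs(w) <= threshold else (1 if w > 0 else -1)
             for w in weights]
    kept = [abs(w) for w, c in zip(weights, codes) if c != 0]
    scale = sum(kept) / len(kept) if kept else 0.0
    return codes, scale


def dequantize(codes, scale):
    """Reconstruct approximate weights from ternary codes and the scale."""
    return [c * scale for c in codes]


weights = [0.9, -0.8, 0.05, -0.1, 0.7, 0.02]
codes, scale = ternarize(weights)
print(codes)                     # [1, -1, 0, 0, 1, 0]
print(dequantize(codes, scale))  # large weights map to roughly +/-0.8
```

Each weight then needs under two bits plus one shared floating-point scale per group, which is where the compression comes from.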

media Hugging Face Forums · 5h ago

Experience with dissimilar language ablation?

A user asks whether anyone has experience ablating Mandarin, Russian, and Arabic from a model to create a primarily Latin-based version. The goal is to free up capacity for further training or for safe pruning in contexts where English has no activation.

arxiv arXiv cs.CL · 7h ago

SocialPersona: Benchmarking Personalized Profiling and Response with Multimodal Social-Media Context

The authors introduce SocialPersona, a benchmark designed to evaluate whether multimodal large language models (MLLMs) can recover revealed preferences from longitudinal social-media timelines and use them in dialogue. This work addresses the limitation of current evaluations that focus only on explicit memory by testing a model's ability to infer interests from natural multimodal traces.

arxiv arXiv cs.CL · 7h ago

LeanGuard: A Fast and Light Approach for Robust Moderation

This paper investigates whether safety guardrails actually require chain-of-thought reasoning by training a lightweight bidirectional encoder alongside a reasoning-based guard on the same corpus. The authors find that reasoning does not improve moderation accuracy, challenging the common belief that step-by-step thinking is necessary for effective moderation.

arxiv arXiv cs.CL · 7h ago

Beyond Logical Forms: LLM-Extracted Patterns for Fallacy Classification

This study investigates whether merging abstract logical structures with context-level linguistic cues improves the automated classification of logical fallacies, which often appear in nuanced forms.

arxiv arXiv cs.CL · 7h ago

HyperDFlash: MHC-Aligned Block Speculative Decoding with Gated Residual Reduction

HyperDFlash is a block-parallel speculative decoding framework designed to address feature misalignment issues when adapting DFlash to DeepSeek-V4's multi-hyper-connection (MHC) architecture. The authors propose two key optimizations: using pre-collapse residual states for conditioning and replacing the generic linear compressor with a lightweight gated residual reducer inherited from the model's hyper-connection head.

arxiv arXiv cs.CL · 7h ago

Structure Before Collapse: Transient semantic geometry in next-token prediction

This article investigates how language models learn latent semantic structure despite being trained with one-hot labels that theoretically eliminate shared context statistics. The authors identify a tension between Neural Collapse theory and the observed ability of models to capture categorical features like object properties.