Research paper — korshunov.ai

BITEMBED: Extreme Low-Bit Framework for LLM-Based Text Embeddings

The paper introduces BITEMBED, an extreme low-bit framework designed to address the high deployment costs of LLM-based text embedders by targeting both encoding efficiency and vector storage. The method converts pretrained LLM backbones into BitNet-style encoders featuring ternary weights, quantized activations, and lightweight normalization refinement. To adapt these models for representation learning, BITEMBED employs continual contrastive pre-training followed by supervised contrastive fine-tuning. This fine-tuning process utilizes similarity-distribution distillation and attention-relation distillation from a full-precision teacher model. Beyond backbone quantization, the framework trains output embeddings to support multiple storage precisions, allowing for flexible trade-offs between performance and storage costs. Experiments on the MMTEB benchmark using Qwen3-0.6B and Gemma3-270M demonstrate that BITEMBED performs largely comparably to full-precision teacher embedders.
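The ternary-weight step can be illustrated with a minimal sketch. It assumes the absmean scheme commonly associated with BitNet-style models (scale by the mean absolute weight, round, clip to {-1, 0, 1}); the summary does not state which quantizer BITEMBED actually uses, and the function names here are hypothetical.

```python
from typing import List, Tuple


def ternary_quantize(weights: List[float], eps: float = 1e-8) -> Tuple[List[int], float]:
    """Quantize weights to {-1, 0, 1} with a single absmean scale.

    Assumption: absmean scaling as in BitNet-style models; not confirmed
    as the exact BITEMBED recipe.
    """
    # Scale is the mean absolute value of the weights (eps avoids division by zero).
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    # Round each scaled weight to the nearest integer, then clip to [-1, 1].
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Map ternary codes back to approximate real-valued weights."""
    return [c * scale for c in codes]


weights = [0.9, -0.05, 0.4, -1.2, 0.0, 0.3]
codes, scale = ternary_quantize(weights)
approx = dequantize(codes, scale)
```

Each weight is stored as one of three values plus a shared scale, which is where the storage and compute savings come from; the accuracy lost in rounding is what the distillation stages described above are meant to recover.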

arxiv arXiv cs.CL · 6h ago

Space-Efficient Language Generation in the Limit

This study initiates a resource-aware theory of language generation in the limit under space efficiency constraints. A learner observes an adversarial positive stream from a target language K and must output a hallucination-free hypothesis L while omitting at most Δ strings. The research focuses on DFAs with s states over an alphabet of size k as the hypothesis class for memory-bounded learners. In the exponential-space regime, the authors prove that a learner can exactly identify the target language K. Under stricter memory budgets, they present a streaming algorithm using poly(s, k) space that converges to a hypothesis with a generation gap of Δ = O(k^{2s-2}). This learned hypothesis captures every string in K of length at least 2s-1. The results are complemented by a near-matching lower bound derived from communication complexity, showing that achieving Δ ≤ k^{(1-ε)s} requires k^{Ω(εs)} memory. These findings reveal a sharp transition between polynomial-space generation and exponential-space exact identification.
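The two bounds can be set side by side. The block below only restates what the summary reports; the symbol M for the learner's memory is introduced here for illustration and is not taken from the paper.

```latex
\begin{align*}
\text{Upper bound:}\quad & M = \mathrm{poly}(s, k)
  \;\Longrightarrow\; \Delta = O\!\left(k^{2s-2}\right),\\
\text{Lower bound:}\quad & \Delta \le k^{(1-\varepsilon)s}
  \;\Longrightarrow\; M = k^{\Omega(\varepsilon s)}.
\end{align*}
```

As a worked instance, with k = 2 and s = 3 the upper bound on the gap scales as 2^4 = 16 (up to the constant hidden in the O-notation), and the guarantee covers every string in K of length at least 2s-1 = 5.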

arxiv arXiv cs.CL · 6h ago

SARA: Unlocking Multilingual Knowledge in Mixture-of-Experts via Semantically Anchored Routing Alignment

Sparse Mixture-of-Experts (MoE) architectures often struggle with low-resource languages due to cross-lingual routing divergence that limits expert sharing. To address this, researchers propose SARA, a framework that transfers specialized capabilities from high-resource anchor languages to low-resource ones. SARA aligns the internal routing distributions of MoE layers using a symmetric Jensen-Shannon divergence constraint rather than operating on output logits. This approach encourages mechanistic consistency in expert selection across different languages. The authors evaluated the method on two large language models across five low-resource languages and three benchmarks. Results show SARA outperforms standard instruction tuning, achieving gains of +0.8% on Qwen3-30B-A3B and +1.2% on Phi-3.5-MoE-instruct for Global-MMLU. These findings demonstrate that SARA effectively addresses performance bottlenecks in low-resource contexts.
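The routing-alignment idea can be sketched as a Jensen-Shannon divergence between two routing distributions over the same experts. This is a minimal illustration, assuming each distribution comes from a softmax over router logits; the function names and the example logits are hypothetical, and the way SARA weights or aggregates this term across layers is not given in the summary.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn router logits into a probability distribution over experts."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q), skipping terms where p is zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence: symmetric and bounded by ln 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical router logits for the same sentence in two languages.
anchor_routing = softmax([2.0, 0.5, -1.0, 0.1])   # high-resource anchor
target_routing = softmax([0.2, 1.8, -0.5, 0.0])   # low-resource target
alignment_loss = js_divergence(anchor_routing, target_routing)
```

Minimizing a term like `alignment_loss` pushes the two languages toward selecting the same experts, which is a constraint on routing rather than on output logits.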

media r/LocalLLaMA · 9h ago

Colony: An Educational Simulation of LLM Attention Mechanisms Using Agent-Based Analogies

Colony is an educational resource that explains the attention mechanism of large language models through simple agent-based analogies. The simulation places agents on a board inspired by Conway's Game of Life, and each agent represents a specific role within the self-attention block of an LLM. This visual approach lets users observe how information flows and interacts during the attention process. The project is available as an open-source tool for those who want to explore these concepts without complex mathematics, and it offers an accessible way to understand the internal workings of transformer models.
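For readers who want to connect the analogy to the underlying computation, here is a minimal sketch of single-head scaled dot-product attention in plain Python. It is not taken from Colony; it is the standard textbook formulation, and the toy vectors are made up for illustration.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Normalize scores into attention weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention: each query mixes the values by key similarity."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        mixed = [sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))]
        outputs.append(mixed)
    return outputs


# Two identical keys receive equal weight, so the output is the mean of the values.
result = attention(queries=[[1.0, 1.0]], keys=[[1.0, 0.0], [1.0, 0.0]], values=[[2.0, 0.0], [0.0, 2.0]])
```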

arxiv arXiv cs.AI · 14h ago

Hypothesis-Driven Skill Optimization for LLM Agents

HDSO enables safe, auditable skill updates for LLM agents without training, using falsifiable hypotheses and validation. On ALFWorld, it improves the average success rate (Avg. SR) of Qwen3-8B by 6.9 points and maintains a 7.1-point gain under noisy feedback. Validated skills transfer across runs and models when diagnostic alignment is achieved.

arxiv arXiv cs.AI · 15h ago

Flow Annealing Posterior Sampling for Function-Space Regression and Inverse Problems

FAPS is the first function-space posterior sampling framework that unifies stochastic-process regression and PDE inverse problems. It uses pretrained flow-matching priors and Langevin correction with low-rank covariance preconditioning to enable efficient, accurate posterior inference from sparse, noisy data with coherent uncertainty quantification.

arxiv arXiv cs.AI · 15h ago

Select-to-Act: Hierarchical RL with Adaptive Language Guidance

HRLLI introduces a hierarchical reinforcement learning framework that adapts natural-language instructions dynamically during decision-making. It decomposes instructions into stage-specific guidance elements and uses a select-to-act paradigm to enable real-time selection of relevant instruction pieces, improving sample efficiency and performance in complex environments.

arxiv arXiv cs.AI · 15h ago

SAFER: Reliable Test-Time Adaptation under Adversarial Streams

SAFER is a training-free framework that enhances robustness of test-time adaptation by using reliability-guided augmentation. It generates stochastic augmentations, pools predictions via correlation-weighted aggregation with outlier detection, and includes adaptive mixing to preserve clean performance under adversarial attacks. Evaluations on PACS, VLCS, and OfficeHome show improved resilience without sacrificing clean accuracy.

arxiv arXiv cs.AI · 15h ago

Sparsity-Storage-Accuracy Tradeoff in Parsimoniously Activated Dictionary Learning

Parsimoniously activated dictionary learning (PADL) establishes a structured generative model with auxiliary latent variables, enabling maximum a posteriori estimation. This framework provides generalization guarantees and an analytical characterization of the tradeoff between sparsity, storage cost, and reconstruction accuracy, allowing data-driven hyperparameter estimation. The resulting algorithm achieves better reconstruction performance and accelerates inference in vision-language models.

arxiv arXiv cs.AI · 15h ago

First-Token Broadcasters in Transformers: Language Identity and Robustness

LIHA reveals a small set of first-token broadcaster heads in GPT-2 that persistently attend to the initial prompt token and drive language switches. Instruction tuning reorganizes these circuits and concentrates language identity in early layers, as seen in Qwen2.5-1.5B-Instruct, where Chinese and Russian language handling is confirmed at layer 0.

arxiv arXiv cs.AI · 15h ago

ARIA: A Causal-Aware Framework for Rescuing LLM Reasoning

ARIA addresses contextual tunneling in LLMs by conditioning knowledge use on mechanistic completeness. It uses a three-tier cascade for causal reasoning, physics-informed transfer, and parametric fallback, and improves materials discovery through auditable, physically grounded reasoning.

arxiv arXiv cs.AI · 15h ago

HyperAdapter: Structured Hyperedge Adaptation for Vision Transformer Fine-Tuning

HyperAdapter introduces a hypergraph-based adapter that performs structured, group-aware adaptation in vision transformers by operating in hyperedge space rather than token space. It uses prototype-based assignments to build a soft hypergraph, aggregates token features into hyperedge representations, applies lightweight adaptation, and diffuses updates back via hypergraph structure, enabling explicit structural inductive bias while maintaining efficiency. Experiments show consistent performance gains over baseline PEFT methods, especially on tasks requiring structured reasoning.

arxiv arXiv cs.AI · 15h ago

MetaPS: Adaptive Strategy Selection for Market Agents

MetaPS is a simulation-guided framework that enables market agents to adaptively select among programmatic strategies based on market states. It uses simulated markets to generate supervised training data, then selects strategies during inference to produce executable actions. Experiments show MetaPS outperforms fixed strategies and LLM-based agents, with compact models exceeding stronger API models in performance.

arxiv arXiv cs.AI · 15h ago

P4IR Framework Improves LLM-Based Code Compliance Accuracy

P4IR is a two-stage framework that uses supervised fine-tuning and Group Relative Policy Optimization to enhance automated code compliance systems based on large language models. It reduces tree edit distance by up to 23.8% and token-level Levenshtein distance by up to 38.6%, outperforming leading LLMs such as Claude Opus, GPT-5.2, and GLM-4.7 in zero-shot and few-shot prompting settings. It also reduces false positives by a small but statistically significant margin.
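One of the reported metrics, token-level Levenshtein distance, can be sketched as follows. This is the standard dynamic-programming edit distance applied to token sequences; the whitespace tokenization, the example strings, and the `relative_reduction` helper are assumptions for illustration and are not taken from the paper.

```python
from typing import Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Minimum number of token insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional reduction of a distance relative to a baseline."""
    return (baseline - improved) / baseline


# Hypothetical predicted vs. reference rule, split on whitespace.
predicted = "height >= 2.1 m".split()
reference = "height >= 2.4 m required".split()
distance = levenshtein(predicted, reference)
```

A lower distance means the generated output needs fewer token edits to match the reference, so a reported percentage reduction corresponds to `relative_reduction` computed over the evaluation set.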

arxiv arXiv cs.AI · 15h ago

Gold Points Sniper: Self-guided Visual Reasoning for Fine-grained Action Understanding

Gold Points Sniper (GPS) enables lightweight vision-language models to perform self-guided multimodal reasoning for fine-grained human action understanding. By integrating a Gold Points Extractor, Selective Socratic Questioner, and Semantic Entailment Evaluator, GPS achieves performance comparable to GPT-4o while maintaining superior factual accuracy on CAP benchmark-based instruction-tuning data.

lab Hugging Face Blog · 16h ago

NVIDIA NeMo AutoModel Speeds Up Transformer Fine-Tuning

NVIDIA's NeMo AutoModel enables faster fine-tuning of transformer models by automating model selection and optimization. It reduces development time and improves efficiency in training large language models on NVIDIA hardware.

arxiv arXiv cs.AI · 16h ago

LLMs Use Difference-Making Logic to Learn Causal Structure

Large language models learn causal structure through a difference-making logic, akin to the experimental method. This approach identifies which word sequences influence outcomes and which do not, using vast text data during training. Architectural features like token embeddings and self-attention support this inductive process by detecting patterns of variation and indifference in language.

arxiv arXiv cs.AI · 16h ago

DreamUV: End-to-End Flow Matching for Artist-like UV Unwrapping

DreamUV introduces an end-to-end learning framework that treats UV unwrapping as a generative Flow Matching problem. It learns a mesh-conditioned transport process to generate artist-like UV layouts, with boundary-aware training and Model-in-the-Loop fine-tuning to ensure seam geometry and practical validity. Results show straighter seams, tighter axis-aligned islands, and superior alignment with professional artist preferences.

arxiv arXiv cs.AI · 16h ago

A Differentiable Atari VCS for Explainable AI

A fully differentiable emulator of the Atari 2600 VCS is presented, reproducing all 64 ALE games with bit-for-bit accuracy in RAM and screen output. The system enables gradient-based explainable AI by providing a complex, fully known ground truth, with both Julia and JAX implementations validated against a reference emulator and capable of high-throughput differentiable rollouts on GPU.

arxiv arXiv cs.AI · 16h ago

Self-Evolving Cognitive Framework for Embodied Scientific Intelligence

The paper proposes a self-evolving cognitive framework that uses causal world modeling to enable embodied systems to continuously refine their internal models through interaction. It integrates causal modeling, intervention-driven reasoning, and continual refinement, redefining embodied interaction as an epistemic process for causal discovery and knowledge acquisition. The framework supports a shift from predictive to epistemic intelligence, with a new benchmark for evaluating self-evolving embodied scientific intelligence.