Research paper
arxiv arXiv cs.CL · 5h ago

BITEMBED: Extreme Low-Bit Framework for LLM-Based Text Embeddings

The paper introduces BITEMBED, an extreme low-bit framework that tackles the high deployment cost of LLM-based text embedders by targeting both encoding efficiency and vector storage. The method converts pretrained LLM backbones into BitNet-style encoders with ternary weights, quantized activations, and lightweight normalization refinement. To adapt these models for representation learning, BITEMBED applies continual contrastive pre-training followed by supervised contrastive fine-tuning, during which it distills similarity distributions and attention relations from a full-precision teacher model. Beyond backbone quantization, the framework trains output embeddings to support multiple storage precisions, enabling flexible trade-offs between performance and storage cost. Experiments on the MMTEB benchmark with Qwen3-0.6B and Gemma3-270M backbones show that BITEMBED performs largely on par with its full-precision teacher embedders.
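
A minimal plain-Python sketch of the two quantization ideas, assuming the absmean ternary scheme popularized by BitNet b1.58 for weights and per-vector absmax 8-bit quantization for activations; the summary does not say which exact quantizers BITEMBED uses. The formats in the hypothetical `store_embedding` helper (fp32, int8, sign-binarized) are illustrative examples of "multiple storage precisions", not the paper's list.

```python
def ternarize(weights, eps=1e-8):
    """Absmean ternary quantization (BitNet b1.58 style, assumed here).

    Divides by the mean absolute weight, then rounds and clips each entry to
    {-1, 0, +1}. Returns (q, gamma) so that weights ~= gamma * q.
    Note: Python's round() sends exact .5 ties to the nearest even integer.
    """
    flat = [abs(w) for row in weights for w in row]
    gamma = sum(flat) / len(flat)
    q = [[max(-1, min(1, round(w / (gamma + eps)))) for w in row] for row in weights]
    return q, gamma


def quantize_activations(x, bits=8, eps=1e-8):
    """Per-vector absmax quantization to signed integers in [-(2^(bits-1) - 1), 2^(bits-1) - 1]."""
    qmax = 2 ** (bits - 1) - 1
    scale = qmax / (max(abs(v) for v in x) + eps)
    return [max(-qmax, min(qmax, round(v * scale))) for v in x], scale


def store_embedding(emb, precision):
    """Hypothetical storage formats for one output embedding vector."""
    if precision == "fp32":
        return list(emb)                         # 32 bits per dimension
    if precision == "int8":
        q, _scale = quantize_activations(emb, bits=8)
        return q                                 # 8 bits per dimension; scale unused by cosine similarity
    if precision == "binary":
        return [1 if v > 0 else 0 for v in emb]  # 1 bit per dimension
    raise ValueError(f"unknown precision: {precision}")
```

Binary storage keeps one bit per dimension, 32 times smaller than fp32, which is why training embeddings to remain useful at several precisions matters for the storage side of the trade-off.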

arxiv arXiv cs.CL · 6h ago

Space-Efficient Language Generation in the Limit

This study initiates a resource-aware theory of language generation in the limit under space-efficiency constraints. A learner observes an adversarially ordered stream of positive examples from a target language K and must eventually output a hallucination-free hypothesis L (one containing no strings outside K) that omits at most Δ strings of K. The hypothesis class is DFAs with s states over an alphabet of size k, and learners are memory-bounded. In the exponential-space regime, the authors prove that a learner can exactly identify the target language K. Under stricter memory budgets, they present a streaming algorithm using poly(s, k) space that converges to a hypothesis with generation gap Δ = O(k^{2s-2}); this hypothesis captures every string in K of length at least 2s-1. The results are complemented by a near-matching lower bound from communication complexity: achieving Δ ≤ k^{(1-ε)s} requires k^{Ω(εs)} memory. Together, these findings reveal a sharp transition between polynomial-space generation and exponential-space exact identification.
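
The stated gap follows from a counting step that the summary leaves implicit. If L contains nothing outside K and includes every string of K of length at least 2s-1, then only strings of length at most 2s-2 can be missing, and over an alphabet of size k there are few of them. A short derivation, assuming k ≥ 2 (our reconstruction of this step, not necessarily the paper's proof):

```latex
% Assumes L \subseteq K and L \supseteq \{ w \in K : |w| \ge 2s-1 \}.
\[
|K \setminus L| \;\le\; \bigl|\{\, w \in \Sigma^{*} : |w| \le 2s-2 \,\}\bigr|
  \;=\; \sum_{i=0}^{2s-2} k^{i}
  \;=\; \frac{k^{2s-1}-1}{k-1}
  \;\le\; \frac{k}{k-1}\,k^{2s-2}
  \;\le\; 2\,k^{2s-2} \qquad (k \ge 2),
\]
so \(\Delta = O(k^{2s-2})\).
```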

arxiv arXiv cs.CL · 6h ago

SARA: Unlocking Multilingual Knowledge in Mixture-of-Experts via Semantically Anchored Routing Alignment

Sparse Mixture-of-Experts (MoE) architectures often struggle with low-resource languages because cross-lingual routing divergence limits expert sharing. To address this, the authors propose SARA, a framework that transfers specialized capabilities from high-resource anchor languages to low-resource ones. Rather than operating on output logits, SARA aligns the internal routing distributions of MoE layers using a symmetric Jensen-Shannon divergence constraint, encouraging mechanistic consistency in expert selection across languages. The authors evaluate the method on two large language models across five low-resource languages and three benchmarks. SARA outperforms standard instruction tuning, with Global-MMLU gains of +0.8% on Qwen3-30B-A3B and +1.2% on Phi-3.5-MoE-instruct. These results indicate that SARA helps address performance bottlenecks in low-resource settings.
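
A sketch of a routing-alignment term like the one SARA describes, assuming each MoE layer's router produces per-token logits over experts and that anchor and low-resource routing are compared after averaging token-level distributions for parallel inputs. The pooling choice and all function names are our assumptions. The Jensen-Shannon divergence computed here is symmetric by construction and bounded above by ln 2.

```python
import math


def kl(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def jensen_shannon(p, q):
    """JS divergence: average KL of p and q to their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def softmax(logits):
    mx = max(logits)
    exps = [math.exp(z - mx) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_routing(router_logits_per_token):
    """Average per-token routing distributions into one distribution over experts (assumed pooling)."""
    probs = [softmax(z) for z in router_logits_per_token]
    n = len(probs)
    return [sum(p[e] for p in probs) / n for e in range(len(probs[0]))]


def routing_alignment_loss(anchor_logits_by_layer, target_logits_by_layer):
    """Sum over MoE layers of the JS divergence between anchor- and target-language routing."""
    return sum(
        jensen_shannon(mean_routing(a), mean_routing(t))
        for a, t in zip(anchor_logits_by_layer, target_logits_by_layer)
    )
```

In training, this term would be added with some weight to the instruction-tuning loss; whether gradients also flow through the anchor side is a design choice the summary does not specify.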

media r/LocalLLaMA · 9h ago

Colony: An Educational Simulation of LLM Attention Mechanisms Using Agent-Based Analogies

Colony is an educational resource that explains the attention mechanism of large language models through simple agent-based analogies. The simulation places these agents on a board inspired by Conway's Game of Life, and each agent represents a specific role within an LLM's self-attention block. This visual approach lets users observe how information flows and interacts during attention. The project is open source and aimed at anyone who wants to explore these concepts without complex mathematics, offering a fun and accessible way to understand the internals of transformer models.
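
For readers who want to connect Colony's analogy back to the computation it illustrates, here is a minimal single-head scaled dot-product self-attention in plain Python. It is a generic textbook sketch, not code from the Colony project; decoder-only LLMs additionally apply a causal mask, omitted here for brevity.

```python
import math


def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def self_attention(tokens, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a list of token vectors."""
    q = [matvec(Wq, t) for t in tokens]   # what each token is looking for
    k = [matvec(Wk, t) for t in tokens]   # what each token offers for matching
    v = [matvec(Wv, t) for t in tokens]   # the content each token passes on
    d = len(q[0])
    out, weights = [], []
    for qi in q:
        # How strongly this token attends to every token (including itself)
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        w = softmax(scores)
        weights.append(w)
        # New representation: attention-weighted mix of value vectors
        out.append([sum(w[j] * v[j][c] for j in range(len(v))) for c in range(len(v[0]))])
    return out, weights
```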

arxiv arXiv cs.AI · 15h ago

SAFER: Reliable Test-Time Adaptation under Adversarial Streams

SAFER is a training-free framework that improves the robustness of test-time adaptation through reliability-guided augmentation. It generates stochastic augmentations, pools their predictions via correlation-weighted aggregation with outlier detection, and uses adaptive mixing to preserve clean performance under adversarial attacks. Evaluations on PACS, VLCS, and OfficeHome show improved resilience without sacrificing clean accuracy.
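
A hedged sketch of the pooling and mixing steps as described. The Pearson-correlation weights, the z-score outlier filter, and the agreement-based coefficient in `adaptive_mix` are illustrative choices of ours, not SAFER's published formulas. Inputs are class-probability vectors, one per stochastic augmentation of a single test sample.

```python
import statistics


def pearson(a, b):
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / ((va * vb) ** 0.5) if va > 0 and vb > 0 else 0.0


def reliability_pool(aug_probs, z_thresh=1.5):
    """Correlation-weighted pooling of augmented predictions with simple outlier removal."""
    n_cls = len(aug_probs[0])
    consensus = [statistics.fmean(p[c] for p in aug_probs) for c in range(n_cls)]
    corr = [pearson(p, consensus) for p in aug_probs]
    mu, sd = statistics.fmean(corr), statistics.pstdev(corr)
    # Drop predictions whose agreement with the consensus is unusually low (z-score filter)
    keep = [i for i, r in enumerate(corr) if sd == 0 or (r - mu) / sd > -z_thresh]
    # Negatively correlated predictions get zero weight
    weights = [max(corr[i], 0.0) for i in keep]
    total = sum(weights)
    if total == 0:
        return consensus
    return [sum(w * aug_probs[i][c] for w, i in zip(weights, keep)) / total for c in range(n_cls)]


def adaptive_mix(clean_probs, pooled_probs):
    """Blend the un-augmented and pooled predictions; trust the clean view more when they agree."""
    alpha = max(pearson(clean_probs, pooled_probs), 0.0)
    return [alpha * c + (1 - alpha) * p for c, p in zip(clean_probs, pooled_probs)]
```

The intuition behind the mixing rule: on clean inputs the raw prediction and the pooled one tend to agree, so the clean prediction dominates; under attack they diverge and the pooled estimate takes over.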

arxiv arXiv cs.AI · 15h ago

Sparsity-Storage-Accuracy Tradeoff in Parsimoniously Activated Dictionary Learning

Parsimoniously activated dictionary learning (PADL) establishes a structured generative model with auxiliary latent variables, enabling maximum a posteriori estimation. This framework provides generalization guarantees and an analytical characterization of the tradeoff between sparsity, storage cost, and reconstruction accuracy, allowing data-driven hyperparameter estimation. The resulting algorithm achieves better reconstruction performance and accelerates inference in vision-language models.
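
To make the sparsity-storage-accuracy tradeoff concrete, the sketch below uses plain matching pursuit, a generic greedy sparse coder, rather than PADL's MAP estimator. With s nonzeros chosen from n atoms, storing a code costs roughly s · (⌈log2 n⌉ + b) bits for b-bit coefficient values, while reconstruction error typically falls as s grows. The default `value_bits=16` is an arbitrary assumption.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matching_pursuit(x, atoms, sparsity):
    """Greedy sparse coding: spend `sparsity` steps picking unit-norm atoms to approximate x."""
    residual = list(x)
    code = {}  # atom index -> coefficient
    for _ in range(sparsity):
        best = max(range(len(atoms)), key=lambda j: abs(dot(residual, atoms[j])))
        coef = dot(residual, atoms[best])
        code[best] = code.get(best, 0.0) + coef
        residual = [r - coef * a for r, a in zip(residual, atoms[best])]
    error = math.sqrt(dot(residual, residual))
    return code, error


def storage_bits(code, n_atoms, value_bits=16):
    """Bits to store a sparse code: an atom index plus a quantized value per nonzero."""
    return len(code) * (math.ceil(math.log2(n_atoms)) + value_bits)
```

For example, with atoms e1, e2, e3 and (e1 + e2)/√2 and x = (3, 1, 0), one pursuit step leaves an error of 1 at 18 bits, and two steps reconstruct x exactly at 36 bits.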

arxiv arXiv cs.AI · 15h ago

HyperAdapter: Structured Hyperedge Adaptation for Vision Transformer Fine-Tuning

HyperAdapter introduces a hypergraph-based adapter that performs structured, group-aware adaptation in vision transformers by operating in hyperedge space rather than token space. It uses prototype-based assignments to build a soft hypergraph, aggregates token features into hyperedge representations, applies lightweight adaptation, and diffuses the updates back to tokens through the hypergraph structure. This gives the adapter an explicit structural inductive bias while keeping it efficient. Experiments show consistent gains over baseline parameter-efficient fine-tuning (PEFT) methods, especially on tasks requiring structured reasoning.
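
A plain-Python sketch of the four steps HyperAdapter describes, under our own assumptions: soft incidence from a softmax over token-prototype dot products, membership-weighted mean pooling into hyperedges, a ReLU bottleneck as the "lightweight adaptation", and diffusion back through the same incidence matrix as a residual update. Parameter names and shapes are hypothetical.

```python
import math


def softmax(xs, temp=1.0):
    m = max(xs)
    e = [math.exp((x - m) / temp) for x in xs]
    s = sum(e)
    return [v / s for v in e]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def hyperedge_adapter(tokens, prototypes, W_down, W_up, temp=1.0):
    """Sketch of a hyperedge-space adapter (assumed design).

    tokens:     N x D token features
    prototypes: M x D prototype vectors, one per hyperedge
    W_down:     R x D projection into a low-rank adapter space
    W_up:       D x R projection back to feature space
    """
    N, D, M = len(tokens), len(tokens[0]), len(prototypes)
    # 1. Soft incidence matrix H (N x M): each token's membership in each hyperedge.
    H = [softmax([dot(t, p) for p in prototypes], temp) for t in tokens]
    # 2. Aggregate tokens into hyperedge features (membership-weighted mean).
    edges = []
    for m in range(M):
        mass = sum(H[n][m] for n in range(N)) + 1e-8
        edges.append([sum(H[n][m] * tokens[n][d] for n in range(N)) / mass for d in range(D)])
    # 3. Lightweight bottleneck adaptation per hyperedge: up(relu(down(e))).
    adapted = []
    for e in edges:
        h = [max(0.0, dot(row, e)) for row in W_down]
        adapted.append([dot(row, h) for row in W_up])
    # 4. Diffuse hyperedge updates back to tokens through H, added as a residual.
    out = []
    for n, t in enumerate(tokens):
        delta = [sum(H[n][m] * adapted[m][d] for m in range(M)) for d in range(D)]
        out.append([ti + di for ti, di in zip(t, delta)])
    return out, H
```

Because the adaptation runs on M hyperedges instead of N tokens, its cost scales with the number of prototypes, and a zero-initialized `W_up` leaves the pretrained features unchanged at the start of fine-tuning.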

arxiv arXiv cs.AI · 15h ago

MetaPS: Adaptive Strategy Selection for Market Agents

MetaPS is a simulation-guided framework that lets market agents adaptively select among programmatic strategies based on the market state. It uses simulated markets to generate supervised training data, then selects a strategy at inference time to produce executable actions. Experiments show that MetaPS outperforms both fixed strategies and LLM-based agents, with compact models outperforming stronger API models.
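
A toy sketch of the simulate, label, and select loop, assuming a synthetic market in which a lag-one autocorrelation feature encodes the regime, three hypothetical programmatic strategies, and a nearest-centroid selector standing in for the learned model. None of these components come from the paper; they only illustrate how simulated outcomes can become supervised labels for strategy selection.

```python
import random


def momentum(state):
    return 1 if state["last_return"] > 0 else -1   # +1 = buy, -1 = sell


def mean_reversion(state):
    return -momentum(state)


def hold(state):
    return 0


STRATEGIES = {"momentum": momentum, "mean_reversion": mean_reversion, "hold": hold}


def simulate_state(rng):
    """Toy market: autocorrelation `ac` encodes the regime; next return ~ ac * last + noise."""
    ac = rng.choice([1, -1]) * rng.uniform(0.3, 0.9)
    last = rng.uniform(-1, 1)
    nxt = ac * last + rng.gauss(0, 0.05)
    return {"ac": ac, "last_return": last}, nxt


def features(state):
    return (state["ac"], state["last_return"])


def build_dataset(n, seed=0):
    """Label each simulated state with the strategy whose action earns the most next step."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        state, nxt = simulate_state(rng)
        best = max(STRATEGIES, key=lambda name: STRATEGIES[name](state) * nxt)
        data.append((features(state), best))
    return data


def fit_selector(data):
    """Nearest-centroid classifier from market-state features to strategy names."""
    sums, counts = {}, {}
    for x, y in data:
        s = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in s] for y, s in sums.items()}

    def select(state):
        x = features(state)
        return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))

    return select


def act(select, state):
    """Inference: pick a strategy for the current state, then run it to get an executable action."""
    name = select(state)
    return name, STRATEGIES[name](state)
```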

arxiv arXiv cs.AI · 15h ago

P4IR Framework Improves LLM-Based Code Compliance Accuracy

P4IR is a two-stage framework that uses supervised fine-tuning followed by Group Relative Policy Optimization (GRPO) to improve LLM-based automated code compliance systems. It reduces tree edit distance and token-level Levenshtein distance by up to 23.8% and 38.6%, respectively, outperforming leading LLMs such as Claude Opus, GPT-5.2, and GLM-4.7 evaluated with zero-shot and few-shot prompting. It also reduces false positives by a small but statistically significant margin.
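
The summary names GRPO and the two distance metrics but not the reward. The sketch assumes a reward of one minus the normalized token-level Levenshtein distance to a reference output, and shows the group-relative advantage that gives GRPO its name: each sampled output's reward is standardized against the other samples drawn for the same prompt. Tree edit distance is omitted for brevity, and the `reward` function is hypothetical.

```python
import statistics


def levenshtein(a, b):
    """Edit distance between two token sequences (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ta in enumerate(a, start=1):
        cur = [i]
        for j, tb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ta
                           cur[j - 1] + 1,              # insert tb
                           prev[j - 1] + (ta != tb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def reward(candidate_tokens, reference_tokens):
    """Assumed reward: 1 minus normalized token-level edit distance, in [0, 1]."""
    denom = max(len(candidate_tokens), len(reference_tokens), 1)
    return 1.0 - levenshtein(candidate_tokens, reference_tokens) / denom


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize each sample's reward within its group."""
    mu = statistics.fmean(rewards)
    sd = statistics.pstdev(rewards)
    return [(r - mu) / (sd + eps) for r in rewards]
```

Normalizing within the group removes the need for a learned value baseline: outputs closer to the reference than their siblings get positive advantages, and a group with identical rewards yields no gradient signal.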

arxiv arXiv cs.AI · 15h ago

Gold Points Sniper: Self-guided Visual Reasoning for Fine-grained Action Understanding

Gold Points Sniper (GPS) enables lightweight vision-language models to perform self-guided multimodal reasoning for fine-grained human action understanding. By combining a Gold Points Extractor, a Selective Socratic Questioner, and a Semantic Entailment Evaluator, GPS achieves performance comparable to GPT-4o while maintaining superior factual accuracy, using instruction-tuning data built on the CAP benchmark.

arxiv arXiv cs.AI · 16h ago

DreamUV: End-to-End Flow Matching for Artist-like UV Unwrapping

DreamUV introduces an end-to-end learning framework that treats UV unwrapping as a generative flow-matching problem. It learns a mesh-conditioned transport process that generates artist-like UV layouts, using boundary-aware training and Model-in-the-Loop fine-tuning to ensure sound seam geometry and practical validity. Results show straighter seams, tighter axis-aligned islands, and closer alignment with professional artist preferences.
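
A minimal sketch of conditional flow matching with the common linear path between a noise sample x0 and target coordinates x1, assuming DreamUV's targets are flattened per-vertex UV coordinates and that mesh features enter through a `cond` argument. The summary does not give the paper's probability path, network, or sampler, so `velocity_model` is a hypothetical stand-in for the learned, mesh-conditioned network.

```python
def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Regression target for the velocity field along the linear path: x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def fm_loss(velocity_model, x0, x1, t, cond):
    """Squared error between predicted and target velocity at (x_t, t)."""
    xt = interpolate(x0, x1, t)
    pred = velocity_model(xt, t, cond)
    return sum((p - v) ** 2 for p, v in zip(pred, target_velocity(x0, x1)))


def sample(velocity_model, x0, cond, steps=50):
    """Euler integration of dx/dt = v(x, t, cond) from t=0 to t=1."""
    x, dt = list(x0), 1.0 / steps
    for i in range(steps):
        v = velocity_model(x, i * dt, cond)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

With an oracle velocity `(c - x) / (1 - t)` that already knows the target `c`, the training loss is zero and the Euler sampler lands exactly on `c`, which makes a handy sanity check for the integration loop.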

arxiv arXiv cs.AI · 16h ago

Self-Evolving Cognitive Framework for Embodied Scientific Intelligence

The paper proposes a self-evolving cognitive framework that uses causal world modeling to let embodied systems continuously refine their internal models through interaction. It integrates causal modeling, intervention-driven reasoning, and continual refinement, recasting embodied interaction as an epistemic process of causal discovery and knowledge acquisition. The framework supports a shift from predictive to epistemic intelligence, and the authors introduce a new benchmark for evaluating self-evolving embodied scientific intelligence.