All articles — korshunov.ai

Asymptotic Signal Subspace Recovery in Softmax Attention Models

This study investigates the theoretical principles behind softmax-attention mechanisms by analyzing a stylized model where a query vector is learned via stochastic gradient ascent. The authors exploit the model's symmetry to derive a population objective and characterize the limiting ordinary differential equation governing the learning dynamics. By employing tools from stochastic approximation and dynamical systems theory, they establish a rigorous connection between the stochastic learning algorithm and its deterministic limit. Under suitable high-dimensional scaling assumptions and standard step-size conditions, the research demonstrates that the learned query converges almost surely to the one-dimensional signal subspace. This convergence implies that the query asymptotically recovers the latent informative direction up to an intrinsic sign ambiguity. The findings provide a theoretical foundation for understanding attention as a signal extraction procedure in high-dimensional noisy environments.

arxiv arXiv cs.LG · 6h ago

QeHDC: Hyperdimensional Computing based on Quantum-enhanced binding and SuperClass Construction

The authors propose QeHDC, a novel framework extending classical Hyperdimensional Computing by leveraging quantum mechanical properties for enhanced computational efficiency. This approach utilizes a one-pass training method that employs sinusoidal and quantum encoding to project classical data into quantum amplitude states. A key innovation is the introduction of a reference-state-based quantum binding operation realized through specific quantum circuits. Additionally, the framework implements a density-matrix-based superclass generation strategy using eigenvalue decomposition to extract critical quantum state features. These mechanisms enable more accurate and robust class representations for classification tasks. Experimental evaluations on standard benchmark datasets demonstrate superior performance compared to traditional classical and existing quantum-enhanced methods. The results also highlight the approach's robustness to noise and computational feasibility, suggesting practical benefits for future quantum-inspired paradigms.

arxiv arXiv cs.LG · 7h ago

GaRA: Graph-aware LoRA Generation for Enhancing LLMs on Graph Tasks

Graph neural networks often exhibit limited transferability due to their tight coupling with dataset-specific feature spaces, whereas language models offer flexible generalization through a unified interface. Existing methods for adapting language models to graph tasks struggle to encode whole-graph information, which can lead to significant information loss and suboptimal understanding. To address this limitation, the authors propose GaRA, a novel Graph-aware LoRA generation model that implements a weight-level information injection paradigm. This approach generates task-specific weight updates conditioned on original graph structures, allowing them to interact directly with hidden representations. The method constrains the norm of these generated updates to inject whole-graph information while avoiding optimization bias inherent in standard weight generation. Empirical studies demonstrate that GaRA consistently outperforms baseline methods across various zero-shot graph learning tasks.

arxiv arXiv cs.LG · 7h ago

LLMs Determine Causal Structure via Difference-Making Logic

The article addresses the puzzle of how large language models acquire causal structure despite the limitations of standard formalisms like Judea Pearl's interventionist approach and the Neyman-Rubin framework. It argues that LLMs utilize a specific inductive method known as variational induction, which relies on difference-making logic. During training, models process vast amounts of text from diverse contexts to identify what constitutes a difference-maker or an indifference-maker within word sequences. The analysis examines how architectural components, specifically token embeddings and self-attention mechanisms, facilitate this variational induction process. This logical framework fundamentally parallels the experimental method used in science. In both cases, causal relations are derived by systematically varying individual circumstances to observe their influence on a phenomenon.

arxiv arXiv cs.LG · 7h ago

Escaping the Variance Trap: Jacobian-Free Dynamics for Root-Finding Bilevel Optimization

The authors identify a critical flaw termed the Variance Trap, which arises when stochastic root-finding problems are forced into minimization frameworks via squared residuals. Standard bilevel minimization algorithms require estimating hypergradients involving implicit Jacobians that act as noise amplifiers in stochastic settings. To address this, the paper formalizes Root-Finding Bilevel Optimization (RF-BO) as a distinct problem class to bypass this pathology. A Jacobian-free solution using Two-Time-Scale Stochastic Approximation (TTSA) is proposed to update directly along the root error. The study provides the first non-asymptotic convergence guarantees for TTSA in this setting under Markovian noise. Experiments show a 2.6% top-1 accuracy gain in SimCLR and 17x faster convergence in non-linear ODE control compared to baselines. Additionally, the framework achieves significantly improved entropy stability in reinforcement learning and an 11.1% quality improvement in generative modeling.

arxiv arXiv cs.LG · 7h ago

RQ-TTSA: Distribution-Aware Robust Bilevel Optimization with Quantile-Guided Huber Updates

The authors propose RQ-TTSA, a distribution-aware framework designed to address instability in bilevel optimization caused by heavy-tailed stochastic noise. Unlike existing variance-reduction techniques that rely on myopic magnitude checks, this method uses historical gradient buffers to estimate rolling quantiles for adaptive Huber-style clipping. This approach preserves local optimization geometry while strictly bounding effective variance under nonconvex-strongly convex assumptions with infinite-variance noise. Theoretical analysis derives a convergence rate of O(T^(-(p-1)/(3p-2))) that recovers optimal dependence on the heavy-tailed parameter p. Empirical evaluations across six diverse tasks, including vision benchmarks and offline reinforcement learning, show consistent outperformance over state-of-the-art baselines. RQ-TTSA eliminates divergence spikes and ensures stable convergence with negligible computational overhead of approximately 2.7 percent.
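The quantile-guided clipping idea can be illustrated with a minimal sketch. This is not the authors' algorithm: the class name `QuantileClipper`, the window size, and the quantile level are hypothetical choices, and the sketch shows only generic norm clipping against a rolling quantile of past gradient norms.

```python
from collections import deque
import math


class QuantileClipper:
    """Hypothetical helper: clip gradients to a rolling quantile of past norms."""

    def __init__(self, window: int = 100, q: float = 0.9):
        self.norms = deque(maxlen=window)  # history buffer of gradient norms
        self.q = q                         # quantile level used as the threshold

    def threshold(self) -> float:
        # With no history yet there is nothing to compare against, so do not clip.
        if not self.norms:
            return math.inf
        ordered = sorted(self.norms)
        idx = min(len(ordered) - 1, int(self.q * len(ordered)))
        return ordered[idx]

    def clip(self, grad):
        norm = math.sqrt(sum(g * g for g in grad))
        tau = self.threshold()
        self.norms.append(norm)            # record the raw norm for later steps
        if norm <= tau or norm == 0.0:
            return list(grad)              # small gradients pass through unchanged
        scale = tau / norm                 # large gradients are rescaled to norm tau
        return [g * scale for g in grad]


clipper = QuantileClipper(window=10, q=0.5)
for _ in range(5):
    clipper.clip([1.0, 0.0])               # build up a history of ordinary gradients
spike = clipper.clip([100.0, 0.0])         # a heavy-tailed outlier
print(spike)
```

Because the threshold comes from the recent history rather than a fixed constant, it adapts as the typical gradient scale changes, while a single outlier barely moves the quantile.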

arxiv arXiv cs.LG · 7h ago

Deezer Deploys LLM-Based Music Playlist Captioning System

Deezer has deployed an automatic playlist captioning system powered by large language models to enhance its Daily Mix feature. This technology generates natural language descriptions for personalized playlists, helping users understand the content behind each recommendation. The system leverages recent advances in LLMs to process diverse data sources while maintaining strict control over output quality. It is now active for millions of users, significantly improving overall engagement metrics. The deployment highlights how semantic framing influences user perception in online personalized experiences. This initiative addresses the challenge of scaling playlist description generation effectively.

arxiv arXiv cs.LG · 7h ago

VRA-FedSGD: Variance-Reduced Federated Learning for Heavy-Tailed Noise

The authors propose VRA-FedSGD, a variance-reduction based algorithm designed for federated learning in environments with heavy-tailed gradient and communication noise. This approach addresses challenges prevalent in large-scale machine learning over wireless networks and Internet of Things deployments. The method employs momentum variance reduction combined with nonlinear mapping to mitigate heavy-tailed gradient noise. It also utilizes a variance-reduced aggregation mechanism to suppress heavy-tailed communication noise. For nonconvex objective functions, VRA-FedSGD achieves a mean convergence rate of O(K^(-(p-1)/(2p-1))), where p is the tail index. In the almost sure sense, it reaches a rate of Õ(K^(-(1-1/(p-ε)))) for strongly convex objectives, with ε being an arbitrarily small constant. Simulated experiments on logistic regression with real-world data verify the algorithm's effectiveness.

media r/LocalLLaMA · 7h ago

GLM-5.2 on 4x DGX Spark: Reconstructing Missing Build Steps for MTP Speculative Decode

The author successfully deployed GLM-5.2 with MTP speculative decode on a cluster of four NVIDIA GB10 (DGX Spark) nodes, achieving approximately 9.4 tokens per second. This setup uses vLLM with tensor parallelism, ported sparse-MLA Triton kernels, and deterministic 15% expert pruning to fit AWQ-INT4 weights. A critical finding is that the original Docker image build instructions are incomplete, requiring reconstruction of missing patches for deep_gemm.py and sparse_attn_indexer.py. The author also found that any vLLM version other than the specific pinned commit crashes with CUDA errors while loading real AWQ weights. To replicate the environment, users must apply a custom script that bakes in kernels and routes functions to sm12x fallbacks. The setup runs at roughly double the speed of previous llama.cpp implementations, though inter-node bandwidth remains a bottleneck for dual-rail scaling.

media r/LocalLLaMA · 7h ago

MINISFORUM DEG1 Oculink eGPU Dock Refurbished Available for $59

A refurbished MINISFORUM DEG1 Oculink eGPU dock is currently available for $59. The product listing highlights its robust build quality, noting that the device has sufficient heft to securely hold a graphics card. Unlike some lower-cost alternatives, this dock includes redrivers to ensure signal integrity. A user who purchased a unit last year reported positive experiences with its performance and stability. The item can be purchased directly from the manufacturer's refurbished product page.

media r/LocalLLaMA · 7h ago

Query on Clustering Nvidia DGX Spark and AMD Ryzen AI Max 395 for Unified Memory Inference

A user asked whether an NVIDIA DGX Spark can be clustered with an AMD Ryzen AI Max 395 to run a single large language model. Both devices have 128GB of unified memory, for a combined capacity of approximately 256GB minus operating system overhead. The DGX Spark is equipped with a 200Gbit network interface, whereas the AMD Strix system currently has only 5Gbit Ethernet but includes a PCIe Gen 4 x4 slot. The user noted that DeepSeek v4 Flash can fit on two DGX Sparks and wondered if the Strix could serve as an alternative node. To improve connectivity, they proposed adding a Mellanox ConnectX-6 QSFP+28 to the AMD system to achieve higher bandwidth over the link.
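As a rough back-of-the-envelope check on the proposed upgrade, the snippet below estimates the ceiling the expansion slot would place on any added network card. It assumes the standard PCIe 4.0 figures of 16 GT/s per lane with 128b/130b encoding, which are not taken from the post, and it ignores protocol overhead, so real throughput would be somewhat lower.

```python
# Assumed PCIe 4.0 parameters (standard figures, not taken from the post).
GT_PER_LANE = 16.0          # gigatransfers per second per lane
ENCODING = 128 / 130        # 128b/130b line encoding efficiency
LANES = 4                   # the x4 slot mentioned in the post


def pcie_gbit_per_s(gt_per_lane: float, lanes: int, encoding: float) -> float:
    """Raw one-direction link bandwidth in Gbit/s, ignoring protocol overhead."""
    return gt_per_lane * lanes * encoding


slot_gbit = pcie_gbit_per_s(GT_PER_LANE, LANES, ENCODING)
onboard_gbit = 5.0          # current Ethernet on the AMD system, per the post
spark_gbit = 200.0          # DGX Spark network interface, per the post

# The link between the two machines runs at the speed of its slowest hop.
effective_gbit = min(slot_gbit, spark_gbit)
combined_memory_gb = 128 + 128

print(f"PCIe Gen 4 x4 ceiling: {slot_gbit:.1f} Gbit/s")
print(f"Speed-up over onboard Ethernet: {effective_gbit / onboard_gbit:.1f}x")
print(f"Combined unified memory before OS overhead: {combined_memory_gb} GB")
```

Under these assumptions the slot, not the network card, would be the limiting hop: roughly 63 Gbit/s, well above the onboard 5Gbit Ethernet but well below the DGX Spark's 200Gbit interface.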

media r/LocalLLaMA · 8h ago

Colony: An Educational Simulation of LLM Attention Mechanisms Using Agent-Based Analogies

Colony is an educational resource designed to explain the attention mechanism of Large Language Models through simple analogies involving agents. The simulation places these agents within a board environment inspired by Conway's Game of Life. Each agent in the system represents a specific role within the self-attention block of an LLM. This visual approach allows users to observe how information flows and interacts during the attention process. The project is available as an open-source tool for those interested in exploring these concepts without complex mathematics. It serves as a fun and accessible way to understand the internal workings of transformer models.

media r/LocalLLaMA · 8h ago

User Observes Cloud Chatbots Appear Less Intelligent Than Local Models

A Reddit user reports that cloud chatbots like ChatGPT and Claude often seem less capable than open-source models such as Kimi or GLM when discussing abstract concepts. The author notes that these commercial models frequently leap to conclusions, oversimplify ideas, and rely on repetitive phrasing patterns. This perceived decline in intelligence is attributed to system prompts designed to enforce a specific personality for user engagement. While this behavior was particularly prominent during the GPT-4o era, it reportedly persists in current versions. The user questions whether accessing these models via raw API removes the restrictive system prompts or if they remain embedded. The post seeks community feedback on whether cloud models perform better without these constraints.

media r/LocalLLaMA · 8h ago

Gefen: A Drop-in Replacement for AdamW with Claimed 8x Memory Reduction

Gefen is presented as a drop-in replacement for the AdamW optimizer, claiming an eightfold reduction in memory usage during training. The project includes a GitHub repository available at ndvbd/Gefen and a corresponding research paper hosted on arXiv under the identifier 2606.13894. This submission highlights Gefen's potential to optimize resource efficiency for machine learning workflows. The provided source material links directly to the technical documentation and codebase for further verification. No additional performance metrics or comparative benchmarks are detailed in the available text.

media Hugging Face Forums · 8h ago

User Reports HuggingFace Charging for Unused L40S Compute in Spaces

A user on the Hugging Face discussion forum reported an issue where their Space remained stuck at the starting phase while using an L40S GPU. The user expressed frustration that they were being charged for compute resources despite the application failing to launch or utilize any actual processing power. This incident highlights concerns regarding billing transparency and infrastructure reliability within the platform's Spaces environment. The post serves as a complaint about financial loss due to technical failures rather than a feature announcement. No further technical details or official responses were included in the truncated source content.

media Hugging Face Forums · 8h ago

User Reports Step 3.7 Flash Model Tool Access Failure on HuggingChat

A user on the Hugging Face discussion forum reported that the Step 3.7 Flash model by StepFun AI has lost its ability to use tools, including MCP servers, as of the morning of the report. The individual expressed concern over whether this outage is temporary or permanent, noting their strong preference for this specific model due to its high performance and low resource costs compared to competitors. Despite praising the model's quality and affordability, the user highlighted the immediate disruption caused by the inability to execute tool-based functions. The post seeks clarification from the community regarding prior experiences with similar issues and potential resolutions. This incident underscores a critical dependency on tool availability for users relying on this specific AI configuration.

media Hugging Face Forums · 8h ago

Ontological Inversion: Flipping LLM Emotional Concepts via Negative Gain

The author introduces 'ontological inversion,' a technique designed to expand the one-directional inference nature of Large Language Models. This method allows models to capture nuanced, multifaceted concepts, such as memories that evoke both sorrow and joy simultaneously. The approach was developed by applying a negative gain factor during sweeps into the Niodoo steering architecture. It addresses the common limitation where LLMs overfit to singular emotional labels when prompted with personal experiences. By inverting concepts similarly to physics involution, the technique enables models to flip emotional states, such as transforming sorrowful memories into joyful ones. The work is shared via a GitHub repository titled 'ontological-inversion' by user Ruffian-L.

media Hugging Face Forums · 8h ago

User Inquires About Organization Rename Process on Hugging Face

A user posted on the Hugging Face discussion forum seeking assistance with renaming their organization. The individual stated they sent an email to website@huggingface.co on June 15 requesting a change from DZER-Studios to Vexion-LM. Despite sending the initial request, the user reported receiving no response and observed that the organization name remained unchanged. Consequently, the poster asked whether organization renames are still supported by the platform. They also requested guidance on alternative methods to contact the team regarding this specific administrative request.

media Hugging Face Forums · 8h ago

Community Inquiry on Model Benchmarking Methods

A user on the Hugging Face discussion forum posted a question seeking advice on how to benchmark machine learning models. The inquiry was initiated by an individual who is new to the field of fine-tuning and wishes to evaluate their models after completion. The post explicitly asks for established methods or strategies that the community uses for this purpose. It highlights a common need among practitioners to understand standard evaluation practices in model development. The discussion thread currently contains only one post from a single participant. No specific benchmarks, metrics, or technical solutions were provided within the visible content of the source.

media Hugging Face Forums · 8h ago

Qwen3/Gemma3 Candle Skips Attention Masks for Equal-Length Batches in CPU Mode

A user has reported a critical bug in the Hugging Face text-embeddings-inference library affecting Qwen3 and Gemma3 models. The issue arises when running inference on CPUs with concurrent requests, leading to significant accuracy degradation. Specifically, the Candle backend incorrectly skips attention masks for batches where all input sequences have equal lengths. This defect compromises the reliability of embeddings generated under these specific conditions. To address the problem, the author submitted a pull request containing a fix that was thoroughly tested on their local machines. The bug highlights potential stability risks in CPU-based embedding services handling batched inputs.