All articles — korshunov.ai


HyperAdapter: Structured Hyperedge Adaptation for Parameter-Efficient Fine-Tuning of Vision Transformers

The authors propose HyperAdapter, a novel parameter-efficient fine-tuning method that adapts vision transformers in hyperedge space rather than token space. Existing adapter-based methods typically perform independent adaptations for each token, which overlooks structured relationships and can lead to redundant updates. HyperAdapter constructs a soft hypergraph over ViT tokens using prototype-based assignments to enable group-aware adaptation. The architecture aggregates token features into latent hyperedge representations and applies lightweight bottleneck adaptation at the hyperedge level. Updates are then diffused back to individual tokens via the hypergraph incidence structure, injecting an explicit structural inductive bias. Extensive experiments across diverse visual benchmarks demonstrate that this approach consistently outperforms strong PEFT baselines under comparable parameter budgets. The results highlight significant gains on tasks requiring structured reasoning and suggest that the choice of adaptation space is a critical dimension for efficient transfer.
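
The aggregate, adapt, and diffuse pipeline described above can be sketched in a few lines. This is a rough illustration and not the authors' implementation: the dot-product prototype scoring, the ReLU bottleneck, the residual update, and every name below (`hyperedge_adapt`, `W_down`, `W_up`) are assumptions made for the example.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(W, v):
    return [dot(row, v) for row in W]

def hyperedge_adapt(tokens, prototypes, W_down, W_up):
    """tokens: n x d, prototypes: m x d, W_down: r x d, W_up: d x r (hypothetical shapes)."""
    n, dim, n_edges = len(tokens), len(tokens[0]), len(prototypes)
    # Soft incidence H[i][e]: how strongly token i belongs to hyperedge e.
    H = [softmax([dot(t, p) for p in prototypes]) for t in tokens]
    # Aggregate tokens into hyperedge features (incidence-weighted mean).
    edges = []
    for e in range(n_edges):
        mass = sum(H[i][e] for i in range(n))
        edges.append([sum(H[i][e] * tokens[i][d] for i in range(n)) / mass
                      for d in range(dim)])
    # Bottleneck adapter applied per hyperedge: down-project, ReLU, up-project.
    deltas = [matvec(W_up, [max(0.0, h) for h in matvec(W_down, z)]) for z in edges]
    # Diffuse hyperedge updates back to tokens through the same incidence weights.
    return [[tokens[i][d] + sum(H[i][e] * deltas[e][d] for e in range(n_edges))
             for d in range(dim)] for i in range(n)]
```

With `W_up` set to zeros the function returns the tokens unchanged, which mirrors the common adapter practice of starting from an identity mapping.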

arxiv arXiv cs.LG · 7h ago

Shift-Invariant Variance Estimator Eliminates Minimization Bias in Local Learning Coefficient Estimation

Singular Learning Theory uses the Local Learning Coefficient to quantify neural network loss landscape geometry, but mean-energy estimators rely on an additive loss baseline. During off-equilibrium training phases, this minimum is unknown, and substituting it with noisy mini-batch losses introduces systematic minimization bias. The authors propose the Shift-Invariant Variance Estimator (SIVE) to structurally eliminate this unknown baseline through the variance operator. By combining SIVE with a correction derived from the Law of Total Variance, the method separates geometric loss fluctuations from evaluation noise. Controlled experiments on analytically tractable toy models demonstrate that SIVE recovers expected finite-temperature geometric signals where anchored mean estimators fail. Applied to deep neural networks, SIVE serves as a robust diagnostic for tracking structural phase transitions throughout training.
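
The shift-invariance the summary relies on is a basic property of the variance operator: adding an unknown constant baseline to every loss value moves the mean but leaves the variance unchanged. A minimal sketch with made-up loss values follows; the numbers and function names are illustrative assumptions, and this is not the paper's estimator or its total-variance correction.

```python
from statistics import mean, pvariance

def mean_energy(losses, baseline):
    """Anchored mean: only meaningful if the baseline is known."""
    return mean(losses) - baseline

def variance_energy(losses):
    """Variance: any additive baseline cancels out."""
    return pvariance(losses)

losses = [3, 5, 4, 8]                 # hypothetical sampled losses
shifted = [x + 10 for x in losses]    # same samples with an unknown offset of 10

# The anchored mean moves with the offset; the variance does not.
mean_gap = mean_energy(shifted, 0) - mean_energy(losses, 0)
variance_gap = variance_energy(shifted) - variance_energy(losses)
```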

arxiv arXiv cs.LG · 7h ago

Efficient CNN with Transfer Learning for Multi-Cancer Detection

A study introduces a lightweight convolutional neural network enhanced with transfer learning for multi-cancer detection using biomedical images. The architecture aims to reduce computational complexity while maintaining high classification performance for deployment in resource-constrained environments. Researchers evaluated the model on three tumor datasets: brain MRI, lung CT, and kidney CT scans. Under five-fold stratified cross-validation, the system achieved test accuracies of 90.85%, 98.64%, and 99.92% for brain, lung, and kidney cancer, respectively. Transfer learning was employed by pretraining on one cancer type and fine-tuning on the others, requiring only 20 additional epochs to match models trained from scratch. Fine-tuning updates the classification part of the CNN and takes approximately 0.014 seconds per image per epoch on an NVIDIA GeForce GTX 960. Comparative evaluations demonstrate that this model outperforms state-of-the-art architectures such as Xception, VGG16, VGG19, MobileNetV2, and DenseNet121.

blog Simon Willison · 7h ago

Simon Willison converts MDN browser compatibility data into a SQLite database

Inspired by Mozilla's new MDN MCP service, Simon Willison has converted the comprehensive mdn/browser-compat-data repository into a SQLite database. The project utilizes a script generated by Claude Code for web (Opus 4.8) to perform this conversion using sqlite-utils. The resulting database is approximately 66MB in size and is hosted on GitHub with open CORS headers to facilitate direct access. To automate the process, a GitHub Actions workflow was built using Codex Desktop (GPT-5.5) to force-push the updated database to an orphan branch named db. Users can download the final browser-compat.db file directly from the repository or explore its contents via Datasette Lite.
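
As a rough idea of what such a conversion involves, the sketch below flattens nested compatibility JSON into one SQLite table. It uses Python's standard `sqlite3` module instead of sqlite-utils, and the sample record, table name, and column names are assumptions for illustration; the actual script and schema may differ.

```python
import sqlite3

# Hypothetical miniature of the nested compat JSON; the real data is far larger.
SAMPLE = {
    "api": {
        "AbortController": {
            "__compat": {
                "support": {
                    "chrome": {"version_added": "66"},
                    "firefox": [{"version_added": "57"}],
                }
            }
        }
    }
}

def iter_support(node, path=()):
    """Yield (feature, browser, version_added) rows from nested compat data."""
    for key, value in node.items():
        if key == "__compat":
            for browser, support in value.get("support", {}).items():
                # A support entry may be a single object or a list; keep the first.
                first = support[0] if isinstance(support, list) else support
                yield ".".join(path), browser, str(first.get("version_added"))
        elif isinstance(value, dict):
            yield from iter_support(value, path + (key,))

def build_db(data, db_path=":memory:"):
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE support (feature TEXT, browser TEXT, version_added TEXT)"
    )
    conn.executemany("INSERT INTO support VALUES (?, ?, ?)", iter_support(data))
    conn.commit()
    return conn
```

Calling `build_db(SAMPLE, "browser-compat.db")` would write a file instead of an in-memory database.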

arxiv arXiv cs.LG · 8h ago

P4IR: Reinforcement Learning Enhances Automated Code Compliance Systems

A new framework named P4IR addresses the issue of hallucinated rules in large language model-based automated code compliance systems. This two-stage approach first employs supervised fine-tuning to instill domain knowledge into the model. It then utilizes Group Relative Policy Optimization to improve the accuracy of generated high-level code skeletons. The method achieved reductions of up to 23.8% in tree edit distance and 38.6% in token-level Levenshtein distance compared to supervised fine-tuning baselines. Comparative analysis shows that P4IR outperforms leading models like Claude Opus, GPT-5.2, and Qwen-3-Max in zero-shot settings. Additionally, the reinforcement learning stage produced a statistically significant reduction in false positives. This combination of techniques offers a path toward more reliable automated code compliance.
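
For readers unfamiliar with the second metric, token-level Levenshtein distance counts the minimum number of token insertions, deletions, and substitutions needed to turn one sequence into another. The sketch below is the standard dynamic-programming formulation; splitting on whitespace is an assumption for the example, since the summary does not say how the paper tokenizes code.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings or lists of tokens)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

# Hypothetical generated vs. reference rule skeletons, compared token by token.
generated = "if area > 50 then fail".split()
reference = "if area >= 50 then fail".split()
distance = levenshtein(generated, reference)
```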

arxiv arXiv cs.LG · 8h ago

Asymptotic Signal Subspace Recovery in Softmax Attention Models

This study investigates the theoretical principles behind softmax-attention mechanisms by analyzing a stylized model where a query vector is learned via stochastic gradient ascent. The authors exploit the model's symmetry to derive a population objective and characterize the limiting ordinary differential equation governing the learning dynamics. By employing tools from stochastic approximation and dynamical systems theory, they establish a rigorous connection between the stochastic learning algorithm and its deterministic limit. Under suitable high-dimensional scaling assumptions and standard step-size conditions, the research demonstrates that the learned query converges almost surely to the one-dimensional signal subspace. This convergence implies that the query asymptotically recovers the latent informative direction up to an intrinsic sign ambiguity. The findings provide a theoretical foundation for understanding attention as a signal extraction procedure in high-dimensional noisy environments.

arxiv arXiv cs.LG · 8h ago

QeHDC: Hyperdimensional Computing based on Quantum-enhanced binding and SuperClass Construction

The authors propose QeHDC, a novel framework extending classical Hyperdimensional Computing by leveraging quantum mechanical properties for enhanced computational efficiency. This approach utilizes a one-pass training method that employs sinusoidal and quantum encoding to project classical data into quantum amplitude states. A key innovation is the introduction of a reference-state-based quantum binding operation realized through specific quantum circuits. Additionally, the framework implements a density-matrix-based superclass generation strategy using eigenvalue decomposition to extract critical quantum state features. These mechanisms enable more accurate and robust class representations for classification tasks. Experimental evaluations on standard benchmark datasets demonstrate superior performance compared to traditional classical and existing quantum-enhanced methods. The results also highlight the approach's robustness to noise and computational feasibility, suggesting practical benefits for future quantum-inspired paradigms.

arxiv arXiv cs.LG · 8h ago

GaRA: Graph-aware LoRA Generation for Enhancing LLMs on Graph Tasks

Graph neural networks often exhibit limited transferability due to their tight coupling with dataset-specific feature spaces, whereas language models offer flexible generalization through a unified interface. Existing methods for adapting language models to graph tasks struggle to encode whole-graph information, which can lead to significant information loss and suboptimal understanding. To address this limitation, the authors propose GaRA, a novel Graph-aware LoRA generation model that implements a weight-level information injection paradigm. This approach generates task-specific weight updates conditioned on original graph structures, allowing them to interact directly with hidden representations. The method constrains the norm of these generated updates to inject whole-graph information while avoiding optimization bias inherent in standard weight generation. Empirical studies demonstrate that GaRA consistently outperforms baseline methods across various zero-shot graph learning tasks.

arxiv arXiv cs.LG · 8h ago

LLMs Determine Causal Structure via Difference-Making Logic

The article addresses the puzzle of how large language models acquire causal structure despite the limitations of standard formalisms like Judea Pearl's interventionist approach and the Neyman-Rubin framework. It argues that LLMs utilize a specific inductive method known as variational induction, which relies on difference-making logic. During training, models process vast amounts of text from diverse contexts to identify what constitutes a difference-maker or an indifference-maker within word sequences. The analysis examines how architectural components, specifically token embeddings and self-attention mechanisms, facilitate this variational induction process. This logical framework fundamentally parallels the experimental method used in science. In both cases, causal relations are derived by systematically varying individual circumstances to observe their influence on a phenomenon.

arxiv arXiv cs.LG · 8h ago

Escaping the Variance Trap: Jacobian-Free Dynamics for Root-Finding Bilevel Optimization

The authors identify a critical flaw termed the Variance Trap, which arises when stochastic root-finding problems are forced into minimization frameworks via squared residuals. Standard bilevel minimization algorithms require estimating hypergradients involving implicit Jacobians that act as noise amplifiers in stochastic settings. To address this, the paper formalizes Root-Finding Bilevel Optimization (RF-BO) as a distinct problem class to bypass this pathology. A Jacobian-free solution using Two-Time-Scale Stochastic Approximation (TTSA) is proposed to update directly along the root error. The study provides the first non-asymptotic convergence guarantees for TTSA in this setting under Markovian noise. Experiments show a 2.6% top-1 accuracy gain in SimCLR and 17x faster convergence in non-linear ODE control compared to baselines. Additionally, the framework achieves significantly improved entropy stability in reinforcement learning and an 11.1% quality improvement in generative modeling.
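
The idea of updating directly along the root error, with no Jacobian, can be illustrated on a toy problem. The sketch below is an assumption-laden example and not the paper's algorithm: the inner root condition `y - 2x = 0`, the outer root condition `y - 4 = 0`, the constant step sizes, and the Gaussian noise model are all made up for illustration.

```python
import random

def ttsa_root_find(steps=200, alpha=0.05, beta=0.5, noise_std=0.0, seed=0):
    """Two-time-scale iteration on a toy root-finding problem.

    The fast variable y tracks the inner root y*(x) = 2x (step size beta);
    the slow variable x moves along the outer residual y - 4 (step size alpha).
    The joint root is x = 2, y = 4.
    """
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    for _ in range(steps):
        inner_residual = (y - 2.0 * x) + rng.gauss(0.0, noise_std)
        outer_residual = (y - 4.0) + rng.gauss(0.0, noise_std)
        y -= beta * inner_residual   # fast time scale
        x -= alpha * outer_residual  # slow time scale, no Jacobian needed
    return x, y
```

Minimizing the squared residual instead would require differentiating through the inner solution, which is where the implicit Jacobian enters.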

arxiv arXiv cs.LG · 8h ago

RQ-TTSA: Distribution-Aware Robust Bilevel Optimization with Quantile-Guided Huber Updates

The authors propose RQ-TTSA, a distribution-aware framework designed to address instability in bilevel optimization caused by heavy-tailed stochastic noise. Unlike existing variance-reduction techniques that rely on myopic magnitude checks, this method uses historical gradient buffers to estimate rolling quantiles for adaptive Huber-style clipping. This approach preserves local optimization geometry while strictly bounding effective variance under nonconvex-strongly convex assumptions with infinite-variance noise. Theoretical analysis derives a convergence rate of O(T^(-(p-1)/(3p-2))) that recovers optimal dependence on the heavy-tailed parameter p. Empirical evaluations across six diverse tasks, including vision benchmarks and offline reinforcement learning, show consistent outperformance over state-of-the-art baselines. RQ-TTSA eliminates divergence spikes and ensures stable convergence with negligible computational overhead of approximately 2.7 percent.
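
A rolling-quantile clipping rule of the kind described can be sketched as follows. This is a simplified illustration under stated assumptions, not the RQ-TTSA update itself: the class name, window size, nearest-rank quantile, and the choice to store unclipped norms in the buffer are all hypothetical.

```python
import math
from collections import deque

class QuantileClipper:
    """Clip gradients to a threshold taken from a rolling quantile of past norms."""

    def __init__(self, window=100, q=0.9):
        self.norms = deque(maxlen=window)  # history of recent gradient norms
        self.q = q

    def threshold(self):
        if not self.norms:
            return math.inf  # no history yet, so nothing is clipped
        ordered = sorted(self.norms)
        index = min(len(ordered) - 1, int(self.q * len(ordered)))
        return ordered[index]

    def clip(self, grad):
        norm = math.sqrt(sum(g * g for g in grad))
        tau = self.threshold()
        self.norms.append(norm)  # record the raw norm for future thresholds
        if norm <= tau:
            return list(grad)    # inside the threshold: gradient is untouched
        return [g * tau / norm for g in grad]  # outside: rescale to norm tau
```

Gradients below the threshold pass through unchanged, so the rule only intervenes on outliers relative to recent history.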

arxiv arXiv cs.LG · 8h ago

Deezer Deploys LLM-Based Music Playlist Captioning System

Deezer has deployed an automatic playlist captioning system powered by large language models to enhance its Daily Mix feature. This technology generates natural language descriptions for personalized playlists, helping users understand the content behind each recommendation. The system leverages recent advances in LLMs to process diverse data sources while maintaining strict control over output quality. It is now active for millions of users, significantly improving overall engagement metrics. The deployment highlights how semantic framing influences user perception in online personalized experiences. This initiative addresses the challenge of scaling playlist description generation effectively.

arxiv arXiv cs.LG · 8h ago

VRA-FedSGD: Variance-Reduced Federated Learning for Heavy-Tailed Noise

The authors propose VRA-FedSGD, a variance-reduction-based algorithm designed for federated learning in environments with heavy-tailed gradient and communication noise. This approach addresses challenges prevalent in large-scale machine learning over wireless networks and Internet of Things deployments. The method employs momentum variance reduction combined with nonlinear mapping to mitigate heavy-tailed gradient noise. It also utilizes a variance-reduced aggregation mechanism to suppress heavy-tailed communication noise. For nonconvex objective functions, VRA-FedSGD achieves a mean convergence rate of O(K^(-(p-1)/(2p-1))), where p is the tail index. In the almost sure sense, it reaches a rate of Õ(K^(-(1-1/(p-ε)))) for strongly convex objectives, with ε being an arbitrarily small constant. Simulated experiments on logistic regression with real-world data verify the algorithm's effectiveness.
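
To get a feel for the nonconvex rate, the exponent (p-1)/(2p-1) can be evaluated for a few tail indices. The helper below is plain arithmetic on the formula quoted above; the function name and the sample values of p are chosen only for illustration.

```python
from fractions import Fraction

def nonconvex_rate_exponent(p):
    """Exponent r in the quoted rate O(K^(-r)), with r = (p - 1) / (2p - 1)."""
    p = Fraction(p)
    return (p - 1) / (2 * p - 1)

# Heavier tails (p closer to 1) give a smaller exponent, i.e. slower convergence.
exponents = {p: nonconvex_rate_exponent(p) for p in ("5/4", "3/2", "2")}
```

At p = 2 the exponent is 1/3, and it shrinks toward zero as p approaches 1.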

media r/LocalLLaMA · 9h ago

GLM-5.2 on 4x DGX Spark: Reconstructing Missing Build Steps for MTP Speculative Decode

The author successfully deployed GLM-5.2 with MTP speculative decode on a cluster of four NVIDIA GB10 (DGX Spark) nodes, achieving approximately 9.4 tokens per second. This setup utilizes vLLM with tensor parallelism, ported sparse-MLA Triton kernels, and a deterministic 15% expert pruning to fit AWQ-INT4 weights. A critical finding is that the original Docker image build instructions are incomplete, requiring reconstruction of missing patches for deep_gemm.py and sparse_attn_indexer.py. The author also identified that using any vLLM version other than the specific pinned commit causes real AWQ weights to crash during loading due to CUDA errors. To replicate the environment, users must apply a custom script that bakes in kernels and routes functions to sm12x fallbacks. Performance benefits include roughly double the speed of previous llama.cpp implementations, though inter-node bandwidth remains a bottleneck for dual-rail scaling.

media r/LocalLLaMA · 9h ago

MINISFORUM DEG1 Oculink eGPU Dock Refurbished Available for $59

A refurbished MINISFORUM DEG1 Oculink eGPU dock is currently available for $59. The product listing highlights its robust build quality, noting that the device has sufficient heft to securely hold a graphics card. Unlike some lower-cost alternatives, this dock includes redrivers to ensure signal integrity. A user who purchased a unit last year reported positive experiences with its performance and stability. The item can be purchased directly from the manufacturer's refurbished product page.

media r/LocalLLaMA · 9h ago

Query on Clustering Nvidia DGX Spark and AMD Ryzen AI Max 395 for Unified Memory Inference

A user asked whether an Nvidia DGX Spark can be clustered with an AMD Ryzen AI Max 395 to run a single large language model. Both devices have 128GB of unified memory, for a potential combined capacity of approximately 256GB minus operating system overhead. The DGX Spark is equipped with a 200Gbit network interface, whereas the AMD Strix system currently has only 5Gbit Ethernet but includes a PCIe Gen 4 x4 slot. The user noted that DeepSeek v4 Flash can fit on two DGX Sparks and wondered if the Strix could serve as an alternative node. To improve connectivity, they proposed adding a Mellanox ConnectX-6 QSFP+28 to the AMD system to achieve higher bandwidth over the link.
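
A quick back-of-the-envelope calculation shows what the PCIe slot could carry. PCIe 4.0 runs at 16 GT/s per lane with 128b/130b encoding, so a four-lane slot tops out near 63 Gbit/s of raw link bandwidth; usable throughput is lower once protocol overhead is included. The function name below is made up for the example.

```python
def pcie_gen4_gbit_per_s(lanes):
    """Raw PCIe 4.0 link bandwidth in Gbit/s, before protocol overhead."""
    transfers_per_lane = 16.0        # GT/s per lane for PCIe 4.0
    encoding_efficiency = 128 / 130  # 128b/130b line encoding
    return transfers_per_lane * lanes * encoding_efficiency

slot_gbit = pcie_gen4_gbit_per_s(4)  # about 63 Gbit/s
```

Under these numbers an add-in NIC in that slot would sit well above the 5Gbit onboard Ethernet but well below the 200Gbit port on the DGX Spark.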

media r/LocalLLaMA · 9h ago

Colony: An Educational Simulation of LLM Attention Mechanisms Using Agent-Based Analogies

Colony is an educational resource designed to explain the attention mechanism of Large Language Models through simple analogies involving agents. The simulation places these agents within a board environment inspired by Conway's Game of Life. Each agent in the system represents a specific role within the self-attention block mechanism of an LLM. This visual approach allows users to observe how information flows and interacts during the attention process. The project is available as an open-source tool for those interested in exploring these concepts without complex mathematics. It serves as a fun and accessible way to understand the internal workings of transformer models.
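
For readers who do want to see the arithmetic the simulation abstracts away, standard scaled dot-product attention fits in a few lines of plain Python. This is the textbook formula and not code from the Colony project; the function names and toy inputs are assumptions for the example.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(q . k / sqrt(d)) weighted sum of values."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)  # how much this query attends to each key
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# One query facing two identical keys splits its attention evenly.
outputs, weights = attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [1.0, 0.0]],
    values=[[0.0, 2.0], [4.0, 6.0]],
)
```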

media r/LocalLLaMA · 9h ago

User Observes Cloud Chatbots Appear Less Intelligent Than Local Models

A Reddit user reports that cloud chatbots like ChatGPT and Claude often seem less capable than open-source models such as Kimi or GLM when discussing abstract concepts. The author notes that these commercial models frequently leap to conclusions, oversimplify ideas, and rely on repetitive phrasing patterns. This perceived decline in intelligence is attributed to system prompts designed to enforce a specific personality for user engagement. While this behavior was particularly prominent during the GPT-4o era, it reportedly persists in current versions. The user questions whether accessing these models via raw API removes the restrictive system prompts or if they remain embedded. The post seeks community feedback on whether cloud models perform better without these constraints.

media r/LocalLLaMA · 9h ago

Gefen: A Drop-in Replacement for AdamW with Claimed 8x Memory Reduction

Gefen is presented as a drop-in replacement for the AdamW optimizer, claiming an eightfold reduction in memory usage during training. The project includes a GitHub repository available at ndvbd/Gefen and a corresponding research paper hosted on arXiv under the identifier 2606.13894. This submission highlights Gefen's potential to optimize resource efficiency for machine learning workflows. The provided source material links directly to the technical documentation and codebase for further verification. No additional performance metrics or comparative benchmarks are detailed in the available text.

media Hugging Face Forums · 9h ago

User Reports HuggingFace Charging for Unused L40S Compute in Spaces

A user on the Hugging Face discussion forum reported an issue where their Space remained stuck at the starting phase while using an L40S GPU. The user expressed frustration that they were being charged for compute resources despite the application failing to launch or utilize any actual processing power. This incident highlights concerns regarding billing transparency and infrastructure reliability within the platform's Spaces environment. The post serves as a complaint about financial loss due to technical failures rather than a feature announcement. No further technical details or official responses were included in the truncated source content.