All articles — korshunov.ai

Reference-Free Assessment of Physical Consistency in World Model-based Video Generation

The authors introduce reference-free measures for evaluating the physical consistency of generated videos by combining relative and absolute fidelity assessments. This approach addresses the gap in physical fidelity that often prevents video generation tools like WorldGym or WorldEval from accurately reproducing real-world task success rates for VLA models. Unlike existing methods requiring costly human voting or unavailable ground-truth references, the new framework utilizes DROID-SLAM and SEA-RAFT to quantify inconsistencies. Motivated by WorldScore, the relative consistency assessment filters videos to improve task success rates by over 8%. Additionally, the absolute assessment enables spatio-temporal localization to visualize when and where physical artifacts occur in the generated content.

arxiv arXiv cs.LG · 7h ago

Kiwano: An Open-Source PyTorch Toolkit for Speaker Verification Research

Researchers have introduced Kiwano, an open-source toolkit designed to advance research and evaluation in the field of speaker verification. Built on PyTorch, this lightweight yet extensible framework provides standardized recipes, pretrained models, and integration of widely used architectures. The project emphasizes reproducibility by delivering transparent training pipelines, unified evaluation protocols, and ready-to-use baselines across multiple corpora. Beyond standard training and inference capabilities, Kiwano includes specialized tools for benchmarking, experiment tracking, and the rapid prototyping of new architectures. To encourage community adoption, the toolkit is distributed under the Apache 2.0 license and is accompanied by comprehensive documentation and reproducible experiments. By lowering entry barriers and standardizing evaluation practices, Kiwano aims to serve as a valuable resource for both academic research and applied development. The project is publicly available on GitHub at https://github.com/kiwano-toolkit/kiwano/.

arxiv arXiv cs.LG · 7h ago

Multigrid Training for Molecular Generation using Graph Neural Networks

The authors introduce a multigrid training strategy to address the high computational costs and instability associated with modeling biochemical molecular systems at full resolution. This approach leverages low-resolution optimization to accelerate learning at higher resolutions by transferring parameters across different discretizations. For graph-based molecular representations, the method progressively transfers parameters from a coarse graph to increasingly finer graphs using biased random walk upsampling. In 3D molecular generation, structures are voxelized at multiple resolutions, allowing a coarse-resolution conditional Variational Autoencoder (CVAE) to be pretrained first. Shape-compatible convolutional parameters are then transferred from the coarse model to initialize a fine-resolution CVAE. Numerical experiments on receptor-conditioned 3D ligand generation demonstrate that this method accelerates convergence compared to training from scratch. Additionally, the study shows that multigrid training improves generalization capabilities for molecular generation tasks.
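The coarse-to-fine transfer of shape-compatible parameters can be illustrated with a minimal sketch. The parameter names, the nested-list tensors, and the `transfer_compatible` helper are all hypothetical and are not taken from the paper; the sketch only shows the general idea that convolution kernels keep their shape across voxel resolutions, while resolution-dependent layers do not and must keep their fresh initialization.

```python
import copy

def shape(tensor):
    """Return the shape of a non-empty nested list as a tuple."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return tuple(dims)

def transfer_compatible(coarse_params, fine_params):
    """Initialize a fine model from a coarse one (hypothetical helper).

    A parameter is copied only when the same name exists in both models
    and the shapes match; everything else keeps its fine-model init.
    """
    initialized, copied = {}, []
    for name, value in fine_params.items():
        source = coarse_params.get(name)
        if source is not None and shape(source) == shape(value):
            initialized[name] = copy.deepcopy(source)
            copied.append(name)
        else:
            initialized[name] = value
    return initialized, copied

# Toy example: the conv kernel is resolution-independent, the dense layer is not.
coarse = {"conv.weight": [[1.0, 2.0], [3.0, 4.0]], "fc.weight": [[0.5] * 4]}
fine = {"conv.weight": [[0.0, 0.0], [0.0, 0.0]], "fc.weight": [[0.0] * 8]}
fine_init, copied_names = transfer_compatible(coarse, fine)
```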

media r/LocalLLaMA · 7h ago

Community Inquiry on Running DwarfStar with DeepSeek V4 Flash on DGX Spark

A Reddit user in the r/LocalLLaMA community is asking for experiences regarding the use of DwarfStar (DS4) with the DeepSeek V4 Flash model on a single NVIDIA DGX Spark device. The inquiry highlights technical specifications suggesting that DS4's Mixture of Experts approach and unified memory strategy allow for loading the model with 80 billion active parameters and full maximum context length. The poster references external resources, including a GitHub repository by antirez and a demonstration video, to support these claims about performance capabilities. The discussion seeks feedback on the practical viability of this setup, specifically questioning the quality of agentic coding tasks performed under these constraints. This request reflects ongoing interest in optimizing large language model inference on consumer-grade or compact hardware configurations.

media r/LocalLLaMA · 7h ago

Gemma4-26B-A4B & 31B-QAT Uncensored Balanced Released with MTP Speed Boosts

HauhauCS has released two new uncensored, balanced versions of the Gemma 4 models: Gemma4-26B-A4B and Gemma4-31B-QAT. Both variants incorporate Multi-Token Prediction (MTP) draft heads to enable speculative decoding, resulting in significant inference speed improvements. The 26B-A4B model achieves approximately a 35% speed boost, while the 31B model sees a 53% increase; output quality is reported as identical because the main model verifies every drafted token. Both releases are built with quantization-aware training (QAT), making Q4_K_M the optimal format, as higher precision offers no quality gains for these specific models. The 26B-A4B is a Mixture of Experts architecture with roughly 4 billion active parameters per token, whereas the 31B variant is a dense model offering higher capability for users with sufficient VRAM. Both models include vision support via mmproj files and maintain a 262K context window. The author notes that GenRM testing resulted in zero refusals across 465 prompts, confirming their uncensored nature.
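Why speculative decoding can speed up inference without changing the output is easiest to see in the greedy case. The sketch below is a toy illustration, not the release's implementation: `target_next` and `draft_next` are hypothetical stand-ins for the main model and the draft head, and real systems verify all drafted tokens in a single batched forward pass rather than one call per token.

```python
def speculative_step(target_next, draft_next, prefix, k):
    """One round of greedy speculative decoding (toy sketch).

    target_next / draft_next map a token list to the next token id.
    Returns the tokens appended this round; they always match what
    greedy decoding with the target model alone would have produced.
    """
    # 1. The cheap draft model proposes k tokens autoregressively.
    context = list(prefix)
    drafted = []
    for _ in range(k):
        token = draft_next(context)
        drafted.append(token)
        context.append(token)

    # 2. The target model checks each proposal against its own choice.
    context = list(prefix)
    accepted = []
    for token in drafted:
        expected = target_next(context)
        if token != expected:
            # First mismatch: keep the target's token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(token)
        context.append(token)

    # 3. Every proposal matched, so the target contributes one extra token.
    accepted.append(target_next(context))
    return accepted
```

When the draft agrees with the target most of the time, each round yields several tokens for roughly one target pass, which is where the speedup comes from.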

arxiv arXiv cs.LG · 7h ago

HyperAdapter: Structured Hyperedge Adaptation for Parameter-Efficient Fine-Tuning of Vision Transformers

The authors propose HyperAdapter, a novel parameter-efficient fine-tuning method that adapts vision transformers in hyperedge space rather than token space. Existing adapter-based methods typically perform independent adaptations for each token, which overlooks structured relationships and can lead to redundant updates. HyperAdapter constructs a soft hypergraph over ViT tokens using prototype-based assignments to enable group-aware adaptation. The architecture aggregates token features into latent hyperedge representations and applies lightweight bottleneck adaptation at the hyperedge level. Updates are then diffused back to individual tokens via the hypergraph incidence structure, injecting an explicit structural inductive bias. Extensive experiments across diverse visual benchmarks demonstrate that this approach consistently outperforms strong PEFT baselines under comparable parameter budgets. The results highlight significant gains on tasks requiring structured reasoning and suggest that the choice of adaptation space is a critical dimension for efficient transfer.
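The aggregate, adapt, and diffuse steps described above can be sketched in plain Python. This is a simplified reading of the summary, not the paper's architecture: the dot-product softmax assignment, the ReLU bottleneck, the weighted-mean aggregation, and the function name `hyperedge_adapter` are all assumptions made for illustration.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def softmax(row):
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def hyperedge_adapter(tokens, prototypes, w_down, w_up):
    """Adapt n tokens (n x d) through m soft hyperedges (assumed design)."""
    # Soft incidence matrix (n x m): each token's membership in each hyperedge.
    incidence = [softmax(r) for r in matmul(tokens, transpose(prototypes))]
    incidence_t = transpose(incidence)
    # Aggregate tokens into hyperedge features (m x d) as weighted means.
    sums = matmul(incidence_t, tokens)
    edges = [[v / sum(w) for v in row] for w, row in zip(incidence_t, sums)]
    # Lightweight bottleneck applied per hyperedge: d -> r -> d.
    hidden = [[max(0.0, v) for v in row] for row in matmul(edges, w_down)]
    edge_updates = matmul(hidden, w_up)
    # Diffuse hyperedge updates back to tokens via the incidence structure.
    token_updates = matmul(incidence, edge_updates)
    return [[t + u for t, u in zip(t_row, u_row)]
            for t_row, u_row in zip(tokens, token_updates)]
```

With a zero-initialized up-projection the adapter is an identity map, which mirrors a common way of initializing adapters so that fine-tuning starts from the pretrained model's behavior.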

arxiv arXiv cs.LG · 7h ago

Shift-Invariant Variance Estimator Eliminates Minimization Bias in Local Learning Coefficient Estimation

Singular Learning Theory uses the Local Learning Coefficient to quantify neural network loss landscape geometry, but mean-energy estimators rely on an additive loss baseline. During off-equilibrium training phases, this minimum is unknown, and substituting it with noisy mini-batch losses introduces systematic minimization bias. The authors propose the Shift-Invariant Variance Estimator (SIVE) to structurally eliminate this unknown baseline through the variance operator. By combining SIVE with a correction derived from the Law of Total Variance, the method separates geometric loss fluctuations from evaluation noise. Controlled experiments on analytically tractable toy models demonstrate that SIVE recovers expected finite-temperature geometric signals where anchored mean estimators fail. Applied to deep neural networks, SIVE serves as a robust diagnostic for tracking structural phase transitions throughout training.
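The shift-invariance property that the estimator relies on is easy to check numerically. The snippet below is not the paper's estimator; it only contrasts a baseline-anchored mean, which changes whenever the assumed minimum changes, with the sample variance, which is unaffected by adding a constant to every loss. The loss values are made up for illustration.

```python
import statistics

# Hypothetical sampled losses around some point in weight space.
losses = [2.31, 2.28, 2.40, 2.35, 2.26]

def anchored_mean(samples, baseline):
    """Mean energy relative to an assumed loss minimum (baseline)."""
    return statistics.fmean(samples) - baseline

def sample_variance(samples):
    """Unbiased sample variance; unchanged by a constant shift."""
    return statistics.variance(samples)

# Shifting every loss by a constant changes the anchored mean
# unless the baseline is shifted too, but leaves the variance alone.
shifted = [value + 10.0 for value in losses]
mean_gap = anchored_mean(shifted, 2.0) - anchored_mean(losses, 2.0)
variance_gap = sample_variance(shifted) - sample_variance(losses)
```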

arxiv arXiv cs.LG · 7h ago

Efficient CNN with Transfer Learning for Multi-Cancer Detection

A study introduces a lightweight convolutional neural network enhanced with transfer learning for multi-cancer detection using biomedical images. The architecture aims to reduce computational complexity while maintaining high classification performance for deployment in resource-constrained environments. Researchers evaluated the model on three tumor datasets comprising brain MRI and lung and kidney CT scans. The system achieved test accuracies of 90.85%, 98.64%, and 99.92% for brain, lung, and kidney cancer respectively via five-fold stratified cross-validation. Transfer learning was employed by pretraining on one cancer type and fine-tuning on others, requiring only 20 additional epochs to match scratch-trained models. The fine-tuning process updates the classification part of the CNN and takes approximately 0.014 seconds per image per epoch on an NVIDIA GeForce GTX 960. Comparative evaluations demonstrate that this model outperforms state-of-the-art architectures such as Xception, VGG16, VGG19, MobileNetV2, and DenseNet121.

blog Simon Willison · 7h ago

Simon Willison converts MDN browser compatibility data into a SQLite database

Inspired by Mozilla's new MDN MCP service, Simon Willison has converted the comprehensive mdn/browser-compat-data repository into a SQLite database. The project utilizes a script generated by Claude Code for web (Opus 4.8) to perform this conversion using sqlite-utils. The resulting database is approximately 66MB in size and is hosted on GitHub with open CORS headers to facilitate direct access. To automate the process, a GitHub Actions workflow was built using Codex Desktop (GPT-5.5) to force-push the updated database to an orphan branch named db. Users can download the final browser-compat.db file directly from the repository or explore its contents via Datasette Lite.
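As a rough illustration of what flattening nested compatibility data into a SQLite table involves, here is a minimal sketch. It uses Python's standard `sqlite3` module rather than sqlite-utils, and the `support` table layout, the `walk` helper, and the tiny sample record are assumptions for illustration; the actual script and schema are not described in the post summary.

```python
import sqlite3

# Tiny hand-written sample in the nested "__compat" style; values are illustrative.
SAMPLE = {
    "api": {
        "AbortController": {
            "__compat": {
                "support": {
                    "chrome": {"version_added": "66"},
                    "firefox": {"version_added": "57"},
                }
            }
        }
    }
}

def walk(node, path=()):
    """Yield (feature, browser, version_added) rows from nested data."""
    for key, value in node.items():
        if key == "__compat":
            for browser, info in value["support"].items():
                yield ".".join(path), browser, info["version_added"]
        elif isinstance(value, dict):
            yield from walk(value, path + (key,))

def build_db(data):
    """Load the flattened rows into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE support (feature TEXT, browser TEXT, version_added TEXT)"
    )
    conn.executemany("INSERT INTO support VALUES (?, ?, ?)", walk(data))
    conn.commit()
    return conn

conn = build_db(SAMPLE)
row = conn.execute(
    "SELECT version_added FROM support WHERE feature = ? AND browser = ?",
    ("api.AbortController", "firefox"),
).fetchone()
```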

arxiv arXiv cs.LG · 8h ago

P4IR: Reinforcement Learning Enhances Automated Code Compliance Systems

A new framework named P4IR addresses the issue of hallucinated rules in large language model-based automated code compliance systems. This two-stage approach first employs supervised fine-tuning to instill domain knowledge into the model. It then utilizes Group Relative Policy Optimization to improve the accuracy of generated high-level code skeletons. The method achieved reductions of up to 23.8% in tree edit distance and 38.6% in token-level Levenshtein distance compared to supervised fine-tuning baselines. Comparative analysis shows that P4IR outperforms leading models like Claude Opus, GPT-5.2, and Qwen-3-Max in zero-shot settings. Additionally, the reinforcement learning stage produced a statistically significant reduction in false positives. This combination of techniques offers a path toward more reliable automated code compliance.
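For readers unfamiliar with the second metric, token-level Levenshtein distance counts the minimum number of token insertions, deletions, and substitutions needed to turn one sequence into another. A minimal sketch follows; the whitespace tokenization in the usage lines is an assumption, since the summary does not say how the paper tokenizes code.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings or token lists)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete item_a
                current[j - 1] + 1,                    # insert item_b
                previous[j - 1] + (item_a != item_b),  # substitute or match
            ))
        previous = current
    return previous[-1]

# Token-level usage with naive whitespace tokenization (an assumption).
generated = "if x > 0 :".split()
reference = "if x >= 0 :".split()
token_distance = levenshtein(generated, reference)
```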

arxiv arXiv cs.LG · 8h ago

Asymptotic Signal Subspace Recovery in Softmax Attention Models

This study investigates the theoretical principles behind softmax-attention mechanisms by analyzing a stylized model where a query vector is learned via stochastic gradient ascent. The authors exploit the model's symmetry to derive a population objective and characterize the limiting ordinary differential equation governing the learning dynamics. By employing tools from stochastic approximation and dynamical systems theory, they establish a rigorous connection between the stochastic learning algorithm and its deterministic limit. Under suitable high-dimensional scaling assumptions and standard step-size conditions, the research demonstrates that the learned query converges almost surely to the one-dimensional signal subspace. This convergence implies that the query asymptotically recovers the latent informative direction up to an intrinsic sign ambiguity. The findings provide a theoretical foundation for understanding attention as a signal extraction procedure in high-dimensional noisy environments.

arxiv arXiv cs.LG · 8h ago

QeHDC: Hyperdimensional Computing based on Quantum-enhanced binding and SuperClass Construction

The authors propose QeHDC, a novel framework extending classical Hyperdimensional Computing by leveraging quantum mechanical properties for enhanced computational efficiency. This approach utilizes a one-pass training method that employs sinusoidal and quantum encoding to project classical data into quantum amplitude states. A key innovation is the introduction of a reference-state-based quantum binding operation realized through specific quantum circuits. Additionally, the framework implements a density-matrix-based superclass generation strategy using eigenvalue decomposition to extract critical quantum state features. These mechanisms enable more accurate and robust class representations for classification tasks. Experimental evaluations on standard benchmark datasets demonstrate superior performance compared to traditional classical and existing quantum-enhanced methods. The results also highlight the approach's robustness to noise and computational feasibility, suggesting practical benefits for future quantum-inspired paradigms.

arxiv arXiv cs.LG · 8h ago

GaRA: Graph-aware LoRA Generation for Enhancing LLMs on Graph Tasks

Graph neural networks often exhibit limited transferability due to their tight coupling with dataset-specific feature spaces, whereas language models offer flexible generalization through a unified interface. Existing methods for adapting language models to graph tasks struggle to encode whole-graph information, which can lead to significant information loss and suboptimal understanding. To address this limitation, the authors propose GaRA, a novel Graph-aware LoRA generation model that implements a weight-level information injection paradigm. This approach generates task-specific weight updates conditioned on original graph structures, allowing them to interact directly with hidden representations. The method constrains the norm of these generated updates to inject whole-graph information while avoiding optimization bias inherent in standard weight generation. Empirical studies demonstrate that GaRA consistently outperforms baseline methods across various zero-shot graph learning tasks.

arxiv arXiv cs.LG · 8h ago

LLMs Determine Causal Structure via Difference-Making Logic

The article addresses the puzzle of how large language models acquire causal structure despite the limitations of standard formalisms like Judea Pearl's interventionist approach and the Neyman-Rubin framework. It argues that LLMs utilize a specific inductive method known as variational induction, which relies on difference-making logic. During training, models process vast amounts of text from diverse contexts to identify what constitutes a difference-maker or an indifference-maker within word sequences. The analysis examines how architectural components, specifically token embeddings and self-attention mechanisms, facilitate this variational induction process. This logical framework fundamentally parallels the experimental method used in science. In both cases, causal relations are derived by systematically varying individual circumstances to observe their influence on a phenomenon.

arxiv arXiv cs.LG · 8h ago

Escaping the Variance Trap: Jacobian-Free Dynamics for Root-Finding Bilevel Optimization

The authors identify a critical flaw termed the Variance Trap, which arises when stochastic root-finding problems are forced into minimization frameworks via squared residuals. Standard bilevel minimization algorithms require estimating hypergradients involving implicit Jacobians that act as noise amplifiers in stochastic settings. To address this, the paper formalizes Root-Finding Bilevel Optimization (RF-BO) as a distinct problem class to bypass this pathology. A Jacobian-free solution using Two-Time-Scale Stochastic Approximation (TTSA) is proposed to update directly along the root error. The study provides the first non-asymptotic convergence guarantees for TTSA in this setting under Markovian noise. Experiments show a 2.6% top-1 accuracy gain in SimCLR and 17x faster convergence in non-linear ODE control compared to baselines. Additionally, the framework achieves significantly improved entropy stability in reinforcement learning and an 11.1% quality improvement in generative modeling.
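The idea of updating directly along the root error, instead of differentiating a squared residual, can be shown on a toy problem. The sketch below is not the paper's algorithm or benchmark: the linear inner and outer residuals, the step-size exponents, and the function name are assumptions chosen so the solution is known in advance (x = y = 1). The fast variable y tracks the inner solution y*(x) = x, while the slow variable x moves along the noisy outer residual 2 - x - y, and no Jacobian is ever formed.

```python
import random

def ttsa_root_find(steps=20000, noise=0.1, seed=0):
    """Two-time-scale stochastic approximation on a toy root-finding problem."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    for k in range(1, steps + 1):
        alpha = 1.0 / k ** 0.9  # slow step size for the outer variable
        beta = 1.0 / k ** 0.6   # fast step size for the inner variable
        # Noisy residuals; both are evaluated at the current (x, y).
        inner_residual = (x - y) + rng.gauss(0.0, noise)
        outer_residual = (2.0 - x - y) + rng.gauss(0.0, noise)
        # Move each variable directly along its own residual.
        y += beta * inner_residual
        x += alpha * outer_residual
    return x, y

x_final, y_final = ttsa_root_find()
```

Minimizing the squared outer residual instead would require the derivative of y*(x) with respect to x, which is the implicit Jacobian the summary describes as a noise amplifier.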

arxiv arXiv cs.LG · 8h ago

RQ-TTSA: Distribution-Aware Robust Bilevel Optimization with Quantile-Guided Huber Updates

The authors propose RQ-TTSA, a distribution-aware framework designed to address instability in bilevel optimization caused by heavy-tailed stochastic noise. Unlike existing variance-reduction techniques that rely on myopic magnitude checks, this method uses historical gradient buffers to estimate rolling quantiles for adaptive Huber-style clipping. This approach preserves local optimization geometry while strictly bounding effective variance under nonconvex-strongly convex assumptions with infinite-variance noise. Theoretical analysis derives a convergence rate of O(T^(-(p-1)/(3p-2))) that recovers optimal dependence on the heavy-tailed parameter p. Empirical evaluations across six diverse tasks, including vision benchmarks and offline reinforcement learning, show consistent outperformance over state-of-the-art baselines. RQ-TTSA eliminates divergence spikes and ensures stable convergence with negligible computational overhead of approximately 2.7 percent.
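A rolling-quantile clipping rule of the kind described can be sketched as follows. This is an assumed, simplified form, not the paper's update: the class name, the window size, the quantile level, the warm-up rule, and the choice to clip on the gradient norm are all illustrative. Gradients whose norm stays below the rolling quantile pass through unchanged, and larger ones are rescaled to the threshold, which matches the shape of a Huber-loss gradient.

```python
import math
from collections import deque

class QuantileClipper:
    """Clip gradients at a rolling quantile of recent gradient norms."""

    def __init__(self, window=100, quantile=0.9):
        self.norms = deque(maxlen=window)
        self.quantile = quantile

    def threshold(self):
        # No clipping until the history buffer is full (warm-up).
        if len(self.norms) < self.norms.maxlen:
            return math.inf
        ordered = sorted(self.norms)
        index = min(len(ordered) - 1, int(self.quantile * len(ordered)))
        return ordered[index]

    def clip(self, grad):
        norm = math.sqrt(sum(g * g for g in grad))
        tau = self.threshold()
        self.norms.append(norm)
        if norm <= tau:
            return list(grad)        # inside the threshold: unchanged
        scale = tau / norm           # outside: rescale to norm tau
        return [g * scale for g in grad]

# Usage: fill the buffer with ordinary gradients, then feed an outlier.
clipper = QuantileClipper(window=10, quantile=0.5)
for i in range(1, 11):
    clipper.clip([float(i), 0.0])
clipped_outlier = clipper.clip([100.0, 0.0])
```

Because the threshold adapts to the recent norm distribution, it needs no hand-tuned clipping constant, at the cost of sorting a small buffer at each step.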

arxiv arXiv cs.LG · 8h ago

Deezer Deploys LLM-Based Music Playlist Captioning System

Deezer has deployed an automatic playlist captioning system powered by large language models to enhance its Daily Mix feature. This technology generates natural language descriptions for personalized playlists, helping users understand the content behind each recommendation. The system leverages recent advances in LLMs to process diverse data sources while maintaining strict control over output quality. It is now active for millions of users, significantly improving overall engagement metrics. The deployment highlights how semantic framing influences user perception in online personalized experiences. This initiative addresses the challenge of scaling playlist description generation effectively.

arxiv arXiv cs.LG · 8h ago

VRA-FedSGD: Variance-Reduced Federated Learning for Heavy-Tailed Noise

The authors propose VRA-FedSGD, a variance-reduction based algorithm designed for federated learning in environments with heavy-tailed gradient and communication noise. This approach addresses challenges prevalent in large-scale machine learning over wireless networks and Internet of Things deployments. The method employs momentum variance reduction combined with nonlinear mapping to mitigate heavy-tailed gradient noise. It also utilizes a variance-reduced aggregation mechanism to suppress heavy-tailed communication noise. For nonconvex objective functions, VRA-FedSGD achieves a mean convergence rate of O(K^(-(p-1)/(2p-1))), where p is the tail index. In the almost sure sense, it reaches a rate of Õ(K^(-(1-1/(p-ε)))) for strongly convex objectives, with ε being an arbitrarily small constant. Simulated experiments on logistic regression with real-world data verify the algorithm's effectiveness.
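To get a feel for how the tail index p affects the two stated rates, the exponents can be evaluated exactly. The helper names below are hypothetical; the formulas are taken directly from the rates quoted in the summary.

```python
from fractions import Fraction

def mean_rate_exponent(p):
    """Exponent a in the nonconvex mean rate O(K^(-a)), a = (p-1)/(2p-1)."""
    p = Fraction(p)
    return (p - 1) / (2 * p - 1)

def almost_sure_rate_exponent(p, eps):
    """Exponent a in the strongly convex almost-sure rate, a = 1 - 1/(p - eps)."""
    p, eps = Fraction(p), Fraction(eps)
    return 1 - 1 / (p - eps)

# At p = 2 the mean-rate exponent is 1/3; a heavier tail (p = 3/2)
# slows it to 1/4.
exponent_p2 = mean_rate_exponent(2)
exponent_p15 = mean_rate_exponent(Fraction(3, 2))
as_exponent_p2 = almost_sure_rate_exponent(2, Fraction(1, 100))
```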

media r/LocalLLaMA · 9h ago

GLM-5.2 on 4x DGX Spark: Reconstructing Missing Build Steps for MTP Speculative Decode

The author successfully deployed GLM-5.2 with MTP speculative decoding on a cluster of four NVIDIA GB10 (DGX Spark) nodes, achieving approximately 9.4 tokens per second. This setup utilizes vLLM with tensor parallelism, ported sparse-MLA Triton kernels, and deterministic 15% expert pruning to fit AWQ-INT4 weights. A critical finding is that the original Docker image build instructions are incomplete, requiring reconstruction of missing patches for deep_gemm.py and sparse_attn_indexer.py. The author also found that any vLLM version other than the specific pinned commit crashes with CUDA errors while loading real AWQ weights. To replicate the environment, users must apply a custom script that bakes in kernels and routes functions to sm12x fallbacks. The setup runs at roughly double the speed of previous llama.cpp implementations, though inter-node bandwidth remains a bottleneck for dual-rail scaling.

media r/LocalLLaMA · 9h ago

MINISFORUM DEG1 Oculink eGPU Dock Refurbished Available for $59

A refurbished MINISFORUM DEG1 Oculink eGPU dock is currently available for $59. The product listing highlights its robust build quality, noting that the device has sufficient heft to securely hold a graphics card. Unlike some lower-cost alternatives, this dock includes redrivers to ensure signal integrity. A user who purchased a unit last year reported positive experiences with its performance and stability. The item can be purchased directly from the manufacturer's refurbished product page.