All articles — korshunov.ai


The Energy Consumption of Transformer Fine-Tuning: A Roofline-Inspired Scaling Model

This article introduces a roofline-inspired framework for modeling the energy consumption of Transformer fine-tuning on multiple GPUs, motivated by the growing computational cost of training and the need for sustainable system design.

arxiv arXiv cs.LG · 4h ago

SuperCond-GNN: Scalable Graph Neural Network Surrogate for Superconducting Circuit Simulations

This paper introduces SuperCond-GNN, a graph neural network surrogate model that predicts the voltage distribution in high-temperature superconducting magnets by mapping lumped-element circuits to graph representations. The model achieves a mean MAPE of 4.3% on tape stacks and enables fast inference of current redistribution across a range of circuit configurations.

arxiv arXiv cs.LG · 4h ago

Approximating velocity fields with planted attractors via Neural-ODEs for classification

This work uses Neural ODEs equipped with a curated set of equilibrium points to perform classification. The planted attractors represent the target classes, while the learned velocity field shapes the dynamical landscape so that each input is steered toward the attractor of its class.

arxiv arXiv cs.LG · 4h ago

Scheduling Thoughts: Learning the Order of Thought in Diffusion Language Models

Researchers propose Self-Aware Scheduling (SAS), a method that learns the token unmasking order for masked diffusion language models in order to improve generation quality. By deriving a tractable upper bound on the sequential decoding mismatch, the approach casts order selection as a policy optimization problem, which it solves with Group Relative Policy Optimization (GRPO).

media r/LocalLLaMA · 4h ago

Minimax M3 vs M2.7

A Reddit user asks for feedback from people who have upgraded from Minimax M2.7 to Minimax M3, seeking the community's impressions of how the two versions differ, including in performance.

media r/LocalLLaMA · 4h ago

High-quality GLM-5.2 Quant on 4x DGX Spark - Guide, Results, and Comps

The author demonstrates running the GLM-5.2 NVFP4 model on four NVIDIA GB10 DGX Spark nodes with a 128K context window, achieving usable serving performance through aggressive system optimization.

media r/LocalLLaMA · 4h ago

MLX Fine-Tune Example Guide

A user demonstrates fine-tuning a 7B instruction-tuned model on Apple Silicon with MLX to shift its writing style toward high-fantasy literature. The experiment shows that a small, curated dataset can significantly change a model's register and diction with minimal computational resources.

arxiv arXiv cs.LG · 5h ago

SVD-Surgeon: Optimal Singular-Value Surgery for Large Language Model Compression

Researchers have introduced SVD-Surgeon, a training-free method that applies the Optimal Brain Surgeon framework to singular-value decomposition for compressing large language models. This approach computes closed-form updates for retained singular values to compensate for truncation errors and determines which values to prune based on saliency.

arxiv arXiv cs.LG · 5h ago

Patient-Aware Contrastive Learning Preserves Per-Patient Structure in RR-Interval Representations

The article addresses a problem in contrastive representation learning on physiological signals: subject-specific baselines interfere with class-level objectives, causing models to discard the individual variation they need to generalize. For Paroxysmal Atrial Fibrillation detection, the authors propose a patient-aware contrastive objective that forms positive pairs only from segments of the same patient, preserving each patient's sinus rhythm baseline while still separating the classes.

arxiv arXiv cs.LG · 5h ago

A Spectral Theory of Normalized Corrected GNN Propagation

This paper develops a spectral theory for normalized corrected Graph Neural Network (GNN) propagation, focusing on the symmetric normalized adjacency matrix with its degree-stationary component removed to isolate the direction tied to oversmoothing.

arxiv arXiv cs.LG · 5h ago

MORL-A2C: Multi-Objective Reinforcement Learning Reranker for Health

Researchers introduce MORL-A2C, a sequential decision-making extension of the MOPI-HFRS system that uses an Advantage Actor-Critic (A2C) algorithm to balance user preference against nutritional health in food recommendations.

media r/LocalLLaMA · 5h ago

I built an agent Harness for Small Models. I got Qwen 3.5 4b managing servers.

The author built an agent harness that targets the failure modes typical of small local models, such as failed tool calls and poor state tracking. With this custom framework, small models like Qwen 3.5 4b can effectively manage remote servers.

media r/LocalLLaMA · 5h ago

Locally running model turns an Image into a Cute Controllable Character you can Play as

The author presents the 800M version of a model that turns an image into a controllable character and is designed to run comfortably on consumer GPUs. This iteration extends the context to 12 latent frames and improves stability while keeping performance high, reaching over 60 fps on an RTX 5090.

media Hugging Face Forums · 5h ago

HoLo-ToLk: Tokenizer-Free Speech Models on 0-Parameter HSL Substrate

The author introduces HoLo-ToLk, a research project building speech-to-text (STT) and text-to-speech (TTS) models using the zero-parameter HSL byte substrate without tokenizers or learned input embeddings. The work demonstrates that raw HSL bytes can serve as a viable signal for audio processing when combined with specific architectural modifications.

github llama.cpp · 5h ago

llama.cpp b9837 release adds --reasoning-preserve flag and new binaries

The llama.cpp project has released version b9837, which introduces a new `--reasoning-preserve` flag for the Jinja chat template to retain reasoning tokens. This update also includes corrected help messages and provides pre-built binaries for macOS, Linux, Windows, Android, and openEuler across various hardware backends.

lab OpenAI News · 5h ago

HP Inc. launches Frontier strategic partnership with OpenAI

HP Inc. is scaling its strategic partnership with OpenAI after successful pilots, deploying AI across customer experience, employee productivity, and software development. The company uses the OpenAI Frontier platform as a unified operating model for governing context, permissions, and evaluation as it moves from experimental use cases to enterprise-wide production.

arxiv arXiv cs.LG · 6h ago

Solve for the Hyperparameter, Skip the Search: Kolmogorov-Optimal Scaling Laws for Spline Regression

The article introduces KORE, a method that determines the optimal spline regression resolution in closed form instead of through an exhaustive hyperparameter search. Building on classical approximation theory and the PRESS identity, it analytically balances the bias and noise scales, matching the accuracy of grid sweeps with significantly less compute.

arxiv arXiv cs.LG · 6h ago

Polynomial Kolmogorov-Arnold Networks Learn Game of Life Dynamics

This study shows that neural networks can reliably learn the dynamics of Conway's Game of Life with minimal architectures by using suitable inductive biases rather than large-scale search. The authors find that network variants with alternative activation functions, particularly second-degree polynomial activations, significantly outperform those using standard Rectified Linear Units (ReLUs).

arxiv arXiv cs.LG · 6h ago

Quantifying Agreement Between Data-Influence and Data-Similarity in LLMs

This study quantifies the agreement between data-similarity and data-influence measures, both used to trace LLM outputs back to training data. The two overlap substantially, but asymmetrically: data-influence ranks the top data-similarity documents more consistently. Experiments on models including OLMo2-1B, Qwen3-1.7B, Llama3.2-1B, Gemma3-1B, and GPT-2 show that this asymmetry enables a favorable cost-accuracy trade-off, in which data-influence is used to refine the results of cheaper data-similarity retrieval.

arxiv arXiv cs.LG · 6h ago

Neural Networks as Linear Regression: An Introduction for Statisticians

This article introduces neural networks to statisticians and demystifies the field by viewing neural networks through the lens of linear regression.