All articles — korshunov.ai


Scheduling Thoughts: Learning the Order of Thought in Diffusion Language Models

Researchers propose Self-Aware Scheduling (SAS), a method that learns an optimal token unmasking order for masked diffusion language models to improve generation quality. By deriving a tractable upper bound on sequential decoding mismatch, the approach casts order selection as a policy optimization problem using Group Relative Policy Optimization.
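
The group-relative part of Group Relative Policy Optimization can be sketched in a few lines. This is a minimal illustration of the commonly described GRPO advantage (each reward standardized against its own sampling group), not the SAS training code. The function name, the `eps` constant, and the reward values are hypothetical, and whether SAS uses exactly this normalization is an assumption.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each reward against the mean and std of its own group.

    Assumption: population std plus a small eps, as in common GRPO write-ups.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four candidate unmasking orders of one prompt.
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)
```

Orders that score above the group mean receive a positive advantage and are reinforced; no separate value network is needed to form the baseline.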

media r/LocalLLaMA · 5h ago

Minimax M3 vs M2.7

A Reddit user is requesting feedback from individuals who have updated to the Minimax M3 model from version M2.7. The post seeks community insights on the differences and performance between these two iterations.

media r/LocalLLaMA · 5h ago

High-quality GLM-5.2 Quant on 4x DGX Spark - Guide, Results, and Comps

The author demonstrates running the GLM-5.2 NVFP4 model on four NVIDIA GB10 DGX Spark nodes with a 128K context window, achieving usable serving performance through aggressive system optimization.

media r/LocalLLaMA · 5h ago

MLX Fine-Tune Example Guide

A user demonstrates fine-tuning a 7B instruction model on Apple Silicon using MLX to shift its style to high-fantasy literature. The experiment shows that a small, curated dataset can significantly alter a model's register and diction with minimal computational resources.

arxiv arXiv cs.LG · 6h ago

SVD-Surgeon: Optimal Singular-Value Surgery for Large Language Model Compression

Researchers have introduced SVD-Surgeon, a training-free method that applies the Optimal Brain Surgeon framework to singular-value decomposition for compressing large language models. This approach computes closed-form updates for retained singular values to compensate for truncation errors and determines which values to prune based on saliency.

arxiv arXiv cs.LG · 6h ago

Patient-Aware Contrastive Learning Preserves Per-Patient Structure in RR-Interval Representations

The article addresses the challenge of contrastive representation learning on physiological signals where subject-specific baselines interfere with class-level objectives, causing models to lose individual variation necessary for generalization. The authors propose a patient-aware contrastive objective for Paroxysmal Atrial Fibrillation detection that forms positive pairs only from same-patient segments to preserve sinus rhythm baselines while separating classes.

arxiv arXiv cs.LG · 6h ago

A Spectral Theory of Normalized Corrected GNN Propagation

This paper develops a spectral theory for normalized corrected Graph Neural Network (GNN) propagation, focusing on the symmetric normalized adjacency matrix with its degree-stationary component removed to isolate the direction tied to oversmoothing.
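
One plausible reading of "degree-stationary component removed" is to subtract the projection onto the unit vector proportional to the square roots of the node degrees, which is the eigenvalue-1 eigenvector of the symmetric normalized adjacency matrix. The sketch below assumes that reading, an undirected graph with no isolated nodes, and no added self-loops; the function names and the example graph are hypothetical and are not taken from the paper.

```python
import math


def corrected_propagation(adj):
    """Return (a_hat, u, corrected) for an undirected adjacency matrix.

    a_hat     = D^{-1/2} A D^{-1/2}
    u         = sqrt(deg) / ||sqrt(deg)||, the eigenvalue-1 direction of a_hat
    corrected = a_hat - u u^T (assumed form of the correction)
    """
    n = len(adj)
    deg = [sum(row) for row in adj]
    a_hat = [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
             for i in range(n)]
    norm = math.sqrt(sum(deg))
    u = [math.sqrt(d) / norm for d in deg]
    corrected = [[a_hat[i][j] - u[i] * u[j] for j in range(n)]
                 for i in range(n)]
    return a_hat, u, corrected


def matvec(m, v):
    """Plain matrix-vector product for lists of lists."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


# Hypothetical graph: a triangle (0, 1, 2) with a tail node 3 attached to 2.
adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
a_hat, u, corrected = corrected_propagation(adj)
```

Repeated application of `a_hat` leaves `u` unchanged, which is the direction features collapse onto under oversmoothing, while `corrected` maps `u` to zero.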

arxiv arXiv cs.LG · 6h ago

MORL-A2C: Multi-Objective Reinforcement Learning Reranker for Health

Researchers introduce MORL-A2C, a sequential decision-making extension to the MOPI-HFRS system that uses an Advantage Actor-Critic algorithm to optimize the trade-off between user preference and nutritional health in food recommendations.

media r/LocalLLaMA · 6h ago

I built an agent Harness for Small Models. I got Qwen 3.5 4b managing servers.

The author developed a specialized agent harness designed to address the specific failure modes of small local models, such as failed tool calls and poor state tracking. This custom framework enables smaller models like Qwen 3.5 4b to effectively manage remote servers.

media r/LocalLLaMA · 6h ago

Locally running model turns an Image into a Cute Controllable Character you can Play as

The author presents the 800M version of a model that converts images into controllable characters, designed to run comfortably on consumer GPUs. This iteration increases context to 12 latent frames and improves stability while maintaining high performance, achieving over 60 fps on an RTX 5090.

media Hugging Face Forums · 6h ago

HoLo-ToLk: Tokenizer-Free Speech Models on 0-Parameter HSL Substrate

The author introduces HoLo-ToLk, a research project building speech-to-text (STT) and text-to-speech (TTS) models using the zero-parameter HSL byte substrate without tokenizers or learned input embeddings. The work demonstrates that raw HSL bytes can serve as a viable signal for audio processing when combined with specific architectural modifications.

github llama.cpp · 6h ago

llama.cpp b9837 release adds --reasoning-preserve flag and new binaries

The llama.cpp project has released version b9837, which introduces a new `--reasoning-preserve` flag for the Jinja chat template to retain reasoning tokens. This update also includes corrected help messages and provides pre-built binaries for macOS, Linux, Windows, Android, and openEuler across various hardware backends.

lab OpenAI News · 6h ago

HP Inc. launches Frontier strategic partnership with OpenAI

HP Inc. is scaling its strategic partnership with OpenAI following successful pilots, deploying AI across customer experiences, employee productivity, and software development. The company utilizes the OpenAI Frontier platform as a unified operating model to govern context, permissions, and evaluation as it moves from experimental use cases to enterprise-wide production.

arxiv arXiv cs.LG · 7h ago

Solve for the Hyperparameter, Skip the Search: Kolmogorov-Optimal Scaling Laws for Spline Regression

The article introduces KORE, a method that determines optimal spline regression resolution in closed form rather than through exhaustive hyperparameter search. By leveraging classical approximation theory and the PRESS identity, it analytically balances bias and noise scales to achieve accuracy comparable to grid sweeps with significantly less compute.
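
The PRESS identity mentioned above says that, for a linear least-squares fit, each leave-one-out residual equals the ordinary residual divided by one minus that point's leverage, so no refitting is needed. The sketch below checks this on simple linear regression as a stand-in; it is not the KORE method and does not involve splines. All function names and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def press_identity(xs, ys):
    """PRESS from a single fit: sum of (e_i / (1 - h_ii))^2."""
    n = len(xs)
    mx = sum(xs) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    a, b = fit_line(xs, ys)
    total = 0.0
    for x, y in zip(xs, ys):
        leverage = 1.0 / n + (x - mx) ** 2 / sxx  # h_ii for a straight line
        residual = y - (a + b * x)
        total += (residual / (1.0 - leverage)) ** 2
    return total


def press_refit(xs, ys):
    """PRESS by brute force: refit n times, each with one point held out."""
    total = 0.0
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (ys[i] - (a + b * xs[i])) ** 2
    return total


# Hypothetical data.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 0.9, 2.3, 2.8, 4.2]
```

Both functions return the same value up to floating-point error, which is what lets a cross-validation score be evaluated analytically instead of through repeated fits.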

arxiv arXiv cs.LG · 7h ago

Polynomial Kolmogorov-Arnold Networks Learn Game of Life Dynamics

This study demonstrates that neural networks can reliably learn Conway's Game of Life dynamics using minimal architectures by employing specific inductive biases rather than relying on large-scale search processes. The authors show that network variants with alternative activation functions significantly outperform standard Rectified Linear Units, particularly through the use of second-degree polynomial activations.
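
One way to see why a second-degree polynomial can fit this task: the Game of Life rule is a band test on a single linear feature. With `n` live neighbours and cell state `c`, the cell is alive next step exactly when `s = 2n + c` lies in {5, 6, 7}, which is where the quadratic `1 - (s - 6)^2` is non-negative. The sketch below illustrates that observation only; it is not the authors' architecture or training setup, and the function names and the wrap-around grid are assumptions.

```python
def quadratic_rule(n, c):
    """Next state from neighbour count n and current state c via one quadratic."""
    s = 2 * n + c           # linear feature
    q = 1 - (s - 6) ** 2    # degree-2 polynomial, >= 0 only for s in {5, 6, 7}
    return 1 if q >= 0 else 0


def life_step(grid):
    """One Game of Life step on a wrap-around grid using quadratic_rule."""
    rows, cols = len(grid), len(grid[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            n = sum(
                grid[(r + dr) % rows][(c + dc) % cols]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            )
            out[r][c] = quadratic_rule(n, grid[r][c])
    return out
```

Applied to a horizontal three-cell blinker, `life_step` yields the vertical blinker, matching the standard rule.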

arxiv arXiv cs.LG · 7h ago

Quantifying Agreement Between Data-Influence and Data-Similarity in LLMs

This study quantifies the agreement between data-similarity and data-influence measures used for tracing LLM outputs back to training data, revealing a significant overlap with an asymmetry where data-influence ranks top similar documents more consistently. Experiments across models including OLMo2-1B, Qwen3-1.7B, Llama3.2-1B, Gemma3-1B, and GPT2 demonstrate that this asymmetry allows for a favorable cost-accuracy trade-off by using data-influence to refine cheaper data-similarity results.

arxiv arXiv cs.LG · 7h ago

Neural Networks as Linear Regression: An Introduction for Statisticians

This article introduces neural networks to statisticians by demystifying the field through the lens of linear regression approximation.

arxiv arXiv cs.LG · 7h ago

Scaling Linear Mode Connectivity and Merging to Billion Parameter Pretrained Transformers

Researchers propose a scalable framework for merging independently trained billion-parameter transformers using linear mode connectivity, addressing scalability limits in existing methods. The approach employs function-preserving weight transformations and a dual learning procedure where both models jointly optimize toward a shared linear interpolation path.

arxiv arXiv cs.LG · 7h ago

Causal Discovery in the Era of Agents

The article argues against using large language models to infer causal structures, warning that such approaches risk confusing textual associations with genuine causal evidence. Instead, it proposes that agents should only assist the workflow by inspecting data and explaining assumptions, while leaving causal claims grounded in formal algorithms and diagnostics.

media r/LocalLLaMA · 7h ago

User runs Qwen3.6-27B on low-end hardware for construction POCs

A Reddit user demonstrates running the Qwen3.6-27B model quantized to Q3 with KV at Q8 on an AMD Mi50 32GB GPU, achieving over 180 tokens per second for prompt processing and 9 tokens per second for text generation.