All articles — korshunov.ai

HoLo-ToLk: Tokenizer-Free Speech Models on 0-Parameter HSL Substrate

The author introduces HoLo-ToLk, a research project building speech-to-text (STT) and text-to-speech (TTS) models using the zero-parameter HSL byte substrate without tokenizers or learned input embeddings. The work demonstrates that raw HSL bytes can serve as a viable signal for audio processing when combined with specific architectural modifications.

github llama.cpp · 5h ago

llama.cpp b9837 release adds --reasoning-preserve flag and new binaries

The llama.cpp project has released version b9837, which introduces a new `--reasoning-preserve` flag for the Jinja chat template to retain reasoning tokens. This update also includes corrected help messages and provides pre-built binaries for macOS, Linux, Windows, Android, and openEuler across various hardware backends.

lab OpenAI News · 5h ago

HP Inc. launches Frontier strategic partnership with OpenAI

HP Inc. is scaling its strategic partnership with OpenAI following successful pilots, deploying AI across customer experiences, employee productivity, and software development. The company utilizes the OpenAI Frontier platform as a unified operating model to govern context, permissions, and evaluation as it moves from experimental use cases to enterprise-wide production.

arxiv arXiv cs.LG · 6h ago

Solve for the Hyperparameter, Skip the Search: Kolmogorov-Optimal Scaling Laws for Spline Regression

The article introduces KORE, a method that determines optimal spline regression resolution in closed form rather than through exhaustive hyperparameter search. By leveraging classical approximation theory and the PRESS identity, it analytically balances bias and noise scales to achieve accuracy comparable to grid sweeps with significantly less compute.
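
The PRESS identity mentioned above can be checked numerically. The sketch below is not KORE itself: it assumes ordinary least squares on a cubic polynomial basis (a stand-in for a spline basis) and only shows that leave-one-out residuals follow from a single fit as e_i / (1 - h_ii), where h_ii is the leverage of point i. The function names and the synthetic data are illustrative.

```python
import numpy as np

def press_residuals(X, y):
    """Leave-one-out residuals from one fit, via e_i / (1 - h_ii)."""
    # Thin QR: the hat-matrix diagonal equals the squared row norms of Q.
    Q, R = np.linalg.qr(X)
    beta = np.linalg.solve(R, Q.T @ y)
    resid = y - X @ beta
    leverage = np.sum(Q**2, axis=1)
    return resid / (1.0 - leverage)

def brute_force_loo(X, y):
    """Reference: refit n times, each time holding out one point."""
    n = len(y)
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        out[i] = y[i] - X[i] @ beta
    return out

# Synthetic data (assumption, for illustration only).
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 40)
X = np.vander(x, 4, increasing=True)  # columns: 1, x, x^2, x^3
y = np.sin(3 * x) + 0.1 * rng.standard_normal(x.size)

fast = press_residuals(X, y)
slow = brute_force_loo(X, y)
press = float(np.sum(fast**2))  # the PRESS statistic
```

Because the identity avoids refitting, a cross-validation score for a given resolution costs about as much as one fit, which is what makes a closed-form treatment of the bias and noise trade-off plausible.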

arxiv arXiv cs.LG · 6h ago

Polynomial Kolmogorov-Arnold Networks Learn Game of Life Dynamics

This study demonstrates that neural networks can reliably learn Conway's Game of Life dynamics using minimal architectures by employing specific inductive biases rather than relying on large-scale search processes. The authors show that network variants with alternative activation functions significantly outperform standard Rectified Linear Units, particularly through the use of second-degree polynomial activations.
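
For reference, the target function such networks have to learn is small. The sketch below implements one step of Conway's rules directly in NumPy; it is not the paper's network, and the wraparound boundary is an assumption made here for brevity.

```python
import numpy as np

def life_step(grid):
    """One Game of Life step on a 0/1 grid with toroidal wraparound (assumed)."""
    # Count the eight neighbors of every cell by summing shifted copies.
    neighbors = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    born = (grid == 0) & (neighbors == 3)
    survive = (grid == 1) & ((neighbors == 2) | (neighbors == 3))
    return (born | survive).astype(np.uint8)

# A blinker: three live cells in a row, which oscillates with period 2.
blinker = np.zeros((5, 5), dtype=np.uint8)
blinker[2, 1:4] = 1
after_one = life_step(blinker)
after_two = life_step(after_one)
```

The next state of a cell depends only on its own value and its neighbor count, which is why a small architecture with a suitable inductive bias can in principle represent the rule exactly.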

arxiv arXiv cs.LG · 6h ago

Quantifying Agreement Between Data-Influence and Data-Similarity in LLMs

This study quantifies the agreement between the data-similarity and data-influence measures used to trace LLM outputs back to training data. It finds significant overlap, but with an asymmetry: data-influence ranks the most similar documents more consistently. Experiments across OLMo2-1B, Qwen3-1.7B, Llama3.2-1B, Gemma3-1B, and GPT2 show that this asymmetry allows a favorable cost-accuracy trade-off, in which data-influence refines the results of cheaper data-similarity methods.

arxiv arXiv cs.LG · 6h ago

Neural Networks as Linear Regression: An Introduction for Statisticians

This article introduces neural networks to statisticians by demystifying the field through the lens of linear regression approximation.

arxiv arXiv cs.LG · 6h ago

Scaling Linear Mode Connectivity and Merging to Billion Parameter Pretrained Transformers

Researchers propose a scalable framework for merging independently trained billion-parameter transformers using linear mode connectivity, addressing scalability limits in existing methods. The approach employs function-preserving weight transformations and a dual learning procedure where both models jointly optimize toward a shared linear interpolation path.
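
Two ingredients named above can be illustrated on a toy network. The sketch below assumes hidden-unit permutation as the function-preserving transformation (the paper's transformations may differ) and uses a two-layer ReLU MLP in NumPy; all names are hypothetical.

```python
import numpy as np

def mlp(x, W1, b1, W2, b2):
    """Two-layer ReLU network."""
    return np.maximum(x @ W1 + b1, 0.0) @ W2 + b2

def permute_hidden(W1, b1, W2, perm):
    """Reorder hidden units; the network computes the same function."""
    return W1[:, perm], b1[perm], W2[perm, :]

def interpolate(params_a, params_b, t):
    """Point at fraction t on the straight line between two parameter sets."""
    return tuple((1.0 - t) * a + t * b for a, b in zip(params_a, params_b))

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((3, 8)), rng.standard_normal(8)
W2, b2 = rng.standard_normal((8, 2)), rng.standard_normal(2)
x = rng.standard_normal((5, 3))

perm = rng.permutation(8)
W1p, b1p, W2p = permute_hidden(W1, b1, W2, perm)

# Same outputs, different coordinates in weight space.
same_function = np.allclose(mlp(x, W1, b1, W2, b2), mlp(x, W1p, b1p, W2p, b2))

# Midpoint of the linear path between the original and permuted weights.
midpoint = interpolate((W1, b1, W2, b2), (W1p, b1p, W2p, b2), 0.5)
```

The permuted network is functionally identical to the original, yet the midpoint between them is generally a different function. This is why aligning models before interpolating matters for linear mode connectivity.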

arxiv arXiv cs.LG · 6h ago

Causal Discovery in the Era of Agents

The article argues against using large language models to infer causal structures, warning that such approaches risk confusing textual associations with genuine causal evidence. Instead, it proposes that agents should only assist the workflow by inspecting data and explaining assumptions, while leaving causal claims grounded in formal algorithms and diagnostics.

media r/LocalLLaMA · 6h ago

User runs Qwen3.6-27B on low-end hardware for construction POCs

A Reddit user demonstrates running the Qwen3.6-27B model quantized to Q3, with the KV cache at Q8, on an AMD MI50 32GB GPU, achieving more than 180 tokens per second for prompt processing and 9 tokens per second for text generation.

media r/LocalLLaMA · 6h ago

NPC Engine Using Local Models

A developer has created a game-agnostic NPC engine backend that leverages small local models to achieve fast response times and decent quality for role-playing games. The system utilizes NVIDIA Parakeet 0.6 for speech-to-text, Gemma 4 26B A4B as the LLM, and Qwen3-TTS for voice synthesis.

media r/LocalLLaMA · 6h ago

Tensor split performance on low-bandwidth (TB3) eGPUs, and a question

A user reports testing tensor split mode with two Morefine G1 4090M 16GB eGPUs connected via Thunderbolt 3 at 40 Gbps. Layer split mode yields high token rates for both prefill (PP) and text generation (TG). Tensor split mode keeps both cards fully utilized during TG but suffers from poor PP performance because the link bandwidth is saturated.

arxiv arXiv cs.LG · 7h ago

Discovering Latent Groups for Robust Classification

The authors propose neural classification trees (NCT), a framework that achieves robustness by encoding subgroup structure within its tree-shaped architecture to address spurious correlations in machine learning models.

arxiv arXiv cs.LG · 7h ago

Data Selection Through Iterative Self-Filtering for Vision-Language Settings

Researchers propose a novel bootstrapped method called Self-Filtering that trains a CLIP model on an evolving dataset selected through iterative self-filtering. This approach balances filtered, high-probability clean samples with diverse examples from the entire distribution to mitigate noise in large-scale vision-language datasets.

arxiv arXiv cs.LG · 7h ago

Hedgementation: A Remote Sensing Benchmark for Hedgerow Segmentation

The authors propose Hedgementation, a new benchmark designed to evaluate machine learning models for mapping hedgerows from remote sensing data at a country scale with 10m² spatial resolution. This initiative combines and harmonizes multiple remote sensing products and ground truth labels derived from a French hedgerow inventory.

arxiv arXiv cs.LG · 7h ago

RECALL: Recovery Experience Collection for Active Lifelong Learning in Vision-Language-Action Models

This paper proposes an active, continual learning paradigm for Vision-Language-Action (VLA) models to address the inefficiencies of passive imitation learning. The authors demonstrate that uncertainty-guided data collection improves fine-tuning efficiency but causes catastrophic forgetting when recovery data is used exclusively.

arxiv arXiv cs.LG · 7h ago

DiT-Reward: Generative Representations for Text-to-Image Reward Modeling

The article introduces DiT-Reward, a method that converts a pretrained text-to-image Diffusion Transformer into a reward model by processing near-clean image latents and aggregating text-conditioned representations across transformer layers. This approach leverages generative representations to evaluate the quality of generated images without requiring separate training objectives.

arxiv arXiv cs.LG · 7h ago

Muown Implicitly Performs Angular Step-size Decay

The article demonstrates that Muown's directional update is equivalent to a Riemannian step on normalized directions, where the un-normalized parameterization magnitude modulates the angular step size. This insight explains Muown's step-size stability and motivates the development of AngularMuown, which optimizes directly over normalized directions with an explicit, schedulable angular multiplier.

arxiv arXiv cs.LG · 7h ago

Learning Process Rewards via Success Visitation Matching for Efficient RL

The authors propose a method to transform inherently sparse outcome rewards in reinforcement learning into dense process rewards by training a discriminator to distinguish between successful and unsuccessful episodes. This approach incentivizes the policy to match the state-action visitations of successful episodes while avoiding those of unsuccessful ones, providing dense feedback on progress without altering the optimal policy.

blog Simon Willison · 7h ago

Hack Your Summer Launches Free Production Sprint for Students

Hack Your Summer is a free, four-week high-velocity production sprint designed for undergraduate students, graduate students, and recent graduates to build tangible, public-facing work. The initiative serves as an alternative to traditional internships amid a crisis of reduced internship availability in the US.