All articles — korshunov.ai


Learning Process Rewards via Success Visitation Matching for Efficient RL

The authors propose a method to transform inherently sparse outcome rewards in reinforcement learning into dense process rewards by training a discriminator to distinguish between successful and unsuccessful episodes. This approach incentivizes the policy to match the state-action visitations of successful episodes while avoiding those of unsuccessful ones, providing dense feedback on progress without altering the optimal policy.
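A minimal sketch of the general idea. It assumes each state-action pair is already encoded as a numeric feature vector and uses plain logistic regression as the discriminator; the summary does not describe the paper's discriminator or training details. Every visited state-action pair inherits its episode's success label, and the per-step reward is log D - log(1 - D), which equals the discriminator's logit. The names `train_discriminator` and `dense_reward` are hypothetical. The claim that the optimal policy is unchanged belongs to the paper's own construction and is not something this sketch guarantees.

```python
import math
from typing import List, Sequence, Tuple

Feature = Tuple[float, ...]  # assumed numeric encoding of one (state, action) pair


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_discriminator(
    episodes: Sequence[Tuple[List[Feature], bool]],
    epochs: int = 200,
    lr: float = 0.1,
) -> Tuple[List[float], float]:
    """Fit D(s, a) = sigmoid(w.x + b) with per-sample gradient descent.

    Every visited state-action vector gets its episode's label
    (1.0 = successful episode, 0.0 = unsuccessful episode).
    """
    data = [(x, 1.0 if ok else 0.0) for traj, ok in episodes for x in traj]
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # gradient of binary cross-entropy w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def dense_reward(w: List[float], b: float, x: Feature) -> float:
    """Per-step reward log D - log(1 - D), which equals the discriminator logit."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


if __name__ == "__main__":
    # Hypothetical toy data: features are (position, action).
    episodes = [
        ([(0.2, 1.0), (0.6, 1.0), (1.0, 1.0)], True),
        ([(0.2, -1.0), (0.0, -1.0), (-0.4, -1.0)], False),
    ]
    w, b = train_discriminator(episodes)
    for x in [(0.8, 1.0), (0.8, -1.0)]:
        print(x, round(dense_reward(w, b, x), 3))
```

Using the logit directly as the reward gives positive feedback for visitations that resemble successful episodes and negative feedback for those that resemble failures, at every step rather than only at the end of an episode.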

blog Simon Willison · 4h ago

Hack Your Summer Launches Free Production Sprint for Students

Hack Your Summer is a free, four-week high-velocity production sprint designed for undergraduate students, graduate students, and recent graduates to build tangible, public-facing work. The initiative serves as an alternative to traditional internships amid a crisis of reduced internship availability in the US.

blog Simon Willison · 4h ago

Jon Udell: Human Agent in the loop

Jon Udell argues against the phrase "human in the loop" because it cedes authority to machines, proposing instead that humans should invite agents into their existing workflows as team members.

media r/LocalLLaMA · 4h ago

Neofold, the idle creature-collector with infinite pets thanks to a local diffusion model, released this week

Neofold is an idle creature-collector game that utilizes a local diffusion model to generate an infinite variety of pets. The title was recently released and is available on Steam.

arxiv arXiv cs.LG · 5h ago

Diffusion Models Adapt to Low-Dimensional Structure Under Flexible Coefficient Choices

This paper demonstrates that diffusion models' ability to exploit low-dimensional structure for accelerated sampling is a robust property that does not depend on a specific choice of update coefficients. The authors prove that a broad class of coefficients yields an ε-accurate sample in O(k/ε) iterations, where k is the intrinsic dimension of the data, regardless of the ambient dimension.

arxiv arXiv cs.LG · 5h ago

Dynamic estimation of slowly varying sequences

This article introduces a framework for sequentially approximating each function in a slowly varying sequence, reusing past queries to reduce the overall computational cost. The authors present new sequential estimation results for matrix powers, spectral densities, Monte Carlo integration, and boundary value problems for partial differential equations.
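A simple illustration of how reusing past work pays off when the targets change slowly, shown for Monte Carlo integration. It estimates the first integral carefully, then updates each later estimate by sampling only the small difference f_t - f_{t-1}. This is a generic difference (control-variate style) estimator chosen for illustration, not necessarily the authors' scheme. The function names and the sine test sequence are hypothetical.

```python
import math
import random
from typing import Callable, List

Fn = Callable[[float], float]


def mc_mean(f: Fn, n: int, rng: random.Random) -> float:
    """Plain Monte Carlo estimate of E[f(U)] with U ~ Uniform(0, 1)."""
    return sum(f(rng.random()) for _ in range(n)) / n


def track_slowly_varying(fs: List[Fn], n_first: int, n_update: int, seed: int = 0) -> List[float]:
    """Estimate I_t = E[f_t(U)] for every t, reusing the previous estimate.

    Only the first integral is estimated from scratch (n_first samples). Each
    later step samples the increment f_t - f_{t-1} with n_update samples;
    since consecutive functions are close, the increment has low variance.
    """
    rng = random.Random(seed)
    estimates = [mc_mean(fs[0], n_first, rng)]
    for prev, cur in zip(fs, fs[1:]):
        # Both terms are evaluated at the same draw u, so their noise largely cancels.
        step = mc_mean(lambda u: cur(u) - prev(u), n_update, rng)
        estimates.append(estimates[-1] + step)
    return estimates


if __name__ == "__main__":
    # Hypothetical slowly varying sequence f_t(u) = sin(u + 0.01 t), whose exact
    # integral over [0, 1] is cos(0.01 t) - cos(1 + 0.01 t).
    T = 50
    fs = [lambda u, c=0.01 * t: math.sin(u + c) for t in range(T)]
    est = track_slowly_varying(fs, n_first=20_000, n_update=200)
    err = max(abs(e - (math.cos(0.01 * t) - math.cos(1 + 0.01 * t))) for t, e in enumerate(est))
    print(f"max abs error {err:.4f} using {20_000 + (T - 1) * 200} samples "
          f"instead of {T * 20_000}")
```

Each increment adds a little independent noise, so the error grows like a random walk over t; a practical version would periodically re-estimate from scratch.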

arxiv arXiv cs.LG · 5h ago

Action-BED: Task-Driven Bayesian Experimental Design with Singly Intractable Objectives

The article introduces Action-BED, a new framework for Bayesian experimental design that formulates the problem in terms of expected future loss on downstream actions rather than uncertainty reduction. This approach converts traditionally doubly intractable objectives into singly intractable ones that can be jointly optimized using stochastic gradients.

arxiv arXiv cs.LG · 5h ago

MAS-PromptBench: When Does Prompt Optimization Improve Multi-Agent LLM Systems?

This study systematically investigates the impact of system-prompt optimization on multi-agent systems (MAS) by benchmarking two optimizers across diverse configurations of tasks, workflows, and team sizes.

arxiv arXiv cs.LG · 5h ago

On the Limits of Prompt-Conditioned Language Models as General-Purpose Learners

This paper argues that Large Language Models are not universal problem solvers through prompting alone, due to fundamental constraints in language as a communication interface and alignment requirements. The authors analyze user-system interaction as a cheap-talk game to derive PAC-Bayes bounds distinguishing estimation error from structural limitations.

arxiv arXiv cs.LG · 5h ago

Tapered Language Models: Improving Performance via Depth-Aware Capacity Allocation

The article introduces Tapered Language Models (TLMs), an architectural principle that allocates more parameter capacity to earlier layers and less to later layers within a fixed parameter budget. This approach challenges the standard practice of uniform layer width, drawing on evidence that later layers primarily refine the residual stream rather than transform it.
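To make the fixed-budget constraint concrete, the sketch below assumes a linear taper on each layer's feed-forward (FFN) width. For a standard non-gated MLP block the FFN parameter count is about 2·d_model·d_ff per layer, so keeping the sum of widths fixed keeps the FFN budget fixed. The linear schedule, the `taper` parameter, and the function name are illustrative assumptions; the paper's actual allocation rule may differ.

```python
from typing import List


def tapered_ffn_widths(n_layers: int, d_ff: int, taper: float) -> List[int]:
    """Per-layer FFN widths that shrink linearly with depth.

    Layer i gets about d_ff * (1 + taper - 2 * taper * i / (n_layers - 1)),
    so the first layer is (1 + taper) times and the last (1 - taper) times the
    uniform width. The scale factors average to 1, so the summed width (and
    hence the FFN parameter budget) matches a uniform model.
    """
    if n_layers < 2:
        return [d_ff] * n_layers
    if not 0.0 <= taper < 1.0:
        raise ValueError("taper must be in [0, 1)")
    scales = [1 + taper - 2 * taper * i / (n_layers - 1) for i in range(n_layers)]
    widths = [round(d_ff * s) for s in scales]
    # Rounding can drift the total by a few units; absorb it in the widest layer.
    widths[0] += n_layers * d_ff - sum(widths)
    return widths


if __name__ == "__main__":
    widths = tapered_ffn_widths(n_layers=12, d_ff=3072, taper=0.5)
    print(widths, sum(widths) == 12 * 3072)
```

With `taper=0.5`, the first layer gets roughly 1.5 times and the last roughly 0.5 times the uniform width, while the total stays equal to that of twelve uniform layers.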

arxiv arXiv cs.LG · 5h ago

PsyBridge: A Hybrid Intelligent Framework for Multi-Dimensional Mental Health Assessment

This study introduces PsyBridge, a hybrid intelligent framework designed to address the limitations of isolated mental health screening tools by integrating clinically validated assessments with cognitive and personality profiling. The system utilizes a modular architecture and weighted aggregation mechanism to generate interpretable risk classifications and decision support recommendations.
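As a hedged illustration of what weighted aggregation into interpretable risk classes can look like, the sketch combines per-module scores (assumed to be already normalized to [0, 1]) with fixed weights and maps the result to a tier. It also returns each module's contribution so the outcome can be explained. The module names, weights, and cut-offs are hypothetical placeholders. They are not PsyBridge's, and they are not clinically validated.

```python
from typing import Dict, Tuple

# Hypothetical weights and cut-offs for illustration only; not clinically validated.
WEIGHTS = {"clinical": 0.6, "cognitive": 0.2, "personality": 0.2}
TIERS = [(0.66, "high"), (0.33, "moderate"), (0.0, "low")]  # checked top-down


def aggregate_risk(
    scores: Dict[str, float], weights: Dict[str, float] = WEIGHTS
) -> Tuple[float, str, Dict[str, float]]:
    """Weighted average of module scores in [0, 1], mapped to a risk tier.

    Returns the combined score, its tier, and each module's weighted
    contribution, so the classification can be traced back to its inputs.
    """
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing module scores: {sorted(missing)}")
    for name in weights:
        if not 0.0 <= scores[name] <= 1.0:
            raise ValueError(f"{name} score must be in [0, 1]")
    total_w = sum(weights.values())
    contributions = {name: weights[name] * scores[name] / total_w for name in weights}
    combined = sum(contributions.values())
    tier = next(label for cutoff, label in TIERS if combined >= cutoff)
    return combined, tier, contributions


if __name__ == "__main__":
    print(aggregate_risk({"clinical": 0.9, "cognitive": 0.5, "personality": 0.4}))
```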

arxiv arXiv cs.LG · 6h ago

Open Problem: Is AdamW Effective Under Heavy-Tailed Noise?

This article addresses the lack of rigorous convergence theory for the AdamW optimizer in regimes with heavy-tailed stochastic gradient noise, which is common in large language model pretraining. It questions whether AdamW can converge under these conditions or if its second-moment accumulator creates a genuine obstruction.
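For reference, below is a pure-Python sketch of the standard AdamW update. It applies decoupled weight decay, keeps an exponential moving average of gradients (m) and the second-moment accumulator (v) the article refers to, and bias-corrects both. The demo feeds Cauchy-distributed gradient noise (infinite variance) into a 1-D quadratic. The noise model and hyperparameters are assumptions chosen only to show how a single extreme gradient inflates v and shrinks later steps. They do not reproduce any result from the article.

```python
import math
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class AdamW:
    lr: float = 1e-3
    beta1: float = 0.9
    beta2: float = 0.999
    eps: float = 1e-8
    weight_decay: float = 0.01
    m: List[float] = field(default_factory=list)  # first-moment (mean) estimate
    v: List[float] = field(default_factory=list)  # second-moment accumulator
    t: int = 0

    def step(self, params: List[float], grads: List[float]) -> List[float]:
        if not self.m:
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.t += 1
        bc1 = 1 - self.beta1 ** self.t  # bias-correction factors
        bc2 = 1 - self.beta2 ** self.t
        out = []
        for i, (p, g) in enumerate(zip(params, grads)):
            p -= self.lr * self.weight_decay * p  # decoupled weight decay
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat, v_hat = self.m[i] / bc1, self.v[i] / bc2
            out.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return out


def cauchy(rng: random.Random, scale: float = 1.0) -> float:
    """Heavy-tailed sample (infinite variance) via the inverse CDF."""
    return scale * math.tan(math.pi * (rng.random() - 0.5))


if __name__ == "__main__":
    # Minimize f(x) = x^2 / 2 (true gradient x) under Cauchy gradient noise.
    rng = random.Random(0)
    opt, x = AdamW(lr=0.01, weight_decay=0.0), [5.0]
    for _ in range(2000):
        x = opt.step(x, [x[0] + cauchy(rng)])
    print(f"final x = {x[0]:.3f}, v = {opt.v[0]:.3g}")
```

Because v decays slowly (beta2 = 0.999) while m decays quickly (beta1 = 0.9), one huge noisy gradient can suppress the effective step size for many subsequent iterations. This is the kind of behavior a convergence analysis under heavy-tailed noise has to account for.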

arxiv arXiv cs.LG · 6h ago

Semantic Browsing: Controllable Diversity for Image Generation

This article introduces Semantic Browsing, a method for controlled diversity in text-to-image models. It enforces structure on the set of generated samples to address the lack of meaningful variation in current systems, inducing diversity directly at the text level rather than relying on stochastic variation within the model.

media r/LocalLLaMA · 6h ago

User implements C++ tool execution with MiMo-V2.5-GGUF

A user employed the MiMo-V2.5-GGUF model to write a built-in llama.cpp tool that executes C++ code and retrieves the results. The work was done in opencode, with the model generating the necessary code from the user's specific instructions.

media r/LocalLLaMA · 7h ago

Why so many trash fine-tuned models on HuggingFace?

The author observes that most fine-tuned models uploaded to Hugging Face perform worse than their base models, which makes them useless. The author attributes this proliferation to people using such uploads as a form of professional credentialing to land high-paying AI positions.

github llama.cpp · 7h ago

llama.cpp b9835 release with UI stop and reasoning skip fixes

The llama.cpp project has released version b9835, which includes a fix for the stop and reasoning skip functionality in single-model mode. This update addresses specific issues within the user interface to improve control during model inference.

media r/LocalLLaMA · 8h ago

Script to monitor llama.cpp and analyze memory usage

A user has shared a Bash script designed to parse the verbose output of llama.cpp, providing a clear summary of VRAM/RAM requirements and runtime performance metrics. This tool addresses the difficulty of predicting memory needs for various model quantizations by grouping buffer allocations by function and backend.
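The shared tool is a Bash script. The following Python sketch shows the same grouping idea and is not a port of that script. It assumes allocation lines shaped like `<function>: <backend> <label> buffer size = <N> MiB`. That is an assumption about llama.cpp's verbose log, which may not hold for every build, so the regular expression may need adjusting. The function name and the command-line usage are hypothetical.

```python
import re
import sys
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Assumed line shape: "<function>: <backend> <label> buffer size = <N> MiB".
BUFFER_RE = re.compile(
    r"^(?P<func>[\w.]+):\s+(?P<backend>\S+)\s+(?P<label>.*?)\s*buffer size\s*=\s*(?P<mib>[\d.]+)\s*MiB"
)


def summarize_buffers(lines: Iterable[str]) -> Dict[Tuple[str, str], float]:
    """Sum reported buffer sizes (MiB) per (function, backend) pair."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for line in lines:
        m = BUFFER_RE.match(line.strip())
        if m:
            totals[(m["func"], m["backend"])] += float(m["mib"])
    return dict(totals)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_buffers.py run.log  (a saved verbose llama.cpp log)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        totals = summarize_buffers(fh)
    for (func, backend), mib in sorted(totals.items()):
        print(f"{func:<28} {backend:<12} {mib:10.2f} MiB")
    per_backend: Dict[str, float] = defaultdict(float)
    for (_, backend), mib in totals.items():
        per_backend[backend] += mib  # overall footprint per device/backend
    for backend, mib in sorted(per_backend.items()):
        print(f"{'TOTAL':<28} {backend:<12} {mib:10.2f} MiB")
```

Grouping by both function and backend separates weights, KV cache, and compute buffers per device, which is what makes VRAM and RAM needs comparable across quantizations.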

media r/LocalLLaMA · 8h ago

Ornith-1.0-35B GGUF update: native MTP speculative-decode graft + full serving/TTFT/long-context numbers (llama.cpp, tp=1)

This article reports on an update to the Ornith-1.0-35B model, featuring a native MTP draft head grafted onto the IQ4_XS body for self-speculative decoding in llama.cpp. The author provides comprehensive performance metrics including throughput, time-to-first-token (TTFT), and long-context capabilities on a single RTX PRO 6000 Blackwell GPU.
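Self-speculative decoding is built on the draft-and-verify loop sketched below in its greedy form. In the setup described above, the draft is an MTP head that shares the main model's body. Here both models are stand-in Python callables (hypothetical `target` and `draft`), and the target is queried once per position for clarity, whereas a real engine scores all drafted positions in a single forward pass. Under greedy decoding the output is identical to running the target alone; the speedup comes from how many draft tokens are accepted per round.

```python
from typing import Callable, List, Tuple

Model = Callable[[List[int]], int]  # hypothetical greedy next-token function


def speculative_step(target: Model, draft: Model, prefix: List[int], k: int) -> Tuple[List[int], int]:
    """One round of greedy speculative decoding.

    The draft proposes k tokens; the target keeps the longest agreeing prefix
    and then contributes one token of its own. Returns (new tokens, accepted).
    """
    proposal, ctx = [], list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)

    accepted, ctx = [], list(prefix)
    for tok in proposal:
        expected = target(ctx)  # a real engine batches these k target queries
        if tok != expected:
            return accepted + [expected], len(accepted)  # target's correction
        accepted.append(tok)
        ctx.append(tok)
    return accepted + [target(ctx)], len(accepted)  # bonus token when all k agree


def generate(target: Model, draft: Model, prefix: List[int], n_new: int, k: int = 4) -> List[int]:
    """Decode n_new tokens; every round adds at least one target-approved token."""
    out = list(prefix)
    while len(out) - len(prefix) < n_new:
        new, _ = speculative_step(target, draft, out, k)
        out.extend(new)
    return out[: len(prefix) + n_new]
```

The acceptance rate per round, not the draft's standalone quality, determines throughput, which is why MTP heads trained alongside the main model tend to suit this kind of self-speculation.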

media r/LocalLLaMA · 9h ago

Apple Refurbished Adds M5 Pro and Max Options

Following Apple's recent price increase, the company has added numerous top-of-the-line 14-inch MacBook Pro models equipped with M5 Pro and M5 Max chips to its refurbished store.

media r/LocalLLaMA · 9h ago

China Has Matched Anthropic in Cybersecurity, Resetting AI Race

A Wall Street Journal report indicates that Chinese artificial intelligence models have achieved parity with Anthropic's Claude in cybersecurity tasks.