Neural Networks as Linear Regression: An Introduction for Statisticians
This article introduces neural networks to statisticians by demystifying the field through the lens of linear regression approximation.
Researchers propose a scalable framework for merging independently trained billion-parameter transformers using linear mode connectivity, addressing scalability limits in existing methods. The approach employs function-preserving weight transformations and a dual learning procedure where both models jointly optimize toward a shared linear interpolation path.
The article argues against using large language models to infer causal structures, warning that such approaches risk confusing textual associations with genuine causal evidence. Instead, it proposes that agents assist the workflow only by inspecting data and explaining assumptions, leaving causal claims grounded in formal algorithms and diagnostics.
A Reddit user demonstrates running the Qwen3.6-27B model quantized to Q3, with the KV cache at Q8, on an AMD MI50 32GB GPU, achieving more than 180 tokens per second for prompt processing and 9 tokens per second for text generation.
A developer has created a game-agnostic NPC engine backend that leverages small local models to achieve fast response times and decent quality for role-playing games. The system utilizes NVIDIA Parakeet 0.6 for speech-to-text, Gemma 4 26B A4B as the LLM, and Qwen3-TTS for voice synthesis.
A user reports testing tensor split mode with two Morefine G1 4090M 16GB eGPUs connected via Thunderbolt 3 at 40Gbps. While layer split mode yields high token rates for prefill (PP) and text generation (TG), tensor split mode saturates both cards during TG but suffers from poor PP performance due to bandwidth saturation.
The authors propose neural classification trees (NCT), a framework that achieves robustness by encoding subgroup structure within its tree-shaped architecture to address spurious correlations in machine learning models.
Researchers propose a novel bootstrapped method called Self-Filtering that trains a CLIP model on an evolving dataset selected through iterative self-filtering. This approach balances filtered, high-probability clean samples with diverse examples from the entire distribution to mitigate noise in large-scale vision-language datasets.
The authors propose Hedgementation, a new benchmark designed to evaluate machine learning models for mapping hedgerows from remote sensing data at a country scale with 10m² spatial resolution. This initiative combines and harmonizes multiple remote sensing products and ground truth labels derived from a French hedgerow inventory.
This paper proposes an active, continual learning paradigm for Vision-Language-Action (VLA) models to address the inefficiencies of passive imitation learning. The authors demonstrate that uncertainty-guided data collection improves fine-tuning efficiency but causes catastrophic forgetting when recovery data is used exclusively.
The article introduces DiT-Reward, a method that converts a pretrained text-to-image Diffusion Transformer into a reward model by processing near-clean image latents and aggregating text-conditioned representations across transformer layers. This approach leverages generative representations to evaluate the quality of generated images without requiring separate training objectives.
The article demonstrates that Muown's directional update is equivalent to a Riemannian step on normalized directions, where the un-normalized parameterization magnitude modulates the angular step size. This insight explains Muown's step-size stability and motivates the development of AngularMuown, which optimizes directly over normalized directions with an explicit, schedulable angular multiplier.
The authors propose a method to transform inherently sparse outcome rewards in reinforcement learning into dense process rewards by training a discriminator to distinguish between successful and unsuccessful episodes. This approach incentivizes the policy to match the state-action visitations of successful episodes while avoiding those of unsuccessful ones, providing dense feedback on progress without altering the optimal policy.
Hack Your Summer is a free, four-week high-velocity production sprint designed for undergraduate students, graduate students, and recent graduates to build tangible, public-facing work. The initiative serves as an alternative to traditional internships amid a crisis of reduced internship availability in the US.
Jon Udell argues against the phrase "human in the loop" because it cedes authority to machines, proposing instead that humans should invite agents into their existing workflows as team members.
Neofold is an idle creature-collector game that utilizes a local diffusion model to generate an infinite variety of pets. The title was recently released and is available on Steam.
This paper demonstrates that diffusion models' ability to exploit low-dimensional structure for accelerated sampling is a robust property independent of specific update coefficient choices. The authors prove that a broad class of coefficients allows generating an ε-accurate sample in O(k/ε) iterations, regardless of ambient dimension.
This article introduces a framework for sequentially approximating functions in slowly-varying sequences, leveraging the reuse of past queries to reduce overall computational cost. The authors present novel sequential estimation results for matrix powers, spectral densities, Monte Carlo integration, and partial differential equation boundary value problems.
The article introduces Action-BED, a new framework for Bayesian experimental design that formulates the problem in terms of expected future loss on downstream actions rather than uncertainty reduction. This approach converts traditionally doubly intractable objectives into singly intractable ones that can be jointly optimized using stochastic gradients.
This study systematically investigates the impact of system-prompt optimization on multi-agent systems (MAS) by benchmarking two optimizers across diverse configurations of tasks, workflows, and team sizes.