All articles — korshunov.ai

When Top-1 Fails: Calibrating LoRA Monitors for Masked Diffusion LMs

This study evaluates the effectiveness of top-1 argmax concentration as a collapse warning during the fine-tuning of discrete diffusion language models (DLMs) using Low-Rank Adaptation (LoRA). The authors find that this metric has zero precision because it saturates before optimization begins, failing to detect actual training collapses.

arxiv arXiv cs.LG · 5h ago

Holistic Data Scheduler for LLM Pre-training via Multi-Objective Reinforcement Learning

Researchers introduce the Holistic Data Scheduler (HDS), a novel online data mixing framework that addresses the limitations of existing methods by considering dynamic data composition from multiple dimensions. HDS formulates data scheduling as a reinforcement learning problem using the Soft Actor-Critic algorithm and a multi-objective reward function.

arxiv arXiv cs.LG · 5h ago

TR-CIE Sampler for Discrete Flow Matching

Researchers propose the Time-Reparameterized Cumulative Intensity Extrapolation (TR-CIE) sampler to improve sampling quality in discrete flow matching when the number of function evaluations is restricted. The method combines schedule-based time reparameterization with a cumulative-intensity extrapolation update rule to mitigate stiffness and improve approximation accuracy.

arxiv arXiv cs.LG · 5h ago

AsyncOPD: How Stale Can On-Policy Distillation Be?

This article presents AsyncOPD, a fully asynchronous on-policy distillation pipeline that decouples rollout generation from learner updates to alleviate training bottlenecks in large language model post-training. The authors provide the first systematic study of staleness effects in this context, demonstrating that teacher-weighted forward KL is robust to stale rollouts while student-weighted reverse KL is vulnerable.

media r/LocalLLaMA · 5h ago

Krea-2-Turbo Image Model - Easy to be fully uncensored, but it can also EDIT Images!

The Krea-2-Turbo model generates high-quality images in approximately three seconds and supports image editing through masking despite being a text-to-image architecture.

blog Simon Willison · 5h ago

HTML table extractor

The HTML table extractor is a paste-conversion tool that accepts rich text containing embedded HTML tables and converts them into various formats. It supports outputting detected tables as HTML, Markdown, CSV, TSV, or JSON.

media Hugging Face Forums · 5h ago

Open-Source Bilingual Guide on Transformer Mechanics Published

An open-source, bilingual guide in English and Spanish detailing the inner workings of Transformers has been published. The resource covers the exact mathematics and mechanics behind attention collapse and KV-cache compression.

media Hugging Face Forums · 5h ago

[Research] From Functional Geometry to Dynamic Grammar: New LIMEN Audits (V23–V24) Across 7 Architectures

Independent research project LIMEN analyzes the internal dynamics of seven open-source Transformer models, revealing that semantic ambiguity alters trajectory geometry and uncovering a universal dynamic grammar across architectures.

lab Microsoft Research Blog · 5h ago

Memora: A Harmonic Memory Representation Balancing Abstraction and Specificity

Microsoft Research introduces Memora, a scalable agentic memory framework designed to balance abstraction and specificity for long-horizon AI tasks. The system decouples rich memory content from lightweight retrieval structures, setting new state-of-the-art results on benchmarks while using up to 98% fewer context tokens.

arxiv arXiv cs.LG · 6h ago

Autonomous Video Generation with Counterfactual Controllability for Self-Evolving World Models

The article argues that current video generation models learn only partial, implicit spatiotemporal world models rather than fully grounded or controllable ones. It asserts that predictive realism alone is insufficient for creating physical agents because these models often fail to identify controllable variables and embodiment constraints.

arxiv arXiv cs.LG · 6h ago

BehaviorBench: Benchmarking Foundation Models for Behavioral Science Tasks

The authors introduce BehaviorBench, a comprehensive benchmark designed to evaluate foundation models across diverse behavioral science tasks and populations. The study assesses four core capabilities—behavior prediction, strategic decision-making, subject-trait inference, and behavioral knowledge application—at both individual and distributional levels.

arxiv arXiv cs.LG · 6h ago

A Pāṇinian Foundation for Indic Language Processing

The article argues that natural language processing infrastructure for the billion-plus speakers of Indic languages is fragmented due to a lack of shared structural foundations. It proposes leveraging the morphosyntactic architecture formalized in Pāṇini's Aṣṭādhyāyī as a unifying computational framework to improve accuracy and data efficiency.

arxiv arXiv cs.LG · 6h ago

Lightweight Transformer Models for On-Device Fault Detection: A Benchmark Study on Resource-Constrained Deployment

This study benchmarks traditional machine learning methods against lightweight transformer architectures for binary fault detection across three public datasets, evaluating tradeoffs between accuracy, model size, and latency. The research assesses classification performance using F1-score and AUC, while also testing INT8 dynamic quantization and a two-stage adaptive inference pipeline to optimize deployment on resource-constrained hardware.

arxiv arXiv cs.LG · 6h ago

Project Ariadne: Prompt-Conditioned Route Generation for Synthesis Planning

Researchers introduce Ariadne, a decoder-only model that reframes retrosynthetic planning as prompt-conditioned sequence generation, allowing target molecules, constraints, and routes to be represented in a single sequence. This approach eliminates the need for separate models tailored to specific planning specifications.

arxiv arXiv cs.LG · 6h ago

Automated Residual Plot Assessment With the R Package autovi and the Shiny Application autovi.web

The article introduces an R package and a Shiny application designed to automate the visual assessment of residual plots for linear models, addressing the scalability and consistency issues inherent in manual evaluation.

media r/LocalLLaMA · 6h ago

shoutout to /u/TheDankestSlav for this gem

This Reddit post from r/LocalLLaMA is a simple shoutout to user /u/TheDankestSlav. It links to an image shared by the user, which is described as a "gem".

media r/LocalLLaMA · 6h ago

Reddit user criticizes Dario Amodei's claims about open source AI

A Reddit user argues that Anthropic CEO Dario Amodei fundamentally misunderstands how open-source AI models work, specifically disputing his recent congressional testimony from June 28, 2026. The author contends that Amodei's assertions regarding transparency and accessibility are factually incorrect given the current state of open-weight models.

lab Claude Code Releases · 6h ago

Claude Code v2.1.196 Release Notes

Claude Code version 2.1.196 introduces organization default models, clickable file attachments, and improved security for MCP server approvals. The update also enhances background session reliability, fixes various agent status reporting issues, and optimizes token usage in code review workflows.

arxiv arXiv cs.LG · 7h ago

MotifGen: Spatiotemporal interpolation of misaligned satellite images via multi-source generative modeling

Researchers introduce MotifGen, a generative model designed for the spatiotemporal interpolation of tropical cyclone microwave images from multiple geospatial sources with irregular time intervals and geographic misalignment. The model addresses the challenge of high heterogeneity in microwave data by combining inputs from various instruments to fill gaps caused by long satellite revisit times.

arxiv arXiv cs.LG · 7h ago

Deep numerical schemes for systems of Ergodic BSDEs with applications to regime-switching forward utilities

This paper introduces two neural-network-based numerical schemes for solving systems of coupled ergodic Backward Stochastic Differential Equations (eBSDEs), motivated by approximating optimal strategies in regime-switching stochastic factor models.