All articles — korshunov.ai

RaDaR: A specialized reasoning LLM for accelerating rare disease diagnosis

Researchers present RaDaR, an open-source 32B parameter reasoning large language model designed to accelerate the diagnosis of rare diseases by addressing challenges in clinical deployability and data scarcity. The model was trained on nearly 50,000 public cases and over 100,000 synthetic cases, demonstrating superior performance across benchmarks and external validation centers.

arxiv arXiv cs.AI · 5h ago

Reinforcement Learning for Computer-Use Agents with Autonomous Evaluation

The authors propose a reinforcement learning fine-tuning framework that utilizes autonomous vision-language evaluation as a scalable supervision signal for GUI agents, eliminating the need for manual labels or task-specific heuristics. By treating evaluator feedback as a noisy binary reward channel and deriving a noise-corrected estimator for Proximal Policy Optimization, the method addresses the difficulty of obtaining machine-readable rewards in open-ended desktop environments.

arxiv arXiv cs.AI · 5h ago

AdversaBench: Automated LLM Red-Teaming with Multi-Judge Confirmation and Cross-Model Transferability

The authors present AdversaBench, an end-to-end red-teaming pipeline that generates hard inputs for large language models using five structured mutation operators and confirms failures through a three-judge panel with a meta-judge tiebreaker.

media r/LocalLLaMA · 5h ago

Samsung, SK hynix, Micron Sued in US Over Memory Price Fixing

A lawsuit filed in the United States accuses major memory chip manufacturers Samsung, SK hynix, and Micron of price fixing.

blog Simon Willison · 5h ago

Ornith-1.0: Self-Scaffolding LLMs for Agentic Coding

DeepReinforce has released Ornith-1.0, an open-weight model licensed under MIT that achieves state-of-the-art performance among open-source models of comparable size on coding benchmarks. The model is built upon pretrained Gemma 4 and Qwen 3.5 foundations and includes variants with 9B Dense, 31B Dense, 35B MoE, and 397B MoE parameter counts.

media r/LocalLLaMA · 5h ago

arXiv Paper on Hold for 2 Months

A researcher submitting their first paper to arXiv reports that the manuscript has been under moderator review for two months despite passing automatic qualification checks. The author inquires whether this delay is normal and asks for advice on whether to resubmit or continue waiting.

github llama.cpp · 5h ago

llama.cpp b9842 release: dedup preset and cached model entries in /v1/models

The llama.cpp b9842 release introduces a change to deduplicate preset and cached model entries in the /v1/models endpoint. This update is signed off by Adrien Gallouët from Hugging Face.

arxiv arXiv cs.AI · 6h ago

Poster: Exploring the Limits of Audio-Based Detection of Turkish Phone Call Scams

This research investigates the use of large language models to detect scam phone calls in Turkish, a low-resource language where annotated data is scarce. The study introduces the first public multi-modal dataset containing 100 aligned audio-transcript pairs of scam and benign conversations.

arxiv arXiv cs.AI · 6h ago

Governed Shared Memory for Multi-Agent LLM Systems

This paper formalizes the fleet-memory problem in multi-agent LLM environments, identifying four foundational failure modes: unauthorized leakage, stale propagation, contradiction persistence, and provenance collapse. To address these issues, the authors define explicit systems-level primitives including scoped retrieval, temporal supersession, provenance tracking, and policy-governed memory propagation.

arxiv arXiv cs.AI · 6h ago

Quant Convergence: Bridging Classical Value Investing and Modern Factor Models

This research tests whether Benjamin Graham's classic value investing rules can act as a mathematical filter to prevent complex machine learning models from memorizing market noise. The study compares pure Graham rules, modern factors, and a combination of both against XGBoost and AutoGluon models using 20 years of S&P 500 data.

arxiv arXiv cs.AI · 6h ago

Overrefusal from Small On-Premises LLMs in Criminal Legal Context

A study investigates overrefusal in small, on-premises large language models processing legal prompts, finding that authority-style prefixes systematically raise refusal rates by a factor of 2 to 20 relative to a no-prefix baseline. Role-play jailbreak prefixes showed mixed effects across models, and the results indicate that these small LLMs are unstable under contextual framings typical of real institutional users.

arxiv arXiv cs.AI · 6h ago

ASALT: Adaptive State Alignment for Lateral Transfer in Multi-agent Reinforcement Learning

This paper introduces ASALT, a method for lateral transfer learning in multi-agent reinforcement learning that accommodates mismatched state-space dimensionalities between source and target domains. The approach uses observation-level and state-level adapters to map inputs into a shared embedding space, enabling effective knowledge transfer across heterogeneous environments.

media r/LocalLLaMA · 6h ago

Dual GPU Value: Parallelism Over Model Size for Local LLMs

The author argues that the main benefit of upgrading from one GPU to two is parallel processing, not the ability to run larger, higher-quality model quantizations. For coding tasks, the quality difference between Q4 and Q6/Q8 quantizations is minimal, so a larger context window and higher throughput are more valuable.

media r/LocalLLaMA · 6h ago

Effect of GLM 5.2 !!

A Reddit user shared an image titled "Effect of GLM 5.2 !!" in the r/LocalLLaMA subreddit.

media r/LocalLLaMA · 6h ago

Proposing a unified open dataset instead of decentralized LLM training

The author argues that the open-source community should prioritize building a massive, high-quality pre-training dataset rather than attempting to coordinate decentralized LLM training across home GPUs. This shift is presented as a more practical and immediate response to recent government bans on commercial frontier models and a scarcity of small-to-medium open-weight releases.

media r/LocalLLaMA · 6h ago

Bolt Graphics GPU to feature 2 DDR5 laptop DIMM slots

Bolt Graphics is developing a GPU that includes two DDR5 SODIMM slots for overflow memory, aiming for full production by Christmas 2027. The company has working prototypes and targets creators as its initial audience.

arxiv arXiv cs.AI · 7h ago

Uncertainty-Aware Longitudinal Forecasting of Alzheimer's Disease Progression Using Deep Learning

This study proposes a probabilistic framework for longitudinal modeling of Alzheimer's disease progression that combines ordinal diagnosis prediction, multi-horizon trajectory generation, and decomposed uncertainty estimation. The approach utilizes a Temporal Fusion Transformer encoder and an autoregressive Mixture Density Network to generate five-year probabilistic trajectories while quantifying both aleatoric and epistemic uncertainty.

arxiv arXiv cs.AI · 7h ago

ScaleToT: Generalizing Structured LLM Reasoning for Billion-Scale Low-Activity User Modeling

The paper introduces ScaleToT, a method that learns structured reasoning from a small subset of users and extends it to billions of low-activity users with sparse profiles. It combines a bounded entropy-guided Tree-of-Thought refinement with supervised fine-tuning and reward policy optimization to transfer reasoning capabilities without full LLM inference.

arxiv arXiv cs.AI · 7h ago

Abstractions of Queries in Ontology-Based Data Access

This article addresses query abstraction in ontology-based data access (OBDA) by translating data queries to the ontology layer using existential rules and certain answer semantics.

arxiv arXiv cs.AI · 7h ago

When CQs Go Wrong: Challenges in CQ Verification with OE-Assist

This paper investigates the challenges of Competency Question (CQ) verification, a process where ontologies are evaluated against natural language questions to ensure proper modeling. The authors analyze why CQs become difficult and how an LLM assistant can support users during this evaluation.