All articles — korshunov.ai

video-SALMONN-R3: Efficient Video Understanding via Reinforcement Learning

The paper introduces video-SALMONN-R3, an end-to-end video large language model that enables efficient re-watching of video segments through reinforcement learning without relying on chain-of-thought data. This approach addresses the computational and memory constraints that typically force models to use reduced frame rates and spatial resolutions.

arxiv arXiv cs.AI · 6h ago

Adaptive Machine Learning Framework for UAV Trajectory Optimization in O-RAN

This paper introduces a novel framework for optimizing unmanned aerial vehicle (UAV) trajectories in 6G cellular systems by integrating enhanced continual transfer learning within the O-RAN architecture. The system utilizes a library of pre-trained models and a selection mechanism to minimize adaptation time when operating in dynamic environments.

arxiv arXiv cs.AI · 6h ago

RetiSEM: Generalising Causal Models for Fragmented Biomedical Data

The authors propose RetiSEM, a domain-constrained structural equation modelling framework designed to recover causal graphs and perform mediation analysis using fragmented biomedical data with limited multimodal resources. The method organizes variables into biologically informed blocks and applies forbidden-edge constraints to decompose pathway-level effects.

arxiv arXiv cs.AI · 6h ago

Red-Teaming the Agentic Red-Team

This work presents the first in-depth security analysis of widely used agentic systems for offensive security operations, revealing common design flaws that allow adversaries to exfiltrate API keys and compromise operator machines even within sandboxes.

arxiv arXiv cs.AI · 6h ago

CrossPool: Efficient Multi-LLM Serving for Cold MoE Models through KV-Cache and Weight Disaggregation

CrossPool is a serving engine designed for cold Mixture-of-Experts (MoE) models that disaggregates FFN weights and KV-cache into separate GPU memory pools to address memory inefficiencies in sparse request scenarios. By consolidating static weights and dynamically provisioning active KV-cache demand, the system aims to improve GPU memory utilization and support bursty long-context requests.

media r/LocalLLaMA · 6h ago

HuiHui abliterated model outperforms vanilla 3.6-35B-a3b on math and code

According to the post, the HuiHui abliterated model, quantized with a custom recipe, outperforms the vanilla 3.6-35B-a3b variant on mathematics and coding tasks. The poster takes the results to suggest that removing refusal mechanisms improves the model's accuracy in these domains.

media r/LocalLLaMA · 6h ago

Amodei: "Open Source Models Will Eat Your Children"

This Reddit post shares an image featuring the quote "Open Source Models Will Eat Your Children" attributed to Amodei. The content consists of a link to the image and a link to the associated comment thread on r/LocalLLaMA.

media r/LocalLLaMA · 6h ago

Anthropic's Amodei: Open Source Models Could Be Dangerous

Dario Amodei, CEO of Anthropic, has expressed concerns that open source AI models could lead to dangerous outcomes. The statement highlights the potential risks associated with unrestricted access to advanced artificial intelligence technologies.

arxiv arXiv cs.AI · 7h ago

On the Smallness of the Large Language Models Scaling Exponents

The article discusses why the small scaling exponents of current large language models indicate a regime that is unsustainable in terms of energy resources.

arxiv arXiv cs.AI · 7h ago

A Fair Evaluation of Graph Foundation Models for Node Property Prediction

This study conducts a rigorous reevaluation of nine recent Graph Foundation Models (GFMs) for node property prediction, comparing them against strong Graph Neural Network (GNN) baselines to address the lack of unified evaluation standards in the field.

arxiv arXiv cs.AI · 7h ago

RaDaR: A specialized reasoning LLM for accelerating rare disease diagnosis

Researchers present RaDaR, an open-source 32B parameter reasoning large language model designed to accelerate the diagnosis of rare diseases by addressing challenges in clinical deployability and data scarcity. The model was trained on nearly 50,000 public cases and over 100,000 synthetic cases, demonstrating superior performance across benchmarks and external validation centers.

arxiv arXiv cs.AI · 7h ago

Reinforcement Learning for Computer-Use Agents with Autonomous Evaluation

The authors propose a reinforcement learning fine-tuning framework that utilizes autonomous vision-language evaluation as a scalable supervision signal for GUI agents, eliminating the need for manual labels or task-specific heuristics. By treating evaluator feedback as a noisy binary reward channel and deriving a noise-corrected estimator for Proximal Policy Optimization, the method addresses the difficulty of obtaining machine-readable rewards in open-ended desktop environments.
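The idea of correcting a noisy binary reward can be illustrated with a standard unbiased estimator. This is a minimal sketch, not the paper's estimator: it assumes the evaluator's false-positive and false-negative rates are known, and the function names `corrected_reward` and `expected_corrected` are hypothetical.

```python
def corrected_reward(observed: int, false_pos: float, false_neg: float) -> float:
    """Unbiased estimate of the true binary reward from one noisy observation.

    Assumption: the evaluator reports 1 for a failed task with probability
    false_pos and reports 0 for a successful task with probability false_neg.
    Then E[observed] = false_pos + true * (1 - false_pos - false_neg),
    and inverting that affine map gives the estimator below.
    """
    denom = 1.0 - false_pos - false_neg
    if denom <= 0:
        raise ValueError("evaluator must be better than chance")
    return (observed - false_pos) / denom


def expected_corrected(true_reward: int, false_pos: float, false_neg: float) -> float:
    """Expectation of corrected_reward over the evaluator noise."""
    # Probability that the evaluator outputs 1 given the true reward.
    p_one = (1.0 - false_neg) if true_reward == 1 else false_pos
    return (p_one * corrected_reward(1, false_pos, false_neg)
            + (1.0 - p_one) * corrected_reward(0, false_pos, false_neg))
```

Under these assumptions the corrected value averages out to the true reward, so it could stand in for the raw evaluator output when computing advantages, at the cost of higher variance as the error rates grow.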

arxiv arXiv cs.AI · 7h ago

AdversaBench: Automated LLM Red-Teaming with Multi-Judge Confirmation and Cross-Model Transferability

The authors present AdversaBench, an end-to-end red-teaming pipeline that generates hard inputs for large language models using five structured mutation operators and confirms failures through a three-judge panel with a meta-judge tiebreaker.
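A three-judge panel with a meta-judge tiebreaker could be wired up roughly as follows. This is a sketch under assumptions: the verdict labels, the strict-majority rule, and the `confirm_failure` function are illustrative and are not taken from the AdversaBench pipeline.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical verdict labels: "fail", "pass", or "unsure".
Judge = Callable[[str, str], str]


def confirm_failure(prompt: str, response: str,
                    judges: Sequence[Judge], meta_judge: Judge) -> bool:
    """Return True if the panel confirms that the response is a failure."""
    votes = Counter(judge(prompt, response) for judge in judges)
    label, count = votes.most_common(1)[0]
    # Accept the panel's verdict only when a decisive label has a strict majority.
    if label != "unsure" and count * 2 > len(judges):
        return label == "fail"
    # Otherwise (three-way split or majority "unsure") defer to the meta-judge.
    return meta_judge(prompt, response) == "fail"
```

In practice each judge would wrap a model call; here they are plain callables so the voting logic can be checked in isolation.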

media r/LocalLLaMA · 7h ago

Samsung, SK hynix, Micron Sued in US Over Memory Price Fixing

A lawsuit has been filed in the United States against major memory chip manufacturers Samsung, SK hynix, and Micron regarding allegations of price fixing.

blog Simon Willison · 7h ago

Ornith-1.0: Self-Scaffolding LLMs for Agentic Coding

DeepReinforce has released Ornith-1.0, an open-weight model licensed under MIT that achieves state-of-the-art performance among open-source models of comparable size on coding benchmarks. The model is built upon pretrained Gemma 4 and Qwen 3.5 foundations and comes in 9B dense, 31B dense, 35B MoE, and 397B MoE variants.

media r/LocalLLaMA · 7h ago

Arxiv Paper on Hold for 2 months.

A researcher submitting their first paper to arXiv reports that the manuscript has been under moderator review for two months despite passing automatic qualification checks. The author inquires whether this delay is normal and asks for advice on whether to resubmit or continue waiting.

github llama.cpp · 7h ago

llama.cpp b9842 release: dedup preset and cached model entries in /v1/models

The llama.cpp b9842 release introduces a change to deduplicate preset and cached model entries in the /v1/models endpoint. This update is signed off by Adrien Gallouët from Hugging Face.

arxiv arXiv cs.AI · 8h ago

Poster: Exploring the Limits of Audio-Based Detection of Turkish Phone Call Scams

This research investigates the use of large language models to detect scam phone calls in Turkish, a low-resource language where annotated data is scarce. The study introduces the first public multi-modal dataset containing 100 aligned audio-transcript pairs of scam and benign conversations.

arxiv arXiv cs.AI · 8h ago

Governed Shared Memory for Multi-Agent LLM Systems

This paper formalizes the fleet-memory problem in multi-agent LLM environments, identifying four foundational failure modes: unauthorized leakage, stale propagation, contradiction persistence, and provenance collapse. To address these issues, the authors define explicit systems-level primitives including scoped retrieval, temporal supersession, provenance tracking, and policy-governed memory propagation.
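Three of the named primitives (scoped retrieval, temporal supersession, provenance tracking) can be sketched as a small in-memory store. All class and field names here are hypothetical, writes are assumed to arrive in timestamp order, and policy-governed propagation is left out; this is not the paper's design.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class MemoryRecord:
    key: str
    value: str
    scope: str                            # which group of agents may read it
    source_agent: str                     # provenance: who wrote the record
    timestamp: int
    superseded_by: Optional[int] = None   # index of the newer record, if any


class SharedMemory:
    def __init__(self) -> None:
        self._records: List[MemoryRecord] = []
        self._latest: Dict[Tuple[str, str], int] = {}

    def write(self, key: str, value: str, scope: str,
              source_agent: str, timestamp: int) -> int:
        """Append a record and mark the previous one for (scope, key) as stale."""
        idx = len(self._records)
        prev = self._latest.get((scope, key))
        if prev is not None:
            # Temporal supersession: the older record stays for audit but is hidden.
            self._records[prev].superseded_by = idx
        self._records.append(MemoryRecord(key, value, scope, source_agent, timestamp))
        self._latest[(scope, key)] = idx
        return idx

    def read(self, key: str, allowed_scopes: Set[str]) -> List[MemoryRecord]:
        """Scoped retrieval: return only current records the caller may see."""
        return [r for r in self._records
                if r.key == key
                and r.scope in allowed_scopes
                and r.superseded_by is None]
```

Keeping superseded records rather than deleting them is a deliberate choice in this sketch: it preserves the provenance chain while still preventing stale values from being returned.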

arxiv arXiv cs.AI · 8h ago

Quant Convergence: Bridging Classical Value Investing and Modern Factor Models

This research tests whether Benjamin Graham's classic value investing rules can act as a mathematical filter to prevent complex machine learning models from memorizing market noise. The study compares pure Graham rules, modern factors, and a combination of both against XGBoost and AutoGluon models using 20 years of S&P 500 data.