All articles — korshunov.ai


Age of LLM: A Strategic 1v1 Benchmark for Reasoning, Diplomacy and Reliability

The authors introduce Age of LLM, a turn-based 1v1 benchmark in which two large language models compete on a 13x7 grid to destroy the enemy base under fog of war with full diplomacy. The private engine mitigates data contamination by using fresh random map seeds and opponents for each match.

arxiv arXiv cs.AI · 8h ago

ATRIA: Adaptive Traceable ECG Reporting with Iterative Agents

The article introduces ATRIA, a multi-agent system for ECG reporting that addresses the limitations of existing end-to-end models and single-pass agents by mirroring the clinician's iterative workflow.

arxiv arXiv cs.AI · 8h ago

Average Rankings Mask Per-Subject Optimality: A Friedman-Nemenyi Benchmark of EEG Motor-Imagery BCI Decoders

This study tests whether any single decoding pipeline dominates across subjects in motor-imagery brain-computer interfaces, evaluating 1,056 configurations on three public datasets with Friedman-Nemenyi statistical tests.

arxiv arXiv cs.AI · 8h ago

Entity Resolution via Batched Oracle Queries

This article addresses the problem of resolving entities in large datasets using an oracle that clusters records in limited batches, aiming for a pay-as-you-go approach to control costs while maximizing recall.

arxiv arXiv cs.AI · 8h ago

Agentic AI for Bilevel Long-Term Optimization of Policy-Driven Physical Layer Systems

This paper introduces Agentic-LTPO, a nested bilevel optimization framework designed to address the limitations of fixed-objective methods in physical layer systems facing dynamic operator policies and real-time constraints. The framework utilizes agentic AI to generate upper-level configurations that translate evolving policies and historical experiences into structured lower-level problems for immediate decision-making.

media r/LocalLLaMA · 8h ago

Second Circuit: An NGO for digital freedom of thought

Chris Tidesson announces the founding of Second Circuit, an NGO dedicated to supporting self-determined AI use and encouraging open-source software adoption among governments, companies, and private individuals. The organization was originally established in response to the ChatGPT 4o situation and has operated a Discord community for over six months.

media r/LocalLLaMA · 8h ago

on Dario’s statement

This Reddit post from the r/LocalLLaMA community discusses a statement made by Dario Amodei. The content is limited to the title and metadata, with no detailed text or analysis provided in the source.

arxiv arXiv cs.AI · 9h ago

Can Aggregate Invariants Accelerate Continuous Subgraph Matching? Limits, Laws, and a Dynamic Spectral Index

This study evaluates whether spectral filtering can accelerate continuous subgraph matching (CSM) on dynamic graphs, finding that while lazy maintenance is ineffective, selective exact maintenance offers significant performance gains.

arxiv arXiv cs.AI · 9h ago

Detecting AI Coding Agents in Open Source: A Validated Multi-Method Census of 180 Million Repositories

A multi-layered detection framework analyzing 180 million Git repositories reveals that single-signal methods significantly underestimate the prevalence of generative AI coding agents, missing up to 97% of activity. The study identifies over 320,000 commits per month from agents like Claude Code, which dominates silent adoption through configuration files rather than bot accounts.

arxiv arXiv cs.AI · 9h ago

Transformation Behavior of Images in Latent Space

This paper investigates how classical image transformations affect embeddings in latent space using encoder networks from Lunit Inc., Bioptimus, and Meta Research Team.

arxiv arXiv cs.AI · 9h ago

MedPCFM: Improving Medical Point Cloud Completion by Integrating Point Transformers and Flow Matching

This article introduces MedPCFM, a flow matching approach for medical point cloud completion that integrates Point Transformer v3 (PTv3), addressing the lack of research on generative modeling in this domain. The method is evaluated on the SkullFix, SkullBreak, and Mandibular Defect datasets against strong deterministic and diffusion baselines.

arxiv arXiv cs.AI · 9h ago

ReM-MoA: Reasoning Memory Sustains Mixture-of-Agents Scaling

The authors propose ReM-MoA, a memory-augmented Mixture-of-Agents framework designed to sustain performance gains as the number of layers increases, addressing the degradation and saturation issues found in existing variants. The system utilizes a Ranked Reasoning Memory and a Curated Diversified Memory Routing scheme to preserve exploration diversity while propagating high-quality reasoning traces across layers.

arxiv arXiv cs.AI · 9h ago

NoContactNoWorries: Estimating Contact through Vision and Proprioception for In-Hand Dexterous Manipulation

Researchers propose NoContactNoWorries, a transformer-based framework that infers binary contact states during in-hand manipulation by fusing RGB-D vision with robot proprioception. This approach serves as a scalable pseudo-tactile signal, avoiding the cost and fragility associated with dedicated hardware tactile sensors.

arxiv arXiv cs.AI · 9h ago

Bayesian control for coding agents

This article introduces a Bayesian controller for orchestrating modern coding agents, addressing the limitations of fixed-rule systems that ignore uncertainty during tool use.

media r/LocalLLaMA · 9h ago

What happened to Petals (Decentralized Inference) by BigScience?

The provided source content is a Reddit submission link and does not contain the article text or discussion details.

media r/LocalLLaMA · 9h ago

Reddit user suggests OpenAI release GPT-OSS-2 to counter Anthropic IPO

A Reddit user proposes that OpenAI should launch a powerful open-source model, referred to as GPT-OSS-2, timed with Anthropic's upcoming IPO.

media r/LocalLLaMA · 9h ago

Qwen3-tts.cpp and Compose Desktop GUI for Local TTS

A developer has released an optimized C++ implementation of Qwen3-TTS, achieving approximately 5x realtime speed on an RTX 5080, alongside a cross-platform desktop GUI built with Kotlin Compose Multiplatform. The project provides GGML-based inference that supports both CPU and CUDA execution on Windows and Linux.

arxiv arXiv cs.AI · 10h ago

The African Language Tax: Quantifying the Cost, Latency, and Context Penalty of Tokenizing African Languages in Frontier LLMs

A study quantifies the structural tokenization penalty faced by African languages in commercial large language models, revealing that speakers pay higher costs and experience greater latency due to inefficient subword token assignment. Across 20 African languages and 11 frontier tokenizers, every tested language incurs a premium over English, with median costs reaching 1.88 times that of English and up to 8.92 times for N'Ko script.
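The premium described above is a ratio of token counts for equivalent content. The following is a minimal sketch of that arithmetic, not the study's own code. It assumes per-token pricing (so cost scales linearly with token count), and the function names, the token counts, and the 128,000-token window are hypothetical values chosen for illustration.

```python
def token_premium(tokens_language: int, tokens_english: int) -> float:
    """Tokens needed in a language relative to English for the same content."""
    if tokens_english <= 0:
        raise ValueError("tokens_english must be positive")
    return tokens_language / tokens_english


def scaled_cost(english_cost: float, premium: float) -> float:
    """Cost of the same content, assuming price is linear in token count."""
    return english_cost * premium


def effective_context(window_tokens: int, premium: float) -> float:
    """English-equivalent content that fits in a fixed context window."""
    return window_tokens / premium


# Hypothetical counts chosen to match the reported median premium of 1.88.
median = token_premium(tokens_language=1880, tokens_english=1000)
print(f"median premium: {median:.2f}x")

# The reported worst case (N'Ko script) applied to a hypothetical $1.00 English job.
print(f"N'Ko cost: ${scaled_cost(1.00, 8.92):.2f}")

# Hypothetical 128,000-token window at the median premium.
print(f"effective window: {effective_context(128_000, median):.0f} English-equivalent tokens")
```

Under these assumptions the same premium raises cost and shrinks usable context at once, because both depend only on how many tokens the text consumes.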

arxiv arXiv cs.AI · 10h ago

CompressKV: Semantic-Retrieval-Guided KV-Cache Compression for Resource-Efficient Long-Context LLM Inference

The authors propose CompressKV, a framework that compresses key-value caches in GQA-based large language models by identifying semantic retrieval heads to retain critical tokens. This approach addresses the performance degradation caused by existing heuristic eviction methods that ignore the distinct functionalities of attention heads.

blog Simon Willison · 10h ago

Count the number of Safari tabs

This article shares a concise method for counting open browser tabs in Safari using AppleScript. The provided command executes via the terminal to retrieve the total count across all windows.
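A minimal sketch of the approach, assuming macOS with Safari installed. It is not the exact command from the article: the AppleScript body and the helper names `build_command` and `count_safari_tabs` are illustrative. It sums the tab count of each window through `osascript`, the standard macOS command-line runner for AppleScript.

```python
import subprocess
import sys

# Illustrative AppleScript: add up the tabs of every open Safari window.
APPLESCRIPT = """
tell application "Safari"
    set total to 0
    repeat with w in windows
        set total to total + (count of tabs of w)
    end repeat
    return total
end tell
"""


def build_command(script: str = APPLESCRIPT) -> list[str]:
    """Build the osascript invocation that runs the given AppleScript."""
    return ["osascript", "-e", script]


def count_safari_tabs() -> int:
    """Run the script and parse the tab count (macOS only)."""
    if sys.platform != "darwin":
        raise RuntimeError("osascript is only available on macOS")
    result = subprocess.run(
        build_command(), capture_output=True, text=True, check=True
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    if sys.platform == "darwin":
        print(count_safari_tabs())
    else:
        print("Skipping: this example requires macOS")
```

On first run, macOS may prompt for permission to let the terminal control Safari.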