All articles — korshunov.ai

CompressKV: Semantic-Retrieval-Guided KV-Cache Compression for Resource-Efficient Long-Context LLM Inference

The authors propose CompressKV, a framework that compresses key-value caches in GQA-based large language models by identifying semantic retrieval heads to retain critical tokens. This approach addresses the performance degradation caused by existing heuristic eviction methods that ignore the distinct functionalities of attention heads.

blog Simon Willison · 6h ago

Count the number of Safari tabs

This article shares a concise method for counting open Safari tabs using AppleScript. The command runs from the terminal and returns the total count across all windows.

media r/LocalLLaMA · 6h ago

DeepSeek V4 PR merged into llama.cpp

A pull request supporting DeepSeek V4 has been merged into the llama.cpp repository, enabling users to run the model locally.

media r/LocalLLaMA · 6h ago

Proposed components for a comprehensive local AI offline backup kit

A Reddit user outlines a comprehensive list of software and models to store offline for maintaining access to local AI capabilities in the event of widespread internet restrictions or bans. The proposed kit focuses on preserving essential tools, operating systems, and model weights to ensure functionality without external dependencies.

media Hugging Face Forums · 6h ago

Project UCTF: An Open Research Program on Machine-Native AI Training Representations

Project UCTF has been restructured from a single proposal into an open, hypothesis-driven research program to investigate whether machine-native intermediate representations can reduce cross-lingual semantic redundancy in multilingual AI training.

media Hugging Face Forums · 6h ago

Error Generating Deep RL Course Certificate

A user reports an error when trying to generate a certificate of completion for the Deep RL course on Hugging Face. The error persists even after the user enters the required username and name, and no guidance is available online.

lab Hugging Face Blog · 6h ago

DiScoFormer: One transformer for density and score, across distributions

The article introduces DiScoFormer, a unified transformer model capable of performing both density estimation and score-based generation tasks across various data distributions.

lab Google — The Keyword (AI) · 6h ago

Ask an AI expert: What exactly is the full stack?

A Google expert explains what it means to take a full-stack approach to artificial intelligence. The article notes that this approach has long been the foundation of Google's AI work.

arxiv arXiv cs.AI · 7h ago

The Latent Bridge: A Continuous Slow-Fast Channel for Real-Time Game Agents

This article introduces a continuous Latent Bridge that couples frozen reactive and reasoning vision-language models to enable real-time game agents with millisecond latency and long-horizon planning. By projecting the slow model's residuals into the fast model's input-embedding space, it avoids text round-trips while matching or beating traditional Text Bridges in performance.

arxiv arXiv cs.AI · 7h ago

G$^3$VLA: Geometric inductive bias for Vision-Language-Action Models

The authors propose G$^3$VLA, a camera-aware geometric module that injects calibrated structure into the visual-token stream of pretrained Vision-Language-Action models without altering their action space or imitation objective. This approach combines intrinsic-conditioned ray embeddings, projective positional encoding, and bidirectional cross-view fusion to address the mismatch between 2D image coordinates and robot camera geometry.

arxiv arXiv cs.AI · 7h ago

video-SALMONN-R$^3$: Efficient Video Understanding via Reinforcement Learning

The paper introduces video-SALMONN-R$^3$, an end-to-end video large language model that enables efficient re-watching of video segments through reinforcement learning without relying on chain-of-thought data. This approach addresses the computational and memory constraints that typically force models to use reduced frame rates and spatial resolutions.

arxiv arXiv cs.AI · 7h ago

Adaptive Machine Learning Framework for UAV Trajectory Optimization in O-RAN

This paper introduces a novel framework for optimizing unmanned aerial vehicle (UAV) trajectories in 6G cellular systems by integrating enhanced continual transfer learning within the O-RAN architecture. The system utilizes a library of pre-trained models and a selection mechanism to minimize adaptation time when operating in dynamic environments.

arxiv arXiv cs.AI · 7h ago

RetiSEM: Generalising Causal Models for Fragmented Biomedical Data

The authors propose RetiSEM, a domain-constrained structural equation modelling framework designed to recover causal graphs and perform mediation analysis using fragmented biomedical data with limited multimodal resources. The method organizes variables into biologically informed blocks and applies forbidden-edge constraints to decompose pathway-level effects.

arxiv arXiv cs.AI · 7h ago

Red-Teaming the Agentic Red-Team

This work presents the first in-depth security analysis of widely used agentic systems for offensive security operations, revealing common design flaws that allow adversaries to exfiltrate API keys and compromise operator machines even within sandboxes.

arxiv arXiv cs.AI · 7h ago

CrossPool: Efficient Multi-LLM Serving for Cold MoE Models through KV-Cache and Weight Disaggregation

CrossPool is a serving engine designed for cold Mixture-of-Experts (MoE) models that disaggregates FFN weights and KV-cache into separate GPU memory pools to address memory inefficiencies in sparse request scenarios. By consolidating static weights and dynamically provisioning active KV-cache demand, the system aims to improve GPU memory utilization and support bursty long-context requests.

media r/LocalLLaMA · 7h ago

HuiHui abliterated model outperforms vanilla 3.6-35B-a3b on math and code

A custom quantization recipe applied to the HuiHui abliterated model outperforms the vanilla 3.6-35B-a3b variant on mathematics and coding tasks. The results suggest that removing refusal mechanisms improves the model's accuracy in these domains.

media r/LocalLLaMA · 7h ago

Amodei: "Open Source Models Will Eat Your Children"

This Reddit post shares an image featuring the quote "Open Source Models Will Eat Your Children" attributed to Amodei. The content consists of a link to the image and a link to the associated comment thread on r/LocalLLaMA.

media r/LocalLLaMA · 7h ago

Anthropic's Amodei: Open Source Models Could Be Dangerous

Dario Amodei, CEO of Anthropic, has expressed concerns that open source AI models could lead to dangerous outcomes. The statement highlights the potential risks associated with unrestricted access to advanced artificial intelligence technologies.

arxiv arXiv cs.AI · 8h ago

On the Smallness of the Large Language Models Scaling Exponents

The article discusses why the small scaling exponents of current large language models indicate a regime that is unsustainable in terms of energy resources.

arxiv arXiv cs.AI · 8h ago

A Fair Evaluation of Graph Foundation Models for Node Property Prediction

This study conducts a rigorous reevaluation of nine recent Graph Foundation Models (GFMs) for node property prediction, comparing them against strong Graph Neural Network (GNN) baselines to address the lack of unified evaluation standards in the field.