All articles — korshunov.ai


Lie-Algebra Attention: Group Element Tokens in Neural Networks

Lie-Algebra Attention represents attention tokens as elements of a matrix Lie group and uses the closed-form Lie-algebra norm of relative poses as attention scores. The method achieves invariant and equivariant attention without representation-theoretic components, outperforming vector-token baselines on SE(2), SO(3), and Aff(2) with fewer parameters and no learned kernels.
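To make "algebra norm of relative poses as attention scores" concrete, here is a minimal sketch on SO(3) only. It assumes the score is the negative rotation angle of g_i^{-1} g_j (the norm of its so(3) logarithm under the usual vector convention) followed by a row softmax; the paper's exact norm, scaling, and closed forms for SE(2) and Aff(2) may differ, and `rot_z`, `so3_log_norm`, and `lie_attention_weights` are hypothetical names.

```python
import numpy as np

def rot_z(theta: float) -> np.ndarray:
    """Rotation about the z-axis, used here to build example SO(3) tokens."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def so3_log_norm(R: np.ndarray) -> float:
    """Norm of log(R) in so(3), i.e. the rotation angle, via the trace formula."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)  # clip guards rounding
    return float(np.arccos(cos_theta))

def lie_attention_weights(tokens: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Row-softmax of negative algebra norms of relative poses g_i^{-1} g_j."""
    n = tokens.shape[0]
    scores = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            rel = tokens[i].T @ tokens[j]  # inverse of a rotation is its transpose
            scores[i, j] = -so3_log_norm(rel) / temperature
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability for exp
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)
```

Because (h g_i)^{-1}(h g_j) = g_i^{-1} g_j, left-multiplying every token by the same rotation h leaves the weights unchanged, so invariance holds by construction without any learned kernel.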

arxiv arXiv cs.LG · 13d ago

Deterministic Multicalibration with Optimal Sample Complexity

A new algorithm achieves minimax-optimal sample complexity for multicalibration using deterministic predictors, resolving a long-standing open problem. The method also produces deterministic predictors that satisfy outcome indistinguishability and enables optimal deterministic omnipredictors and panpredictors, addressing open questions from prior work.
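The summary does not describe the new algorithm itself, so the sketch below only shows what a multicalibration check looks like: an auditor for one common binned, mass-weighted form of the condition, not the paper's method. The binning scheme, `alpha`, and the function name are assumptions for illustration.

```python
import numpy as np

def multicalibration_violations(preds, labels, groups, n_bins=10, alpha=0.05):
    """List (group, bin, bias) cells whose mass-weighted residual exceeds alpha.

    preds: predictions in [0, 1]; labels: 0/1 outcomes; groups: name -> boolean mask.
    """
    preds = np.asarray(preds, dtype=float)
    labels = np.asarray(labels, dtype=float)
    bins = np.minimum((preds * n_bins).astype(int), n_bins - 1)  # level sets of the predictor
    n = len(preds)
    violations = []
    for name, mask in groups.items():
        mask = np.asarray(mask, dtype=bool)
        for b in range(n_bins):
            cell = mask & (bins == b)
            if not cell.any():
                continue
            # Empirical E[(y - f(x)) * 1{x in group, f(x) in bin b}]
            bias = float(np.sum(labels[cell] - preds[cell]) / n)
            if abs(bias) > alpha:
                violations.append((name, b, bias))
    return violations
```

A predictor can be calibrated overall yet fail on a subgroup; multicalibration asks for the condition to hold on every group in the collection simultaneously.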

arxiv arXiv cs.LG · 13d ago

UNIEGO: Proxy-Mediated Unified Egocentric Video Representation

UNIEGO introduces a hierarchical multi-teacher distillation framework that uses proxy models to mediate knowledge transfer from nine diverse teachers across viewpoints and modalities. The Selective Proxy Distillation (SPD) stage adaptively selects reliable proxies during training, improving representation quality and stability. UNIEGO achieves state-of-the-art results in action recognition, video retrieval, and action segmentation on ego-exo benchmarks.

arxiv arXiv cs.LG · 13d ago

How Transparent is DiffusionGemma?

DiffusionGemma has poor variable transparency due to high opaque serial depth, but this can be mitigated with an interpretable token bottleneck, which reduces serial depth to 1.1× that of Gemma 4. Algorithmic transparency is harder to achieve in diffusion models because tokens can change dynamically during generation, though case studies reveal novel phenomena such as non-chronological reasoning and intermediate-context reasoning. DiffusionGemma is found to be about as monitorable as Gemma 4.

arxiv arXiv cs.CL · 13d ago

RefRad2D Dataset Enables Scalable Spatial Grounding in Radiology

RefRad2D is a large-scale bilingual dataset of 1.2M CT and MR image-text pairs from clinical practice. Trained on this data, RadGrounder achieves competitive VQA results and performs spatial grounding without degrading language quality, enabling verifiable outputs in radiology.

arxiv arXiv cs.CL · 13d ago

LLM Alignment Using Implicit User Feedback

A new dataset, IFLLM, collects mouse trajectories and eye-gaze data from users interacting with LLMs. Implicit feedback significantly improves LLM alignment: it raises text-based reward model accuracy from 55% to 64% and nearly triples response-quality improvements after DPO training on eight LLMs.

arxiv arXiv cs.CL · 13d ago

H-RePlan: Hierarchical Recovery for Cross-Device Agent Systems

H-RePlan introduces a hierarchical replanning framework that separates device-local strategy recovery from global orchestrator replanning. Through this scope-aware recovery, it outperforms existing baselines in multi-device agent systems, achieving higher task completion and instruction adherence at a lower token cost.
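A rough sketch of the two recovery scopes, assuming a simple policy: retry a failed step on its own device with an alternative strategy a few times, and only then ask the orchestrator to rewrite the remaining plan. The callbacks, retry limits, and function names are hypothetical and not taken from the paper.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, str]  # (device, action)

def execute_plan(plan: List[Step],
                 run_step: Callable[[str, str], bool],
                 local_fix: Callable[[str, str], str],
                 global_replan: Callable[[List[Step], int], List[Step]],
                 max_local_retries: int = 2,
                 max_replans: int = 3) -> Dict[str, object]:
    """Try device-local recovery first; escalate to the orchestrator only if it fails."""
    log: List[Tuple[str, str, str]] = []
    replans, i = 0, 0
    while i < len(plan):
        device, action = plan[i]
        ok = run_step(device, action)
        retries = 0
        while not ok and retries < max_local_retries:
            action = local_fix(device, action)  # same device, alternative strategy
            retries += 1
            log.append(("local", device, action))
            ok = run_step(device, action)
        if ok:
            i += 1
            continue
        if replans >= max_replans:
            return {"completed": False, "log": log}
        # Global scope: the orchestrator rewrites the remaining steps, keeping finished ones.
        plan = plan[:i] + global_replan(plan, i)
        replans += 1
        log.append(("global", device, action))
    return {"completed": True, "log": log}
```

Keeping cheap local retries ahead of global replanning is one plausible reason such a design can save tokens: the orchestrator is only consulted when a failure cannot be contained on one device.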

arxiv arXiv cs.CL · 13d ago

StylisticBias: Visual Cues Drive Most Social Biases in MLLMs

StylisticBias introduces a controlled benchmark to evaluate attribute-level social bias in multimodal large language models. It reveals that age and body type dominate identity-level effects, while fashion style and 15 key visual attributes drive most bias, accounting for nearly 80% of variation. The benchmark highlights that model judgments are most sensitive to appearance-related cues, especially in socioeconomic and style-based contexts.

arxiv arXiv cs.CL · 13d ago

LedgerAgent: Structured State for Policy-Adherent Tool-Calling Agents

LedgerAgent introduces a structured ledger to maintain task states separately in tool-calling agents. It renders these states into prompts and enforces policy constraints before tool execution, reducing policy violations and improving performance across customer-service domains.
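The ledger-then-gate pattern can be sketched as follows, assuming the ledger is a plain key-value record, rendering is a sorted bullet list, and each policy is a function that can veto a proposed tool call before it runs. `Ledger`, `guarded_call`, and the refund rule are hypothetical illustrations, not LedgerAgent's actual interface.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

@dataclass
class Ledger:
    """Hypothetical structured task state kept outside the chat transcript."""
    state: Dict[str, Any] = field(default_factory=dict)

    def render(self) -> str:
        # Serialize the state deterministically so it can be placed in the prompt.
        lines = [f"- {key}: {value}" for key, value in sorted(self.state.items())]
        return "Task ledger:\n" + "\n".join(lines)

# A policy inspects the ledger and a proposed call; returns (allowed, reason).
Policy = Callable[[Ledger, str, Dict[str, Any]], Tuple[bool, str]]

def require_verified_identity(ledger: Ledger, tool: str, args: Dict[str, Any]) -> Tuple[bool, str]:
    """Example rule: refunds are blocked until the ledger records a verified identity."""
    if tool == "issue_refund" and not ledger.state.get("identity_verified", False):
        return False, "identity must be verified before issuing a refund"
    return True, ""

def guarded_call(ledger: Ledger, tool: str, args: Dict[str, Any],
                 policies: List[Policy], tools: Dict[str, Callable[..., Any]]) -> Dict[str, Any]:
    """Check every policy against the ledger before the tool is allowed to run."""
    for policy in policies:
        allowed, reason = policy(ledger, tool, args)
        if not allowed:
            return {"status": "blocked", "reason": reason}
    return {"status": "ok", "result": tools[tool](**args)}
```

Because the check reads the ledger rather than the model's free-text reasoning, a violation is blocked even when the model believes the precondition already holds.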

media r/LocalLLaMA · 13d ago

Tesla P40 Feasibility Experiment with Improved Cooling Design

A user has demonstrated that Tesla P40 GPUs can be modified to an 8+6-pin power configuration and fitted with standard 1080 Ti coolers. They designed a 2-1-2 airflow shroud that sustains a stable 120–130 W, prevents thermal shutdown, and reduces noise to approximately 42 dB, a significant improvement over existing cooling options.

github llama.cpp · 13d ago

llama.cpp release b9711: new binaries and updates

llama.cpp releases version b9711 with updated binaries for macOS, Linux, Android, Windows, and openEuler. The release includes support for ARM64, x64, Vulkan, ROCm, OpenVINO, SYCL, and HIP, with dedicated binaries for CPU and GPU acceleration. A new UI package is also available.

github llama.cpp · 13d ago

llama.cpp release b9712 fixes UI build with read-only source

llama.cpp version b9712 includes a fix for UI build issues caused by read-only source files. The release provides pre-built binaries for macOS, Linux, Android, Windows, and openEuler across multiple architectures and hardware acceleration options, including Vulkan, CUDA, OpenVINO, and SYCL.

media r/LocalLLaMA · 13d ago

SETI@home as Distributed LLM Inference Engine?

SETI@home is a project that uses distributed computing to analyze radio telescope data. No existing system is known to function as a distributed LLM inference engine under this name. The proposal suggests such a system could be built, but it remains speculative and unimplemented.

arxiv arXiv cs.AI · 13d ago

AI Economist Agent: Model-Grounded Economic Analysis Framework

The AI Economist Agent uses RAG, knowledge graphs, and LLMs to generate economic narratives grounded in theory and data. It enables model-based analysis, evidence retrieval, and report generation, ensuring economic coherence and traceability through explicit model computations.

arxiv arXiv cs.AI · 13d ago

See-and-Reach: Vision-Language Navigation for UAVs in Field of View

UAV-VLN-FOV isolates the see-and-reach stage for precise evaluation of UAV navigation. 3DG-VLN enhances visual grounding and spatial alignment using dynamic 3D direction cues, achieving a 13.82% improvement in success rate over baselines, and is validated in real-world trials.

arxiv arXiv cs.AI · 13d ago

Task Manager Reduces Queue Latency by 14-75% at Enterprise Scale

A Task Manager introduces priority inference, related-event merging, and preemption to enable continuous operation in enterprise AI. It reduces high-priority queue latency by 14-77% and improves related-event correctness by over 20 percentage points at enterprise scale, addressing agent discovery noise as the primary bottleneck.
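Two of the three mechanisms, related-event merging and preemption, can be illustrated with a small priority queue. This sketch assumes events carry a correlation key and an already-inferred priority (lower number means more urgent); priority inference itself is not shown, and `TaskQueue` is a hypothetical name.

```python
import heapq
import itertools

class TaskQueue:
    """Priority queue that merges events sharing a correlation key (lower = more urgent)."""

    def __init__(self):
        self._heap = []                # (priority, seq, key); may hold stale entries
        self._pending = {}             # key -> {"priority": int, "events": list}
        self._seq = itertools.count()  # FIFO tie-break among equal priorities

    def submit(self, key, priority, event):
        entry = self._pending.get(key)
        if entry is None:
            self._pending[key] = {"priority": priority, "events": [event]}
            heapq.heappush(self._heap, (priority, next(self._seq), key))
            return
        entry["events"].append(event)  # merge the related event into the pending task
        if priority < entry["priority"]:
            entry["priority"] = priority  # escalate; the older heap entry becomes stale
            heapq.heappush(self._heap, (priority, next(self._seq), key))

    def _drop_stale(self):
        # Lazily discard heap entries whose key was popped or re-prioritized.
        while self._heap:
            priority, _, key = self._heap[0]
            entry = self._pending.get(key)
            if entry is not None and entry["priority"] == priority:
                return
            heapq.heappop(self._heap)

    def peek_priority(self):
        self._drop_stale()
        return self._heap[0][0] if self._heap else None

    def should_preempt(self, running_priority):
        """True if a waiting task is more urgent than the one currently running."""
        head = self.peek_priority()
        return head is not None and head < running_priority

    def pop(self):
        self._drop_stale()
        _, _, key = heapq.heappop(self._heap)
        return key, self._pending.pop(key)["events"]
```

Merging keeps a burst of related events from occupying several queue slots, which is one way queue latency for high-priority work can drop.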

arxiv arXiv cs.AI · 13d ago

Lean as Process-Verified Reward Oracle in RL for Theorem Proving

This work shows that Lean can serve as a symbolic process oracle, providing fine-grained, verified feedback during reinforcement learning. By parsing proof attempts into tactic sequences and using Lean's elaboration to mark sound steps and first failures, the system generates dense, type-theoretic reward signals. Experiments demonstrate that tactic-level supervision outperforms outcome-only methods on benchmarks such as miniF2F and ProofNet, highlighting Lean's role as both evaluator and training reward source.
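The step from "sound steps and first failures" to a dense reward can be sketched as below. The per-tactic verdicts are assumed to come from Lean elaboration upstream (no Lean API is called here), and the reward values and names (`Verdict`, `tactic_rewards`) are hypothetical shaping choices, not the paper's.

```python
from enum import Enum
from typing import List

class Verdict(Enum):
    SOUND = "sound"    # tactic elaborated without error
    FAILED = "failed"  # first tactic Lean rejected

def tactic_rewards(verdicts: List[Verdict], proof_closed: bool,
                   step_reward: float = 0.1, fail_penalty: float = -0.5,
                   success_bonus: float = 1.0) -> List[float]:
    """Turn per-tactic verdicts into a dense reward vector (hypothetical shaping)."""
    rewards: List[float] = []
    for verdict in verdicts:
        if verdict is Verdict.FAILED:
            rewards.append(fail_penalty)  # penalize the first failing tactic
            break                         # later tactics were never checked
        rewards.append(step_reward)
    rewards.extend([0.0] * (len(verdicts) - len(rewards)))  # no signal past the failure
    if proof_closed and rewards and all(v is Verdict.SOUND for v in verdicts):
        rewards[-1] += success_bonus      # outcome reward on the final step
    return rewards
```

An outcome-only scheme would assign every failed attempt the same zero reward; here a proof that fails at step three still credits its two sound steps.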

arxiv arXiv cs.AI · 13d ago

EEG Foundation Models for Burst-Suppression Detection in ICU

A study evaluates EEG foundation models for event-based burst-suppression detection in ICU settings without patient-specific calibration. REVE-base achieved the highest event-based F1-score, 0.868, and reduced bursts-per-minute error by 52.1% relative to EEGNet and by 36.2% relative to adaptive thresholding. Ablations show that full fine-tuning outperforms other strategies and that, at 25% labeled data, pretrained REVE-base surpasses random initialization by 0.723 F1 points, highlighting the value of pretraining for limited datasets.

arxiv arXiv cs.AI · 13d ago

Learnable Global Merging for Variable-Length Tokenization in Diffusion Transformers

A novel variable-length tokenizer uses learnable global merging to enable cross-length representation alignment in diffusion models. This data-independent approach overcomes position-dependent semantics and improves the quality-compute trade-off on ImageNet 256×256 generation compared to prior methods.

arxiv arXiv cs.AI · 13d ago

Hidden Evolution of Disguised Visual Context in VLMs

Visual tokens enter large language models as raw, unstructured signals. How they are transformed and integrated depends on the architecture, which either supplies them as in-context prompts or injects them into intermediate layers, and each choice leads to distinct evolution paths in visual representations and their frequency characteristics. We find that attention alone is insufficient; performance is driven by the quality of visual representations at each layer across integration paradigms.