All articles — korshunov.ai

CrossPool: Efficient Multi-LLM Serving for Cold MoE Models through KV-Cache and Weight Disaggregation

CrossPool is a serving engine designed for cold Mixture-of-Experts (MoE) models that addresses GPU memory inefficiencies by separating FFN weights and KV-cache into distinct pools. This disaggregation allows the system to consolidate static weights while dynamically provisioning active KV-cache demand, overcoming the limitations of monolithic memory allocation.

arXiv cs.LG · 5h ago

A Fair Evaluation of Graph Foundation Models for Node Property Prediction

This study conducts a rigorous reevaluation of nine recent Graph Foundation Models (GFMs) for node property prediction to address the lack of unified evaluation standards in the field. The authors compare these models against strong Graph Neural Network (GNN) baselines to determine their relative performance and efficiency.

arXiv cs.LG · 5h ago

Reasoning as Attractor Dynamics: Latent Memory Retrieval via Gibbs-Weighted Energy Minimization

This paper reinterprets Large Language Models as high-dimensional Dense Associative Memories where correct reasoning corresponds to deep attractor basins in the energy landscape. The authors introduce a retrieval mechanism that samples multiple reasoning paths and weights them by inverse energy to approximate the equilibrium distribution.
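The weighting step described above can be illustrated with a minimal sketch. It assumes the standard Gibbs (Boltzmann) weight exp(-E/T) and a simple weighted vote over final answers; the summary does not say how the paper defines the energy of a reasoning path or how it combines paths, so `gibbs_weights`, `aggregate_answers`, and the `temperature` parameter are hypothetical names used only for illustration.

```python
import math


def gibbs_weights(energies, temperature=1.0):
    """Return normalized weights exp(-E/T) / Z for a list of energies."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Shift by the minimum energy for numerical stability.
    # The normalized weights are unchanged by this shift.
    e_min = min(energies)
    unnormalized = [math.exp(-(e - e_min) / temperature) for e in energies]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]


def aggregate_answers(paths, temperature=1.0):
    """paths: list of (answer, energy) pairs, one per sampled reasoning path.

    Returns the answer with the largest total Gibbs weight, plus the
    per-answer totals. Lower-energy paths contribute more weight.
    """
    weights = gibbs_weights([energy for _, energy in paths], temperature)
    totals = {}
    for (answer, _), w in zip(paths, weights):
        totals[answer] = totals.get(answer, 0.0) + w
    return max(totals, key=totals.get), totals


# Hypothetical sampled paths: (final answer, energy assigned to the path).
sampled = [("A", 1.0), ("B", 3.0), ("A", 2.0)]
best, totals = aggregate_answers(sampled)
print(best, totals)
```

Lowering `temperature` concentrates the weight on the lowest-energy path, while raising it moves the result toward a plain majority vote.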

arXiv cs.LG · 5h ago

EERLoss: A Novel Loss Function for Training Deep Biometric Models

This paper introduces EERLoss, a subdifferentiable approximation of the Equal Error Rate (EER) designed to align deep biometric model training with primary evaluation metrics. Validated on keystroke dynamics verification using the KVC-onGoing benchmark, the approach addresses the misalignment between optimization objectives and performance assessment.
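For context, the metric that EERLoss approximates can be computed directly from verification scores. The sketch below is the plain, non-differentiable Equal Error Rate found by a threshold sweep, not the paper's subdifferentiable loss; it assumes that higher scores mean "more likely genuine", and the function name `equal_error_rate` is chosen here for illustration.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping thresholds over all observed scores.

    genuine:  scores for matching (same-user) comparisons
    impostor: scores for non-matching comparisons
    A comparison is accepted when its score is >= the threshold.
    """
    best_gap, eer = float("inf"), None
    for t in sorted(set(genuine) | set(impostor)):
        # False rejection rate: genuine scores that fall below the threshold.
        frr = sum(s < t for s in genuine) / len(genuine)
        # False acceptance rate: impostor scores at or above the threshold.
        far = sum(s >= t for s in impostor) / len(impostor)
        gap = abs(far - frr)
        if gap < best_gap:
            # The EER is reported where FAR and FRR are closest to equal.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores, made up for illustration only.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.1, 0.2, 0.3, 0.6]
print(equal_error_rate(genuine_scores, impostor_scores))
```

Because the counts change in steps as the threshold moves, this quantity has no useful gradient, which is the mismatch between training objective and evaluation metric that the summary refers to.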

arXiv cs.LG · 5h ago

QC-SMOTE: Quality-Controlled SMOTE for Imbalanced Classification

The authors propose QC-SMOTE, a quality-controlled oversampling framework designed to address the generation of low-quality synthetic samples in noisy or overlapping regions common in imbalanced classification tasks. This method estimates minority sample reliability using a composite neighborhood trustworthiness score and employs an IPQ-guided best-of-K strategy for generating synthetic candidates.

arXiv cs.LG · 6h ago

ASALT: Adaptive State Alignment for Lateral Transfer in Multi-agent Reinforcement Learning

This paper introduces ASALT, a method that enables lateral transfer learning in multi-agent reinforcement learning by accommodating mismatched state-space dimensionalities between source and target domains. The approach uses observation-level and state-level adapters to map inputs into a shared embedding space, facilitating effective knowledge transfer across heterogeneous environments.

arXiv cs.AI · 6h ago

Cross-Level Ontological Grounding of ODRL Permissions, Prohibitions, and Duties

The article formulates the Cross-Level Design Principle to address a gap in ODRL: policy evaluators do not specify normative positions, authority structures, or who holds the power to declare a violation. It establishes that any normative language with violable norms requires both conduct-level positions, such as Permission and Duty, and competence-level positions, such as Power and Immunity.

arXiv cs.AI · 6h ago

MVG-KAN: Multi-View Geo-Wind Guided KAN for PM2.5 Forecasting

Researchers propose MVG-KAN, a model for accurate short-term PM2.5 forecasting that addresses the limitations of existing methods in capturing complex pollutant dispersion driven by meteorological factors.

arXiv cs.AI · 6h ago

Accelerating Disaggregated RL for Visual Generative LLMs with Diffusion-Based Parallelism

Researchers introduce DigenRL, a disaggregated reinforcement learning framework designed to address the inefficiencies of colocated execution in diffusion-based generative large language models. The system supports flexible resource allocation and heterogeneous GPUs while utilizing novel parallelism techniques to reduce execution bubbles.

arXiv cs.AI · 6h ago

When Helpfulness Overrides Causal Caution: Context-Dependent Suppression and Recovery in LLMs

A study reveals that large language models systematically suppress 'Causal Caution' (the tendency to refrain from causal judgment without sufficient evidence) when shifting from academic to practical advisory contexts. The models retain the underlying capability despite this suppression: specific prompts restore cautious reasoning.

arXiv cs.AI · 6h ago

Structural Kolmogorov-Arnold Convolutions: Learnable Function on the Values or the Filter Shape

The article introduces Structural Kolmogorov-Arnold Networks (KANs) that place learnable functions in the convolution structure rather than on individual kernel entries, organizing the design by whether the function acts on pixel values or filter shape. It presents three realizations: SV-KAN with a shared value function, AG-KAN using a content-adaptive Gaussian gate, and RF-KAN which builds filters from oriented ridge profiles in a Morlet wavelet basis.

arXiv cs.AI · 6h ago

On the Stability of Prompt Ranking in Large Language Model Evaluation

This paper systematically studies the stability of prompt rankings under common variability sources like random seeds and limited evaluation subsets across three open-weight LLMs and two benchmark tasks.

arXiv cs.AI · 6h ago

Cycle-Consistent Neural Explanation of Formal Verification Certificates

Researchers propose a cycle-consistent neural architecture that generates faithful natural language explanations for formal verification certificates, addressing the opacity of these machine-checkable proofs for non-specialists. The system achieves 90.0% cycle-verified soundness on test data from a financial compliance domain, significantly outperforming multi-LLM baselines in both accuracy and inference speed.

media r/LocalLLaMA · 6h ago

Ornith 35B works reasonably well with Qwen3.6 35B DFlash speculative model

A user reports achieving a 30-40% increase in token generation speed by pairing the Ornith-1.0-35B model as a draft model with Qwen3.6-35B-A3B-DFlash using llama-server.

arXiv cs.AI · 7h ago

PHANTOM: A Large-Scale Dataset of Multimodal Adversarial Attacks for Vision-Language Models

Researchers have introduced PHANTOM, a large-scale, open-source dataset containing 47,524 pre-generated adversarial attacks designed to evaluate the safety and robustness of vision-language models (VLMs). This resource consolidates and extends prior benchmarks by covering 10 high-level categories and 55 subcategories of harmful intents, aiming to lower the computational barriers for adversarial research.

arXiv cs.AI · 7h ago

Female-RHINO: Real-Time Scanner-Integrated Framework for Automated Uterine MRI Analysis

This article introduces Female-RHINO, a real-time AI-assisted framework that integrates with MRI scanners to perform automated quantitative uterine analysis and structured reporting during image acquisition. The system combines deep learning models for segmentation and landmark detection to derive biomarkers from sagittal T2-weighted pelvic MRI without manual interaction.

arXiv cs.AI · 7h ago

Age of LLM: A Strategic 1v1 Benchmark for Reasoning, Diplomacy and Reliability

The authors introduce Age of LLM, a turn-based 1v1 benchmark where two large language models compete on a 13x7 grid to destroy an enemy base under conditions of fog of war and full diplomacy. This private engine mitigates data contamination by using fresh random map seeds and opponents for each match.

arXiv cs.AI · 7h ago

ATRIA: Adaptive Traceable ECG Reporting with Iterative Agents

Average Rankings Mask Per-Subject Optimality: A Friedman-Nemenyi Benchmark of EEG Motor-Imagery BCI Decoders

Entity Resolution via Batched Oracle Queries