All articles — korshunov.ai

Open-Vocabulary BEV Segmentation with 3D-Aware Geometric Constraints

The authors introduce OVBEVSeg, a framework for open-vocabulary bird's-eye view (BEV) segmentation that utilizes vision-language models to recognize categories beyond the training set while maintaining real-time efficiency. To address the 3D geometric inconsistency inherent in lifting 2D semantics into BEV, the method employs robust 3D geometric constraints across three progressive stages.

arxiv arXiv cs.LG · 5h ago

PHANTOM: A Large-Scale Dataset of Multimodal Adversarial Attacks for Vision-Language Models

The authors introduce PHANTOM, a large-scale open-source dataset containing 47,524 pre-generated adversarial attacks designed to evaluate the safety and robustness of vision-language models (VLMs). This resource consolidates existing benchmarks and extends them with new categories to provide diverse and practical evaluation data for the research community.

arxiv arXiv cs.LG · 5h ago

Parallel Manifold Steering: Efficient Adaptation of Large Associative Memories via Residual Energy Shaping

The authors propose H-Res (Hierarchical Residual Steering), a mechanism that adapts large Transformer models by modulating their effective energy landscape without altering global equilibrium or expanding sequence length. This approach formulates adaptation as a control problem on the activation manifold to steer token trajectories into task-specific basins of attraction.

arxiv arXiv cs.LG · 5h ago

RE4: Transformation-aware Imitation of Object Interactions Using Manipulation Modes

This paper introduces RE4, a framework for imitation learning that combines principled manipulation theories with modern benchmarks to preserve both performance and interpretability in object interaction tasks. The approach utilizes lightweight, self-supervised pose estimation and mode-aware transformations to retrieve and replan demonstrations effectively.

media r/LocalLLaMA · 5h ago

Introducing LongCat-2.0, a large-scale MoE language model

LongCat-2.0 is introduced as a large-scale Mixture of Experts (MoE) language model featuring 1.6 trillion total parameters with approximately 48 billion activated per token.

arxiv arXiv cs.LG · 6h ago

Natural Identifiers for Privacy and Data Audits in Large Language Models

This work introduces natural identifiers (NIDs), structured random strings such as cryptographic hashes and shortened URLs that occur in LLM training data, as a tool for auditing large language model privacy. NIDs enable scalable, post-hoc differential privacy auditing without costly retraining and facilitate dataset inference without requiring private held-out datasets.

arxiv arXiv cs.LG · 6h ago

Data Augmentation: A Fourier Analysis Perspective

This article investigates whether partial data augmentation can achieve the same statistical benefits as full augmentation by developing a framework using Fourier analysis and representation theory of finite groups.

arxiv arXiv cs.LG · 6h ago

MedPCFM: Improving Medical Point Cloud Completion by Integrating Point Transformers and Flow Matching

This article introduces MedPCFM, a flow matching approach for medical point cloud completion that integrates Point Transformer v3 (PTv3) with continuous-time generative modeling. The method is evaluated on the SkullFix, SkullBreak, and Mandibular Defect datasets to assess its performance in anatomical reconstruction tasks.

arxiv arXiv cs.LG · 6h ago

Agnostic Machine Learning Model of Photosynthetic Habitability

Researchers have developed an agnostic model for the Photosynthetic Habitable Zone (PHZ) based on thermodynamics and redox chemistry, eliminating Earth-centric biases found in previous estimates. By optimizing a generic photochemical reaction against exoplanet irradiance spectra using a genetic algorithm, the study predicts that photosynthetic viability declines linearly with orbital distance rather than quadratically.

arxiv arXiv cs.LG · 6h ago

LLM-based Two-Stage Transformer for Bearing Fault Diagnosis

This paper proposes a knowledge-guided two-stage transfer learning framework to address bearing fault diagnosis challenges involving dataset heterogeneity, operating condition variations, and limited labeled data. The approach utilizes a lightweight GPT-2-style Transformer with causal self-attention for hierarchical feature extraction from vibration signals.

arxiv arXiv cs.LG · 6h ago

CrossPool: Efficient Multi-LLM Serving for Cold MoE Models through KV-Cache and Weight Disaggregation

CrossPool is a serving engine designed for cold Mixture-of-Experts (MoE) models that addresses GPU memory inefficiencies by separating FFN weights and KV-cache into distinct pools. This disaggregation allows the system to consolidate static weights while dynamically provisioning active KV-cache demand, overcoming the limitations of monolithic memory allocation.

arxiv arXiv cs.LG · 6h ago

A Fair Evaluation of Graph Foundation Models for Node Property Prediction

This study conducts a rigorous reevaluation of nine recent Graph Foundation Models (GFMs) for node property prediction to address the lack of unified evaluation standards in the field. The authors compare these models against strong Graph Neural Network (GNN) baselines to determine their relative performance and efficiency.

arxiv arXiv cs.LG · 6h ago

Reasoning as Attractor Dynamics: Latent Memory Retrieval via Gibbs-Weighted Energy Minimization

This paper reinterprets Large Language Models as high-dimensional Dense Associative Memories where correct reasoning corresponds to deep attractor basins in the energy landscape. The authors introduce a retrieval mechanism that samples multiple reasoning paths and weights them by inverse energy to approximate the equilibrium distribution.

arxiv arXiv cs.LG · 6h ago

EERLoss: A Novel Loss Function for Training Deep Biometric Models

This paper introduces EERLoss, a subdifferentiable approximation of the Equal Error Rate (EER) designed to align deep biometric model training with primary evaluation metrics. Validated on keystroke dynamics verification using the KVC-onGoing benchmark, the approach addresses the misalignment between optimization objectives and performance assessment.

arxiv arXiv cs.LG · 6h ago

QC-SMOTE: Quality-Controlled SMOTE for Imbalanced Classification

The authors propose QC-SMOTE, a quality-controlled oversampling framework designed to address the generation of low-quality synthetic samples in noisy or overlapping regions common in imbalanced classification tasks. This method estimates minority sample reliability using a composite neighborhood trustworthiness score and employs an IPQ-guided best-of-K strategy for generating synthetic candidates.

arxiv arXiv cs.LG · 7h ago

ASALT: Adaptive State Alignment for Lateral Transfer in Multi-agent Reinforcement Learning

This paper introduces ASALT, a method that enables lateral transfer learning in multi-agent reinforcement learning by accommodating mismatched state-space dimensionalities between source and target domains. The approach uses observation-level and state-level adapters to map inputs into a shared embedding space, facilitating effective knowledge transfer across heterogeneous environments.

arxiv arXiv cs.AI · 7h ago

Cross-Level Ontological Grounding of ODRL Permissions, Prohibitions, and Duties

The article formulates the Cross-Level Design Principle to address the failure of ODRL policy evaluators to specify normative positions, authority structures, or the power to declare violations. It establishes that any normative language with violable norms requires both conduct-level positions, such as Permission and Duty, and competence-level positions, such as Power and Immunity.

arxiv arXiv cs.AI · 7h ago

MVG-KAN: Multi-View Geo-Wind Guided KAN for PM2.5 Forecasting

Researchers propose MVG-KAN, a model for accurate short-term PM2.5 forecasting that addresses the limitations of existing methods in capturing complex pollutant dispersion driven by meteorological factors.

arxiv arXiv cs.AI · 7h ago

Accelerating Disaggregated RL for Visual Generative LLMs with Diffusion-Based Parallelism

Researchers introduce DigenRL, a disaggregated reinforcement learning framework designed to address the inefficiencies of colocated execution in diffusion-based generative large language models. The system supports flexible resource allocation and heterogeneous GPUs while utilizing novel parallelism techniques to reduce execution bubbles.

arxiv arXiv cs.AI · 7h ago

When Helpfulness Overrides Causal Caution: Context-Dependent Suppression and Recovery in LLMs

A study reveals that large language models systematically suppress 'Causal Caution'—the tendency to refrain from causal judgment without sufficient evidence—when shifting from academic to practical advisory contexts. This suppression occurs despite the models retaining the underlying capability, as evidenced by the ability to restore cautious reasoning through specific prompts.