All articles — korshunov.ai

CADRE: Stable, Parameter Efficient Adaptation of Medical Vision Language Models with Bounded Forgetting and Prior Drift

The authors introduce CADRE, a parameter-efficient framework for adapting medical vision-language models while bounding catastrophic forgetting and prior drift. The method combines low-rank adaptation with an online, self-scaling elastic weight consolidation (EWC) term to bound retained-competence loss. It also employs an anchor-to-prior penalty to restrict embedding drift from the frozen pretrained model. Two short guarantees regarding consolidation mass and scale invariance address the order fragility found in vanilla EWC. The approach was evaluated on breast cancer data across histopathology, ultrasound, and chest radiography modalities. Training approximately 0.23% of parameters, CADRE achieved the lowest forgetting among adapting methods: 0.011 versus 0.075 for the strongest regularized baseline, a roughly sevenfold reduction. The model also demonstrated positive backward transfer where all baselines showed negative results.

arxiv arXiv cs.AI · 3h ago

DVL-DeepONet: Physics-Guided Operator Learning for Resilient Underwater Navigation

Researchers propose DVL-DeepONet, a physics-guided deep neural operator framework designed to enhance autonomous underwater vehicle (AUV) navigation under degraded sensing conditions. The system addresses challenges arising from noisy or incomplete Doppler velocity log (DVL) measurements and the absence of inertial sensors in low-cost platforms. It estimates velocity vectors in three operational scenarios: noise-resilient estimation with coupled sensors, DVL-only learning, and beam measurement recovery. By mapping temporal observations to vehicle velocity while enforcing physical consistency constraints, the model maintains robustness during environmental disturbances. The framework was validated using real-world AUV experiments covering a cumulative path length of approximately 10,000 meters. Experimental results demonstrate that DVL-DeepONet architectures outperform baseline model-based and learning-based algorithms by 40%.

media r/LocalLLaMA · 3h ago

Developer Brings Claude-Style Artifacts to Local Models via TurboLLM

A Reddit user highlights the absence of rendered artifacts in local AI setups compared to Anthropic's Claude. While local models can generate code for dashboards or diagrams, users typically must copy the output elsewhere to view it. To address this gap, the developer experimented with rendering generated HTML, SVG, and Mermaid code directly within the chat interface. The results demonstrated that the limitation lies in the user interface rather than the model's capabilities. A screenshot from the post shows a dashboard rendered by Gemma 4 26B from a single prompt on a desktop. The implementation was built using TurboLLM, which allows for this direct visualization of code outputs. The author invites the community to discuss their workflows and whether they miss Claude's artifact feature.

media r/LocalLLaMA · 3h ago

Reddit User Seeks Private Local LLM for Technical Documentation

A Reddit user is seeking recommendations for a local large language model capable of generating high-level and low-level software designs. The workflow involves using existing templates, cross-referencing code, and integrating with agentic frameworks like OpenCode via MCP to fetch data from Confluence and Jira. The user currently relies on Opus 3.6 through Kiro-cli but requires a solution that ensures data privacy. Key technical constraints include the necessity for at least 256k context length and strong reasoning capabilities. The poster questions whether hardware such as four RTX 3090 GPUs is necessary to achieve this level of performance locally.

arxiv arXiv cs.AI · 3h ago

POTracker Optimizes LLMs for Standard-Compliant Power Outage Report Generation

Recent large language models struggle with domain-specific data generation due to strict formatting and structural requirements. To address the interoperability of utility power outage reports in the United States, researchers propose POTracker, an optimized model for generating machine-readable compliance documents. The team fine-tuned Qwen2.5-7B-Instruct using a novel objective called POTrackerLoss. This new loss function accounts for both textual similarity and structural tag similarity between generated outputs and ground-truth reports. Evaluation on a dataset of 1,000 reports demonstrates that POTracker outperforms five fine-tuning methods and one rule-based XML conversion approach. The model improves overall accuracy by up to 51% and achieves 86.47% structural accuracy for the generated reports. Additionally, a human study involving domain experts assigned an average quality score of 4.03 on a 0-5 scale to the generated labels.

arxiv arXiv cs.AI · 3h ago

SQLConductor: Search-to-Policy Learning for Step-wise Text-to-SQL Orchestration

The authors propose SQLConductor, a step-wise orchestration learning framework for Text-to-SQL that addresses the limitations of fixed pipelines and static plan-then-execute methods. This system formulates subtasks as specialized actions and trains a policy model to select the next action based on intermediate artifacts and feedback. To learn this policy, the framework introduces Search-to-Policy Learning, which utilizes Monte Carlo Tree Search to explore candidate workflows and stability estimation to identify robust supervision. The policy model is trained using Stability-weighted Supervised Fine-tuning to prioritize high-quality orchestration patterns and further enhanced through Curriculum Reinforcement Learning. This approach transforms offline workflow search into a deployable policy for step-wise orchestration at inference time. Experiments on BIRD-Dev and out-of-distribution datasets show that SQLConductor achieves 73.2% execution accuracy, outperforming prior methods with comparable or larger backbones. The results demonstrate superior execution accuracy and strong generalization while coordinating frozen larger action models.

arxiv arXiv cs.AI · 3h ago

VeriEvol: Scaling Multimodal Mathematical Reasoning via Verifiable Evol-Instruct

The authors introduce VeriEvol, an iterative framework designed to scale multimodal mathematical reasoning by decoupling prompt difficulty from answer reliability. This approach addresses the challenge of maintaining reliable reward labels as data volume increases in reinforcement learning pipelines. The system utilizes a type-aware evolution module to rewrite low-difficulty seeds into harder, image-grounded prompts through route-specific operators. Answer verification is handled by HTV-Agent, which accepts responses only after multi-source counter-evidence fails to refute them. Scaling evolved supervised fine-tuning data from 10K to 250K samples increased mean accuracy on five benchmarks from 35.42 to 54.73. When integrated with a fixed GRPO recipe, VeriEvol provided a cumulative gain of +3.88 over an un-evolved baseline. This improvement is attributed to +1.82 from evolved prompts and +2.06 from the HTV-Agent verifier. The authors release all prompts, data, models, code, and full verifier traces to enable downstream auditing and scaling.

arxiv arXiv cs.AI · 3h ago

Energy Consumption of Transformer Fine-Tuning: A Roofline-Inspired Scaling Model

The authors present a framework for modeling the energy consumption of Transformer training across multiple GPUs, addressing the need for sustainable system design as computational costs rise. By conducting controlled architectural sweeps on BERT models, they relate measured energy usage to lightweight proxies for compute, memory traffic, and hardware efficiency. The approach is inspired by roofline models and incorporates a speedup-based hardware-efficiency factor to account for tensor parallelism and fully sharded data parallelism. This methodology allows for the derivation of a scaling law model that accurately predicts training energy across heterogeneous configurations. The work highlights the critical importance of predicting energy consumption as model size and parallelism scale. It provides a practical tool for cost-aware design in large-scale natural language processing systems.
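The roofline idea this summary refers to can be illustrated with a minimal sketch. It is not the paper's scaling law: the function names, the workload and device numbers, and the "energy = average power x roofline time" proxy are all assumptions made here for illustration only.

```python
def roofline_time_s(flops, bytes_moved, peak_flops_per_s, peak_bytes_per_s):
    """Classic roofline lower bound on runtime.

    The kernel cannot finish faster than its compute limit or its
    memory-traffic limit, so the bound is the larger of the two.
    """
    compute_time = flops / peak_flops_per_s
    memory_time = bytes_moved / peak_bytes_per_s
    return max(compute_time, memory_time)


def energy_proxy_joules(flops, bytes_moved, peak_flops_per_s,
                        peak_bytes_per_s, avg_power_w):
    """Hypothetical energy proxy: average power times the roofline time bound."""
    return avg_power_w * roofline_time_s(
        flops, bytes_moved, peak_flops_per_s, peak_bytes_per_s
    )


# Made-up workload and device figures, chosen only to show a compute-bound case.
t = roofline_time_s(1e12, 1e9, 1e13, 1e12)
e = energy_proxy_joules(1e12, 1e9, 1e13, 1e12, 300.0)
```

In this example the compute limit dominates the memory limit, so the time bound comes from the FLOP count alone. A multi-GPU model like the one described would additionally need a parallelism-dependent efficiency term, which this sketch leaves out.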

media r/LocalLLaMA · 3h ago

Reddit User Questions RTX 6000 Pro Value Amidst Price Surge

A Reddit user in the r/LocalLLaMA community is seeking advice on purchasing an NVIDIA RTX 6000 Pro GPU. The poster notes that the price has risen significantly from approximately $8,000 six months ago to around $13,000 currently. They are looking for feedback from existing owners regarding their satisfaction with the hardware. Specifically, the user asks if the card is worth the investment for running models like Qwen 2.5 7B. The post aims to help the buyer justify the expense to their spouse by gathering real-world usage experiences.

lab Hugging Face Blog · 3h ago

Analysis of Token Prediction Accuracy in Hybrid Language Models

A recent study investigates which specific tokens are predicted more accurately by hybrid language models compared to standard dense architectures. The research focuses on understanding the distribution of prediction errors across different token types, such as rare words and code snippets. By analyzing the loss landscapes, the authors identify that hybrid models excel at capturing long-range dependencies in sparse data regions. The findings suggest that the mixture of experts mechanism allows for more efficient parameter utilization during inference. This improved accuracy is particularly notable for tokens with low frequency in the training corpus. The paper provides a detailed breakdown of performance metrics across various benchmark datasets. These results highlight the potential of hybrid architectures for handling diverse linguistic structures effectively.

arxiv arXiv cs.AI · 4h ago

Self-Aware Scheduling Learns Token Unmasking Order in Diffusion Language Models

The authors propose Self-Aware Scheduling (SAS) to optimize the token unmasking order in masked diffusion language models, which significantly impacts generation quality. They derive a tractable upper bound on sequential decoding mismatch using Kullback-Leibler divergence and pathwise log-likelihood. This bound creates a dense self-aware reward that frames order selection as a policy optimization problem with a frozen denoiser. SAS learns a lightweight order policy via Group Relative Policy Optimization, supporting both any-order and semi-autoregressive decoding. On Sudoku tasks using a 1B parameter model, accuracy improved from 82.0% to 91.8%, reaching 97.5% after second-stage fine-tuning. For mathematical reasoning with LLaDA-8B, pass@1 on GSM8K increased from 64% to 76%. The method also raised MBPP scores from 39.5% to 41%, consistently matching or exceeding heuristic schedules across various parameters.

arxiv arXiv cs.AI · 4h ago

KORE: Kolmogorov-Optimal Scaling Laws for Spline Regression

Researchers propose KORE, a method that solves for the optimal spline resolution in closed form rather than relying on hyperparameter search. The approach leverages classical approximation theory to pin squared bias to the Kolmogorov n-width and uses the PRESS identity for leave-one-out error estimation. By balancing these known curves, the algorithm analytically determines the minimizer without exhaustive grid sweeps. KORE extends this calculus to high dimensions by replacing ambient input dimension with interaction order in an ANOVA decomposition. The algorithm fits two pilot resolutions and solves a leverage-calibrated system to evaluate the plug-in resolution with minimal compute. Across additive and sparse pairwise targets up to 80 dimensions, KORE matches exhaustive cross-validation accuracy while fitting roughly eight times fewer models. On 36 real tabular datasets, it ranked first among 21 methods in accuracy per unit of compute.
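The PRESS identity mentioned in this summary can be checked numerically. The sketch below uses ordinary least squares as a stand-in for a spline fit (an assumption; KORE's own estimator is not reproduced here) and compares the closed-form leave-one-out residuals with a brute-force refit. The function names are illustrative.

```python
import numpy as np


def press_loo_residuals(X, y):
    """Leave-one-out residuals of least squares via the PRESS identity."""
    # Hat matrix H = X X^+ ; its diagonal holds the leverages h_ii.
    H = X @ np.linalg.pinv(X)
    residuals = y - H @ y
    leverages = np.diag(H)
    # PRESS identity: e_i(loo) = e_i / (1 - h_ii), with no refitting.
    return residuals / (1.0 - leverages)


def brute_force_loo_residuals(X, y):
    """Reference implementation: refit n times, each time dropping one row."""
    n = X.shape[0]
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        out[i] = y[i] - X[i] @ beta
    return out


# Synthetic data, generated only for this check.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(30), rng.normal(size=(30, 3))])
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.1 * rng.normal(size=30)

fast = press_loo_residuals(X, y)
slow = brute_force_loo_residuals(X, y)
press = float(np.sum(fast ** 2))  # PRESS statistic
```

The two residual vectors agree to floating-point precision, which is why a single fit suffices to estimate leave-one-out error for any linear smoother with a known hat matrix.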

arxiv arXiv cs.AI · 4h ago

Kamera: Training-Free Position-Invariant Multimodal KV Cache for Efficient Reuse

The authors introduce Kamera, a method that enables training-free reuse of multimodal key-value (KV) caches by addressing the loss of cross-chunk conditioning in naive prefix caching. Standard state-merge recovers direct readouts but fails to preserve the diffuse, low-rank residue in deep layers that is essential for multi-hop reasoning, and losing that residue halves accuracy. To repair this, Kamera stores a small, training-free low-rank conditioning patch alongside each position-free chunk. This approach allows exact RoPE re-rotation and cross-chunk binding restoration across MLA, GQA, and MHA attention mechanisms. The system supports cheap reorder, sliding-window survival, and recall operations without requiring re-encoding of evicted chunks. Experiments show that a rank-m patch recovers full task accuracy on cross-chunk-binding benchmarks like MM-NIAH and two-page doc-QA. The solution reconstructs re-prefill KV to within bf16 rounding in a production SGLang kernel across six backbones while using only a fraction of the original KV footprint.
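Why RoPE re-rotation can be exact is easy to see in a small sketch: rotary embeddings are plane rotations, so rotating a cached key by an offset is the same as encoding it at the shifted position. The code below is not Kamera's kernel. The interleaved pair layout, the base of 10000, and the function name are assumptions taken from the common RoPE formulation.

```python
import numpy as np


def rope_rotate(x, position, base=10000.0):
    """Apply rotary position embedding to a 1-D vector of even length."""
    d = x.shape[-1]
    # One rotation frequency per (even, odd) coordinate pair.
    freqs = base ** (-np.arange(0, d, 2) / d)
    angles = position * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x_even * cos - x_odd * sin
    out[1::2] = x_even * sin + x_odd * cos
    return out


rng = np.random.default_rng(0)
key = rng.normal(size=8)            # position-free key vector

cached = rope_rotate(key, 5)        # key as cached at position 5
moved = rope_rotate(cached, 12)     # re-rotated by an offset of 12
direct = rope_rotate(key, 17)       # key encoded directly at position 17
restored = rope_rotate(cached, -5)  # undo the rotation to get the position-free key
```

Because rotations compose additively in angle, `moved` matches `direct` and `restored` matches `key` up to floating-point rounding, so a cached chunk can be relocated without re-encoding.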

arxiv arXiv cs.AI · 4h ago

Decentralized Autonomous Traffic Management through Corridor Networks

This study addresses the insufficiency of centralized management for high-density autonomous aircraft traffic by proposing a decentralized approach using multi-agent reinforcement learning. The researchers extend this MARL framework to manage traffic flow within complex air corridor networks featuring merges and splits. Policies trained in single-corridor settings are tested on increasingly complex multi-corridor scenarios in a zero-shot manner without retraining. Experimental results show that learned behaviors transfer effectively across varying traffic densities, network geometries, and heterogeneous vehicle performances. The evaluation measures system-level performance through conformance to boundaries, completion rates, average speeds, distance traveled, and inter-aircraft separation. Despite requiring only locally coordinated entry, traversal, and exit behaviors, the collective actions produce desirable traffic flows throughout the corridor network.

arxiv arXiv cs.AI · 4h ago

Enactor: A Generative Model for Closed-Loop Microsimulation of Signalized Intersections

The authors introduce Enactor, an actor-centric generative model designed for closed-loop microsimulation at signalized intersections. Unlike traditional simulators that rely on hand-crafted rules or short-horizon predictors, Enactor focuses on vehicle dynamics while treating pedestrians as contextual influences. The architecture encodes dynamic actors and lane polylines in polar coordinates relative to the intersection center. A transformer with separate spatial and temporal attention blocks predicts a distribution over each actor's next-step motion parameters. Training employs a closed-loop curriculum, exposing the model to its own predictions to ensure stability during simulation. Evaluations on two intersection geometries show Enactor recovers SUMO data generator distributions with significantly lower KL divergence than transformer baselines. The model also reduces red-light violations by more than an order of magnitude and outperforms constant-velocity baselines on real-world field data.

arxiv arXiv cs.AI · 4h ago

Persistent Homology Detects and Steers LLM Responses to Ill-Posed Questions

Researchers propose using finite zero-dimensional persistent homology to represent the topology of ill-posed questions within large language models. The method models contextual hidden states as point clouds, summarizing each transformer layer with three descriptors: mean finite lifetime, normalized lifetime entropy, and largest-lifetime concentration. These descriptors are concatenated across layers to form a unified topological representation of the query's internal state. The study introduces topology-conditioned activation steering, which retrieves similar examples to construct interventions that encourage clarification or abstention. Evaluations on AmbigQA, SituatedQA, and CLAMBER show this approach outperforms prompt-based baselines, improving classification accuracy from 67.4% to 78.9% on AmbigQA. On SituatedQA, accuracy increased from 79.9% to 88.5%, while CLAMBER saw gains from 57.6% to 69.6%. Additionally, the steering mechanism raised the average total acceptable response rate from 61.4% to 70.6% across three open-weight LLMs.
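The three per-layer descriptors can be sketched for a single point cloud. For a Vietoris-Rips filtration, every zero-dimensional class is born at 0 and the finite lifetimes equal the edge lengths of a minimum spanning tree, so no topology library is needed. The exact descriptor formulas below (entropy normalized by the log of the number of lifetimes, concentration as largest lifetime over the total) are assumptions, as are the function names; the paper may define them differently, and some conventions halve the edge lengths.

```python
import numpy as np


def finite_h0_lifetimes(points):
    """Finite H0 lifetimes of a point cloud of shape (n, d): MST edge lengths (Prim)."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()  # cheapest known link from each point to the tree
    lifetimes = []
    for _ in range(n - 1):
        candidates = np.where(in_tree, np.inf, best)
        j = int(np.argmin(candidates))
        lifetimes.append(candidates[j])
        in_tree[j] = True
        best = np.minimum(best, dist[j])
    return np.array(lifetimes)


def h0_descriptors(points):
    """Mean lifetime, normalized lifetime entropy, largest-lifetime concentration."""
    life = finite_h0_lifetimes(points)
    total = life.sum()
    p = life[life > 0] / total  # drop zero lifetimes from duplicate points
    entropy = float(-(p * np.log(p)).sum())
    norm_entropy = entropy / np.log(len(life)) if len(life) > 1 else 0.0
    return {
        "mean_lifetime": float(life.mean()),
        "normalized_entropy": float(norm_entropy),
        "max_concentration": float(life.max() / total),
    }


# Toy cloud standing in for one layer's hidden states (illustrative only).
toy = np.array([[0.0], [1.0], [3.0]])
desc = h0_descriptors(toy)
```

For the toy cloud the spanning-tree edges have lengths 1 and 2. Concatenating such descriptor triples across layers would give a representation of the kind the summary describes.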

arxiv arXiv cs.AI · 4h ago

SPIRAL: Learning to Search and Aggregate

The authors introduce Sequential-Parallel-Aggregative Reinforcement Learning (SPIRAL), a framework that trains language models to use sequential, parallel, and aggregative reasoning primitives together. Unlike standard post-training methods that optimize only for single-trace sequential reasoning, SPIRAL unifies these components into a single inference compute pipeline. The model first samples independent chain-of-thought traces in parallel and then generates a final aggregation trace conditioned on those traces. The entire process is optimized end-to-end against the reward of the final aggregated response using set reinforcement learning and standard reinforcement learning techniques. Experiments on reasoning tasks demonstrate that SPIRAL scales effectively with inference compute. The approach is up to 11 times more efficient than GRPO in scaling and achieves 15% higher performance when all three compute primitives are scaled.

arxiv arXiv cs.AI · 4h ago

Against Proxy Optimization

The author discusses the conditions under which maximizing a proxy utility function can lead to harmful outcomes. This analysis suggests that such scenarios pose significant problems for the application of standard decision theory. The text highlights specific circumstances where optimizing for a surrogate goal diverges from intended results. These findings challenge the robustness of current theoretical frameworks used in artificial intelligence and economics. By identifying these failure modes, the work aims to refine how agents should be designed to avoid unintended consequences.

arxiv arXiv cs.AI · 4h ago

Polycepta: Object-Centric Appearance Estimation for Multi-Object Tracking

The authors introduce Polycepta, an object-centric appearance state estimation framework that reformulates appearance modeling as a recursive estimation problem. Unlike traditional methods relying on static, frame-independent descriptors, Polycepta constructs and continuously updates independent appearance states for each tracked object. A dedicated learning strategy encourages the model to estimate future representations from accumulated observations rather than memorize them. A key feature is that appearance estimation quality improves progressively as object states evolve during inference. The framework enables appearance estimation for unseen classes by encouraging the learning of object-specific representation construction. Extensive experiments on KITTI, Waymo Open Dataset, and MOT17 demonstrate consistent reductions in identity switches and improved tracking performance. When integrated into the RobMOT framework, Polycepta operates at 90.57 Hz and achieves a MOTA of 92.27% on the KITTI benchmark.

arxiv arXiv cs.AI · 4h ago

Dual-Learned Matching Enables Linear Mode Connectivity for Billion-Parameter Transformers

Researchers propose a scalable framework to enable linear mode connectivity-based merging for billion-parameter pretrained transformers. Existing methods typically optimize interpolation paths from only one model endpoint, limiting scalability for large architectures. The new approach applies parameterized weight transformations to align functionally equivalent solutions and uses a dual learning procedure where both models jointly learn transformations toward a shared path. This bidirectional optimization substantially reduces interpolation barriers and improves merging reliability across large-scale models. Empirically, the method achieves near-zero loss barriers on WikiText for medium-sized language models. In vision tasks, ViT-L maintains above 69% ImageNet top-1 accuracy throughout the interpolation path. Modern billion-parameter LLMs exhibit only small loss barriers using this technique.