Training methods — korshunov.ai

Repurposing Speech Classifier for Diffusion-Based Generation

A pretrained speech classifier is repurposed as a backbone for guided diffusion-based speech generation. By attaching a lightweight subnetwork and training it under denoising score matching, the approach achieves high speech quality with reduced memory and computational cost, using a single model instead of two separately trained components.

arXiv cs.LG · 6d ago

UltraQuant: 4-bit KV Caching for Context-Heavy Agents

UltraQuant introduces a 4-bit KV caching method tailored for context-heavy agent workloads. It achieves 3.47x reduction in P50 time-to-first-token in late rounds and 1.63x higher output throughput compared to FP8 KV caching, using FP8 queries, FP4 KV tensors, and native AMD CDNA4 scaled-MFMA support.

arXiv cs.LG · 6d ago
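
To make the "FP4 KV tensors" part of the UltraQuant summary concrete, here is a minimal sketch of block-scaled rounding onto the FP4 (E2M1) value grid. It is not the paper's kernel: the function names, the plain float scale, and the per-block max scaling rule are all assumptions made for illustration, and nothing here touches the hardware scaled-MFMA path.

```python
# Illustrative only: software rounding of one block of values onto the FP4 (E2M1) grid.
# E2M1 can represent these magnitudes (plus a sign bit).
FP4_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def quantize_block(values):
    """Return (scale, codes); each code is a signed FP4 grid value.

    Assumption: one float scale per block, chosen so the largest
    magnitude in the block maps to the largest grid value (6.0).
    """
    amax = max(abs(v) for v in values)
    scale = amax / FP4_GRID[-1] if amax > 0 else 1.0
    codes = []
    for v in values:
        # Nearest grid magnitude to the scaled absolute value.
        mag = min(FP4_GRID, key=lambda g: abs(abs(v) / scale - g))
        codes.append(mag if v >= 0 else -mag)
    return scale, codes


def dequantize_block(scale, codes):
    """Reconstruct approximate values from the block scale and FP4 codes."""
    return [scale * c for c in codes]
```

Each code fits in 4 bits, so a block costs half the storage of FP8 plus one shared scale; the rounding error this introduces is the price of that saving.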

Marginal Advantage Accumulation for Memory-Driven Agent Self-Evolution

This paper introduces Marginal Advantage Accumulation (MAA), a post-processing architecture that addresses cross-batch inconsistency in memory-driven agent self-evolution. MAA formalizes alignment and comparability as structural conditions, uses differential signals and exponential moving average to accumulate signed evidence per operation, and ensures traceability via semantic identity merging. It outperforms batch-level baselines in 14 out of 16 settings and reduces token consumption by about 75%.

arXiv cs.LG · 6d ago
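
The MAA summary describes accumulating signed evidence per operation with an exponential moving average. A minimal sketch of that bookkeeping follows; the class name, the `decay` parameter, and the threshold rule are hypothetical, and the sketch assumes the per-batch differential signals have already been computed and aligned to stable operation identifiers.

```python
from collections import defaultdict


class EvidenceAccumulator:
    """Hypothetical per-operation EMA of signed evidence across batches."""

    def __init__(self, decay=0.9):
        self.decay = decay
        self.score = defaultdict(float)  # op_id -> accumulated signed evidence

    def update(self, batch_signals):
        """batch_signals maps op_id -> signed differential signal for one batch."""
        for op_id, signal in batch_signals.items():
            previous = self.score[op_id]
            self.score[op_id] = self.decay * previous + (1 - self.decay) * signal

    def keep(self, threshold=0.0):
        """Operations whose accumulated evidence is above the threshold."""
        return sorted(op for op, s in self.score.items() if s > threshold)
```

Because the average is signed, an operation that helps in one batch and hurts in the next drifts toward zero instead of being accepted or rejected on a single batch.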

Entropy Estimation in Multi-Qutrit Systems with Neural Networks

A study compares variational quantum algorithms and classical CNNs for von Neumann entropy estimation in multi-qutrit systems. CNNs achieve accurate, stable predictions with only 12.5% of full state tomography measurements, reaching 90th-percentile errors of 0.13-0.16 nats for four- and five-qutrit systems, showing systematic improvement with system size and robustness to noise.

arXiv cs.LG · 6d ago
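
For readers who want the quantity being estimated in the multi-qutrit study, the von Neumann entropy of a density matrix with eigenvalues λ_i is S = -Σ λ_i ln λ_i (in nats). The sketch below computes it from a given spectrum; the function names are illustrative, and it assumes the eigenvalues are already known, which is exactly what the estimation methods in the paper avoid needing.

```python
import math


def von_neumann_entropy(eigenvalues, tol=1e-12):
    """Entropy in nats from the eigenvalues of a density matrix."""
    if abs(sum(eigenvalues) - 1.0) > 1e-9:
        raise ValueError("eigenvalues of a density matrix must sum to 1")
    # Terms with p -> 0 contribute nothing, so near-zero eigenvalues are skipped.
    return -sum(p * math.log(p) for p in eigenvalues if p > tol)


def max_entropy_qutrits(n):
    """Entropy of the maximally mixed state of n qutrits: n * ln(3)."""
    return n * math.log(3)
```

For scale, the maximum entropy is 4 ln 3 ≈ 4.39 nats for four qutrits and 5 ln 3 ≈ 5.49 nats for five, which gives some context for the reported 0.13-0.16 nat errors.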

Execution-State Capsules for Low-Latency On-Device AI Serving

Execution-state capsules enable graph-bound checkpointing and restoration of complete execution state, including KV, recurrent, and convolution states, for low-latency, small-batch on-device AI serving. On RTX 5090 and Jetson AGX Thor, capsule restore achieves byte-exact and token-identical correctness, with sub-millisecond GPU operations and TTFT speedups up to 27x at 16k tokens, demonstrating significant latency reduction in interactive AI workflows.

arXiv cs.LG · 6d ago

Multi-Task Bayesian In-Context Learning Framework

A new multi-task in-context learning framework enables amortized hierarchical Bayesian inference by representing prior information as a prefix in datasets. The transformer model adapts predictions across prior families, matching oracle performance on diverse tasks while being significantly faster. It is validated on real-world spatiotemporal temperature prediction.

arXiv cs.LG · 6d ago

Calibration in MoE Models Under Distribution Shift

This paper examines how mixture-of-experts models maintain calibration under distribution shift. It finds that expert-level calibration ensures overall model calibration in hard-routed models but is insufficient for soft-routed models. The authors propose adversarial reweighting to penalize calibration errors in routed aggregates, improving the accuracy-calibration tradeoff across tasks and shifts.

arXiv cs.LG · 6d ago

Lie-Algebra Attention: Group Element Tokens in Neural Networks

Lie-Algebra Attention introduces attention tokens as matrix Lie group elements, using the closed-form algebra norm of relative poses as attention scores. This method achieves invariant, equivariant attention without representation-theoretic components, outperforming vector-token baselines on SE(2), SO(3), and Aff(2) with fewer parameters and no learned kernels.

arXiv cs.AI · 6d ago
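
The idea in the Lie-Algebra Attention summary, scoring a pair of tokens by the algebra norm of their relative pose, can be sketched for SO(3), where that norm is simply the rotation angle of R_q^T R_k. Everything below is an assumption for illustration: the helper names, the use of the negative angle as the score, and the softmax with a temperature are not taken from the paper.

```python
import math


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def rot_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_x(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def log_norm_so3(r):
    """Rotation angle of r, i.e. the norm of its axis-angle vector log(r)."""
    cos_angle = (r[0][0] + r[1][1] + r[2][2] - 1.0) / 2.0
    return math.acos(max(-1.0, min(1.0, cos_angle)))  # clamp against round-off


def attention_weights(query, keys, temperature=1.0):
    """Softmax over negative relative-pose angles (closer poses get more weight)."""
    scores = [-log_norm_so3(matmul(transpose(query), k)) / temperature for k in keys]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

Because the score depends only on the relative pose R_q^T R_k, multiplying every token on the left by the same rotation leaves the weights unchanged, which is the invariance property the summary refers to.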

Lean as Process-Verified Reward Oracle in RL for Theorem Proving

This work shows that Lean can serve as a symbolic process oracle, providing fine-grained, verified feedback during reinforcement learning. By parsing proof attempts into tactic sequences and using Lean's elaboration to mark sound steps and first failures, the system generates dense, type-theoretic reward signals. Experiments demonstrate tactic-level supervision outperforms outcome-only methods on benchmarks like MiniF2F and ProofNet, highlighting Lean's role as both evaluator and training reward source.

arXiv cs.AI · 6d ago
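
The dense reward signal described in the Lean summary (sound steps marked, first failure flagged) might be turned into per-tactic rewards roughly as follows. This sketch does not call Lean: the index of the first failing tactic is taken as an input, and the function name and reward values are hypothetical choices, not the paper's.

```python
def tactic_rewards(num_tactics, first_failure=None, step_reward=1.0, failure_penalty=-1.0):
    """Per-tactic rewards for one proof attempt.

    first_failure is the index of the first tactic the checker rejects,
    or None if every tactic was accepted. Tactics after the first failure
    are never verified, so they receive a neutral reward of 0.
    """
    rewards = []
    for i in range(num_tactics):
        if first_failure is None or i < first_failure:
            rewards.append(step_reward)
        elif i == first_failure:
            rewards.append(failure_penalty)
        else:
            rewards.append(0.0)
    return rewards
```

An outcome-only scheme would give the same attempt a single reward at the end; the per-tactic version tells the policy which step went wrong.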

Learnable Global Merging for Variable-Length Tokenization in Diffusion Transformers

A novel variable-length tokenizer uses learnable global merging to enable cross-length representation alignment in diffusion models. This data-independent approach overcomes position-dependent semantics and improves the quality-compute trade-off on ImageNet 256×256 generation compared to prior methods.

arXiv cs.AI · 6d ago

Residual-Space Evolutionary Optimization via Flow-based Generative Models

A model-agnostic framework combines flow-based generative editing with evolutionary algorithms to enable data editing in non-differentiable settings. It operates in residual space, using self-pollination for local refinement and cross-pollination for broad exploration, validated on MorphoMNIST and crystal data to balance target alignment, instance preservation, and diversity.

arXiv cs.AI · 6d ago

Attention-Based SAC for Porosity Prediction in Additive Manufacturing

A multi-head attention feature extractor integrated with Soft Actor-Critic improves porosity prediction and process parameter optimization in laser powder bed fusion. The method achieves a convergence value of 322.79 in 14 episodes, outperforming DQN, PPO, TD3, and vanilla SAC with faster convergence and greater stability.

arXiv cs.AI · 6d ago

Hybrid Diffusion Transformer for Instruction-Guided Audio Editing

A hybrid two-stage diffusion transformer architecture enables efficient and accurate instruction-guided audio editing. It uses coarse-to-fine semantic alignment via joint attention at low resolution, followed by refined editing with alternating joint and cross-attention at high resolution. The method achieves better performance on complex editing tasks with improved efficiency and a compact model.

arXiv cs.AI · 6d ago

Sensorimotor World Models for Action-Aligned Perception

A new sensorimotor world model (SMWM) learns compact, action-relevant latent representations from offline trajectories. It uses inverse dynamics regularization to prevent representation collapse and align latent states with controllable environmental degrees of freedom, enabling stable training without complex regularizers or frozen components. SMWM achieves competitive planning performance in 2D and 3D control tasks.

arXiv cs.AI · 6d ago

Frequency-Aware Flow Matching for Robotic Action Generation

Frequency-Aware Flow Matching (FAFM) enables continuous and temporally consistent robotic action generation by transforming discrete action sequences into the frequency domain using discrete cosine transform. It regularizes first-order temporal derivatives with a Sobolev-type constraint to ensure smooth actions, improving success rates, motion smoothness, and robustness across synthetic and real-world tasks without adding network parameters.

arXiv cs.AI · 6d ago
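
The two ingredients named in the FAFM summary, a discrete cosine transform of the action sequence and a penalty on first-order temporal differences, can be sketched for a single action dimension. The orthonormal DCT-II and its inverse are standard; the finite-difference penalty is only a stand-in for the paper's Sobolev-type constraint, whose exact form the summary does not give, and all function names are illustrative.

```python
import math


def dct2(x):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct2(c):
    """Inverse of dct2 (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for t in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(
            c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (t + 0.5) * k / n)
            for k in range(1, n)
        )
        out.append(s)
    return out


def smoothness_penalty(x):
    """Sum of squared first differences (assumed stand-in for the Sobolev-type term)."""
    return sum((b - a) ** 2 for a, b in zip(x, x[1:]))
```

A constant trajectory puts all of its energy in the first coefficient and has zero penalty, while a jittery one spreads energy into high-frequency coefficients and is penalized, which is the intuition behind working in the frequency domain.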

RACL: Reasoning-Agent Control Layer for Metaheuristic Learning

RACL introduces a reasoning agent that controls metaheuristic search behavior without replacing optimizers or altering constraints. It improves or ties key policies in vehicle routing experiments, reducing average cost by 8.337% versus Fixed and 1.605% versus Stagnation-Triggered policies, with no significant computational overhead.

arXiv cs.AI · 6d ago

Hybrid ANN-SNN Pipeline with Local Plasticity

A hybrid ANN-SNN pipeline uses pretrained EfficientNet encoders and converts their activations to spike trains via rate-coding. The system trains a CoLaNET spiking classifier with local plasticity rules, achieving 99.09% accuracy on ImageNet's 64-class benchmark, matching conventional deep networks.

arXiv cs.AI · 6d ago
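
The rate-coding step mentioned in the hybrid ANN-SNN summary can be illustrated with Bernoulli sampling: each activation is treated as a per-timestep spike probability. This is one common form of rate coding and an assumption here; the summary does not say which variant the pipeline uses. The function name is hypothetical, and activations are assumed to be normalized to [0, 1] beforehand.

```python
import random


def rate_encode(activations, num_steps, rng):
    """Convert activations in [0, 1] to binary spike trains.

    Returns one list of 0/1 spikes per activation; the expected spike
    count of each train is activation * num_steps.
    """
    trains = []
    for a in activations:
        p = min(1.0, max(0.0, a))  # clip out-of-range values
        trains.append([1 if rng.random() < p else 0 for _ in range(num_steps)])
    return trains
```

Passing a seeded generator, for example `rate_encode(features, 100, random.Random(0))`, keeps the spike trains reproducible across runs.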

Modular-Sparsity Synchronization for PINN Training

ModSync addresses capacity-induced failure in PINNs by preventing functional modularity and self-partitioning of overparameterized networks. It enhances cross-objective interaction through structural optimization that penalizes task-exclusive connections while preserving interaction-promoting pathways.

arXiv cs.AI · 6d ago

Boundary Embedding Shaping for Graph Structural Disentanglement

Boundary Embedding Shaping (BES) addresses graph structural entanglement by selectively suppressing spurious neighbor correlations near class boundaries. BES uses adaptive contrastive learning to enhance boundary discrimination, improving GCN node classification by an average of 3.3% (up to 5.0% on WikiCS) and achieving superior link prediction accuracy.

arXiv cs.LG · 6d ago

Training LLMs for Long-Lifecycle Agents via Cross-Domain Generalization

A new framework enables large language models to develop 'Connect the Dots' capability, allowing long-lifecycle agents to learn from experiences and iteratively update their environment context. The framework uses reinforcement learning with long rollout sequences and custom tasks to promote cross-domain generalization, showing effective out-of-distribution performance in both domains and transition settings.