Microsoft Research — korshunov.ai

Lab · Microsoft Research

Spotlight enables DiT RL post-training by leveraging idle spot GPUs, reducing costs by 1.4-6.4× while achieving superior image quality. It uses stale model weights in exploration and reconfigures sequence parallelism in real time, allowing efficient GPU utilization without breaking training pipelines.

arxiv arXiv cs.LG · 8d ago

Handlebars Triple-Brace Injection Exploits Structural Role Delimiters

Handlebars' triple-brace interpolation fails to protect against structural role injection, as HTML escaping only neutralizes angle-bracket delimiters. It leaves colon and Markdown hash delimiters intact, enabling attackers to hijack model behavior. The default escaping provides no protection for most role delimiter schemes and cannot replace a clear separation of instructions and data.

arxiv arXiv cs.CL · 8d ago

Handlebars Triple-Brace Injection Exploits Structural Role Delimiters

Handlebars' triple-brace interpolation fails to protect against structural role injection, as HTML escaping only neutralizes angle-bracket delimiters. It leaves colon and Markdown hash delimiters intact, enabling attackers to hijack model turns. The default escaping provides no protection for most role delimiter families and cannot replace a structural separation of instructions and data.

arxiv arXiv cs.AI · 8d ago

Handlebars Triple-Brace Injection Exploits Structural Role Delimiters

Handlebars' triple-brace interpolation fails to protect against structural role injection, as HTML escaping only neutralizes angle-bracket delimiters. It leaves colon and Markdown hash delimiters intact, enabling attackers to hijack model turns. The default escaping provides no protection for most delimiter families and cannot replace a structural separation of instruction and data.

arxiv arXiv cs.CL · 9d ago

KVEraser: Efficient Localized Context Erasing in LLMs

KVEraser enables efficient localized context erasing in large language models by replacing only the KV cache states of an erased span with learned steering states. It achieves near-full-recomputation performance on in-domain tasks across 1K to 32K context lengths, with only a 24% latency increase, and outperforms other approximate methods in long-document QA with 3--4x speedup over full recomputation.

arxiv arXiv cs.AI · 9d ago

BinTrack: Open-Source Spatial QA with Binary Trajectory Search

BinTrack is a fully open-source spatial question answering agent that uses binary search over a robot's trajectory to locate answers. It achieves up to 22.8% higher accuracy than other open-source methods and matches closed-source model performance on the most challenging global category of the SpaceLocQA benchmark. The system also offers over 1.5x faster inference and introduces GangnamLoop, a real-world outdoor benchmark collected with a quadruped robot.

arxiv arXiv cs.LG · 9d ago

ROVE: Reinforcement Learning with Human Interventions for Humanoid Manipulation

ROVE enables humanoid Vision-Language-Action models to learn effective manipulation behaviors using imperfect human interventions. It combines a human-in-the-loop data collection pipeline with Optimistic Value Estimation and cross-embodiment supervision to prioritize high-value actions and improve robustness. ROVE outperforms baseline methods on real-world, contact-rich manipulation tasks through iterative rollout and intervention cycles.

arxiv arXiv cs.LG · 9d ago

HABC Improves RL Fine-Tuning of VLAs with Sparse Outcomes

Hierarchical Advantage-Weighted Behavior Cloning (HABC) enhances online RL fine-tuning of vision-language agents by using separate critic heads for viability and efficiency. It combines their outputs via a state-adaptive gate and applies per-transition weights, while intervention-aware credit assignment prevents supervision leakage. In real-robot experiments, HABC boosts success rates to 92%, 88%, and 38% on three bimanual tasks, surpassing SFT baselines of 36%, 44%, and 12%.

arxiv arXiv cs.LG · 9d ago

Geometric Action Model for Robot Policy Learning

The Geometric Action Model (GAM) enables robot policies to reason about 3D physical interactions by repurposing a pretrained geometric foundation model. GAM splits the GFM to serve as both an observation encoder and a causal future predictor, then routes predicted future geometry and actions through the same backbone, achieving accurate, robust, and efficient manipulation performance in simulation and real-robot benchmarks.

arxiv arXiv cs.AI · 7d ago

FoMoE Breaks Full-Replica Barrier with Partitioned Expert Layers

FoMoE introduces a system that partitions expert layers across workers to avoid full model replicas, reducing communication costs by up to 1.42x over efficient baselines and 45.44x over DDP. It achieves up to 1.4x throughput speedups via a skip-token mechanism and demonstrates stable routing, with projected benefits extending to 100B-scale models through system modeling.

media r/LocalLLaMA · 7d ago

TRELLIS.2 now runs natively on MLX

TRELLIS.2 has been ported to run natively on MLX for Apple Silicon. The model supports 512x512 and 1024x1024 image inputs, with generation times of approximately 70 seconds for 512x517 and 300 to 700 seconds for 1024x1024 on an M4 Max with 128GB unified memory.

arxiv arXiv cs.LG · 8d ago

Edge Flow: A Continuous-Time Model for Gradient Descent at Edge of Stability

Edge Flow is a tractable, predictive continuous-time model that captures gradient descent dynamics at the edge of stability. It decomposes dynamics into center, oscillation direction, and magnitude, with self-stabilization of sharpness emerging from coupled feedback. The model requires only two gradient evaluations and one Hessian-vector product per iteration and outperforms prior models in tracking oscillations and explaining instabilities at EoS.

arxiv arXiv cs.LG · 8d ago

Flash Endurance as Depreciating Capital in Robot Memory

A robot's flash memory degrades with each write, forming a non-renewable asset. A wear-aware pricing model uses a shadow price $η$ to guide memory placement across RAM, NVM, and cloud, with optimal routing depending on whether task value increases with memory persistence. The sign of the value-write association $χ$ varies by deployment: positive in long-horizon manipulation, null in short-horizon tasks, and negative in teleoperation. The endurance budget is binding only on low-end QLC/eMMC memory, and while wear-aware routing aligns with task value, actual performance improvements remain unverified in data.

arxiv arXiv cs.LG · 8d ago

ATT&CK-Labeled Multi-Source Cybersecurity Logs Dataset Released

A new dataset combines system, network, and browser logs from 870 Windows sessions, including 70 attacks and 800 benign cases. It provides per-event labels with MITRE ATT&CK technique IDs for 12 tactics and 53 techniques, using real attack tools like RAT and C2 tunnels. Fine-tuning three Small Language Models (SLMs) via LoRA improved chunk classification accuracy to 90–97% and achieved up to 42% exact-match accuracy in technique identification, showing strong reasoning capture despite challenges.

arxiv arXiv cs.LG · 8d ago

MGUP: Momentum-Gradient Alignment for Selective Optimization

MGUP introduces a selective update mechanism that applies larger step-sizes to a fixed proportion of parameters in stochastic optimization, while using smaller, non-zero step-sizes for the rest. It integrates seamlessly with optimizers like AdamW, Lion, and Muon, providing theoretical convergence guarantees for MGUP-AdamW and demonstrating superior or more stable performance in training large language models and MAE pretraining tasks.

arxiv arXiv cs.AI · 8d ago

EAGG: Embodiment-Aligned Grasp Generation via Geometry-Aware Graph Conditioning

EAGG introduces a grasp generator that aligns embodiment structure within a shared model using topology-aware graphs and geometry-aware tokens. It achieves 56.17% average grasp success on MultiGripperGrasp, matching specialized models within 1.10 percentage points and reducing median contact distance from 0.239 cm to 0.189 cm.

arxiv arXiv cs.AI · 8d ago

Flash Endurance as Depreciating Capital in Robot Memory

A robot's flash memory endurance is a non-renewable asset that degrades with each write. A wear-aware pricing model introduces a shadow price $η$ to guide memory placement across RAM, NVM, and cloud, with optimal routing depending on the value-write association $χ$. Empirical measurements show $χ$ is positive in long-horizon manipulation, null in short-horizon tasks, and negative in teleoperation, and the endurance budget is binding only on low-end QLC/eMMC memory, where wear-aware control influences routing based on task value without improving performance.

arxiv arXiv cs.CL · 8d ago

SwiftTrans Improves LLM Code Translation Efficiency

SwiftTrans addresses runtime efficiency gaps in LLM-based code translation by introducing Multi-Perspective Exploration and Difference-Aware Selection. The framework extends CodeNet, F2SBench, and introduces SwiftBench to evaluate runtime performance, showing consistent improvements in both correctness and efficiency across benchmarks.

arxiv arXiv cs.AI · 9d ago

Unified Causal-Origin Taxonomy for Distributional Shifts in RL

This paper introduces a unified causal-origin taxonomy that categorizes distributional shifts in reinforcement learning into internal, agent-driven, and external, environment-driven sources. It unifies ID/OOD generalization and non-stationary settings by framing shifts as structured changes in the agent-environment interaction process, using a POMDP decomposition and a shifted-time boundary perspective.

media Latent Space · 9d ago

Satya Nadella on Loopcraft and Frontier Ecosystems

Microsoft CEO Satya Nadella introduces 'Loopcraft' as a new theory of the firm, emphasizing that the real opportunity in AI lies not in selecting the best model, but in building learning loops that compound human and token capital. He asserts that the priority must be creating frontier ecosystems where every organization can own and grow its institutional knowledge, enabling broad value flow across industries and countries.

Spotlight: Using Spot GPUs to Accelerate DiT RL Post-Training

Handlebars Triple-Brace Injection Exploits Structural Role Delimiters

Handlebars Triple-Brace Injection Exploits Structural Role Delimiters

Handlebars Triple-Brace Injection Exploits Structural Role Delimiters

KVEraser: Efficient Localized Context Erasing in LLMs

BinTrack: Open-Source Spatial QA with Binary Trajectory Search

ROVE: Reinforcement Learning with Human Interventions for Humanoid Manipulation

HABC Improves RL Fine-Tuning of VLAs with Sparse Outcomes

Geometric Action Model for Robot Policy Learning

FoMoE Breaks Full-Replica Barrier with Partitioned Expert Layers

TRELLIS.2 now runs natively on MLX

Edge Flow: A Continuous-Time Model for Gradient Descent at Edge of Stability

Flash Endurance as Depreciating Capital in Robot Memory

ATT&CK-Labeled Multi-Source Cybersecurity Logs Dataset Released

MGUP: Momentum-Gradient Alignment for Selective Optimization

EAGG: Embodiment-Aligned Grasp Generation via Geometry-Aware Graph Conditioning

Flash Endurance as Depreciating Capital in Robot Memory

SwiftTrans Improves LLM Code Translation Efficiency

Unified Causal-Origin Taxonomy for Distributional Shifts in RL

Satya Nadella on Loopcraft and Frontier Ecosystems