Lab · Microsoft Research
arxiv arXiv cs.AI · 9d ago

BinTrack: Open-Source Spatial QA with Binary Trajectory Search

BinTrack is a fully open-source spatial question answering agent that uses binary search over a robot's trajectory to locate answers. It achieves up to 22.8% higher accuracy than other open-source methods and matches closed-source model performance on the most challenging global category of the SpaceLocQA benchmark. The system also offers over 1.5x faster inference and introduces GangnamLoop, a real-world outdoor benchmark collected with a quadruped robot.

arxiv arXiv cs.LG · 9d ago

ROVE: Reinforcement Learning with Human Interventions for Humanoid Manipulation

ROVE enables humanoid Vision-Language-Action models to learn effective manipulation behaviors using imperfect human interventions. It combines a human-in-the-loop data collection pipeline with Optimistic Value Estimation and cross-embodiment supervision to prioritize high-value actions and improve robustness. ROVE outperforms baseline methods on real-world, contact-rich manipulation tasks through iterative rollout and intervention cycles.

arxiv arXiv cs.LG · 9d ago

HABC Improves RL Fine-Tuning of VLAs with Sparse Outcomes

Hierarchical Advantage-Weighted Behavior Cloning (HABC) enhances online RL fine-tuning of vision-language agents by using separate critic heads for viability and efficiency. It combines their outputs via a state-adaptive gate and applies per-transition weights, while intervention-aware credit assignment prevents supervision leakage. In real-robot experiments, HABC boosts success rates to 92%, 88%, and 38% on three bimanual tasks, surpassing SFT baselines of 36%, 44%, and 12%.

arxiv arXiv cs.LG · 9d ago

Geometric Action Model for Robot Policy Learning

The Geometric Action Model (GAM) enables robot policies to reason about 3D physical interactions by repurposing a pretrained geometric foundation model. GAM splits the GFM to serve as both an observation encoder and a causal future predictor, then routes predicted future geometry and actions through the same backbone, achieving accurate, robust, and efficient manipulation performance in simulation and real-robot benchmarks.

arxiv arXiv cs.LG · 8d ago

Edge Flow: A Continuous-Time Model for Gradient Descent at Edge of Stability

Edge Flow is a tractable, predictive continuous-time model that captures gradient descent dynamics at the edge of stability. It decomposes dynamics into center, oscillation direction, and magnitude, with self-stabilization of sharpness emerging from coupled feedback. The model requires only two gradient evaluations and one Hessian-vector product per iteration and outperforms prior models in tracking oscillations and explaining instabilities at EoS.

arxiv arXiv cs.LG · 8d ago

Flash Endurance as Depreciating Capital in Robot Memory

A robot's flash memory degrades with each write, forming a non-renewable asset. A wear-aware pricing model uses a shadow price $η$ to guide memory placement across RAM, NVM, and cloud, with optimal routing depending on whether task value increases with memory persistence. The sign of the value-write association $χ$ varies by deployment: positive in long-horizon manipulation, null in short-horizon tasks, and negative in teleoperation. The endurance budget is binding only on low-end QLC/eMMC memory, and while wear-aware routing aligns with task value, actual performance improvements remain unverified in data.

arxiv arXiv cs.LG · 8d ago

ATT&CK-Labeled Multi-Source Cybersecurity Logs Dataset Released

A new dataset combines system, network, and browser logs from 870 Windows sessions, including 70 attacks and 800 benign cases. It provides per-event labels with MITRE ATT&CK technique IDs for 12 tactics and 53 techniques, using real attack tools like RAT and C2 tunnels. Fine-tuning three Small Language Models (SLMs) via LoRA improved chunk classification accuracy to 90–97% and achieved up to 42% exact-match accuracy in technique identification, showing strong reasoning capture despite challenges.

arxiv arXiv cs.LG · 8d ago

MGUP: Momentum-Gradient Alignment for Selective Optimization

MGUP introduces a selective update mechanism that applies larger step-sizes to a fixed proportion of parameters in stochastic optimization, while using smaller, non-zero step-sizes for the rest. It integrates seamlessly with optimizers like AdamW, Lion, and Muon, providing theoretical convergence guarantees for MGUP-AdamW and demonstrating superior or more stable performance in training large language models and MAE pretraining tasks.

arxiv arXiv cs.AI · 8d ago

Flash Endurance as Depreciating Capital in Robot Memory

A robot's flash memory endurance is a non-renewable asset that degrades with each write. A wear-aware pricing model introduces a shadow price $η$ to guide memory placement across RAM, NVM, and cloud, with optimal routing depending on the value-write association $χ$. Empirical measurements show $χ$ is positive in long-horizon manipulation, null in short-horizon tasks, and negative in teleoperation, and the endurance budget is binding only on low-end QLC/eMMC memory, where wear-aware control influences routing based on task value without improving performance.

media Latent Space · 9d ago

Satya Nadella on Loopcraft and Frontier Ecosystems

Microsoft CEO Satya Nadella introduces 'Loopcraft' as a new theory of the firm, emphasizing that the real opportunity in AI lies not in selecting the best model, but in building learning loops that compound human and token capital. He asserts that the priority must be creating frontier ecosystems where every organization can own and grow its institutional knowledge, enabling broad value flow across industries and countries.