Lab · Hugging Face
arxiv arXiv cs.AI · 7d ago

User as Engram: Local Parametric Edits for Personal Memory

User as Engram proposes storing per-user facts as surgical, hash-keyed edits to a memory table, leaving reasoning in a shared adapter. This design achieves 5.6x higher indirect-reasoning accuracy and maintains base-level reasoning performance, with a memory footprint 33,000x smaller than per-user LoRA. The approach enables disjoint user edits that compose losslessly, outperforming retrieval pipelines beyond 100 facts.

arxiv arXiv cs.LG · 8d ago

NoiseTilt: Noise-Tilted Reverse Kernels for Diffusion Reward Alignment

NoiseTilt introduces NTRK, a reward-guided diffusion sampler that injects reward gradients via the noise term without altering the reverse kernel. By using a whitening operator, NTRK safely biases noise toward high reward, preserving sample quality while maintaining strong guidance. On aesthetic generation, NTRK achieves superior reward performance with 25 NFEs, reducing compute by 20× compared to state-of-the-art baselines.

arxiv arXiv cs.AI · 9d ago

BinTrack: Open-Source Spatial QA with Binary Trajectory Search

BinTrack is a fully open-source spatial question answering agent that uses binary search over a robot's trajectory to locate answers. It achieves up to 22.8% higher accuracy than other open-source methods and matches closed-source model performance on the most challenging global category of the SpaceLocQA benchmark. The system also offers over 1.5x faster inference and introduces GangnamLoop, a real-world outdoor benchmark collected with a quadruped robot.

arxiv arXiv cs.AI · 6d ago

Hidden Evolution of Disguised Visual Context in VLMs

Visual tokens enter large language models as raw, unstructured signals. Their internal transformation and integration depend on architecture—either as in-context prompts or injected into intermediate layers—leading to distinct evolution paths in visual representation and frequency characteristics. We find that attention alone is insufficient; performance is driven by the quality of visual representations at each layer across different integration paradigms.

arxiv arXiv cs.LG · 6d ago

Training LLMs for Long-Lifecycle Agents via Cross-Domain Generalization

A new framework enables large language models to develop 'Connect the Dots' capability, allowing long-lifecycle agents to learn from experiences and iteratively update their environment context. The framework uses reinforcement learning with long rollout sequences and custom tasks to promote cross-domain generalization, showing effective out-of-distribution performance in both domains and transition settings.

arxiv arXiv cs.CL · 6d ago

Training LLMs for Long-Lifecycle Agents via Cross-Domain Generalization

A new framework enables large language models to learn 'Connect the Dots' by using reinforcement learning with long rollout sequences. The method includes tailored tasks and environments to foster meta-capability development, showing strong cross-domain generalization and performance in out-of-distribution settings. Implementations are available at https://github.com/agentscope-ai/Trinity-RFT/tree/research/cod/examples/research_cod.

arxiv arXiv cs.LG · 7d ago

Act2Answer Evaluates Knowledge Retention in Vision-Language-Action Models

Act2Answer introduces a lightweight protocol to assess commonsense and world knowledge retention in VLA models by requiring agents to answer questions through object placement actions. A large-scale study of 7 VLA models and 9 VLM baselines reveals that VLAs perform well on simple concepts but show larger gaps on rich semantic categories compared to their source VLMs, with VQA co-training improving knowledge retention and peak answer-relevant signals observed in middle VLA layers.