AI agents
r/LocalLLaMA · 1d ago

Tmax-27B Terminal Agent for Small GPUs with DPPO Training

Tmax-27B is a terminal agent built on Qwen3.6-27B and trained with DPPO, a reinforcement learning (RL) method. It scores 43% on Terminal Bench 2.0 and 69% on TB Lite. To fit on consumer GPUs, the model is quantized to GGUF at 2 to 5 bits per weight using importance-matrix calibration, and a grafted multi-token prediction (MTP) head enables speculative decoding. The IQ2_XS build, at 8.5 GiB, reaches a 70% pass rate on agentic coding tasks, outperforming plain quantization while still generating stable tool calls.
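
The speculative decoding mentioned above can be sketched as a minimal greedy draft-and-verify loop. This is an illustration under stated assumptions, not Tmax-27B's implementation: the "models" are hypothetical toy functions that return a greedy next token, the draft stands in for the grafted MTP head, and the acceptance rule is the simple greedy variant rather than the sampling-based rejection scheme. The key property is that the output is identical to plain greedy decoding with the target model; only the number of target passes changes.

```python
from typing import Callable, List

Token = int
# Any function mapping a token prefix to its greedy (argmax) next token.
NextToken = Callable[[List[Token]], Token]


def greedy_decode(model: NextToken, prompt: List[Token], max_new: int) -> List[Token]:
    """Plain autoregressive greedy decoding, used as the reference."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out


def speculative_greedy(target: NextToken, draft: NextToken, prompt: List[Token],
                       max_new: int, k: int = 4) -> List[Token]:
    """Greedy speculative decoding: the draft proposes k tokens, the target verifies."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # 1. The cheap drafter (assumed here to play the MTP head's role) proposes k tokens.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. The target checks each drafted position; a real engine would score
        #    all k positions in one batched forward pass instead of this loop.
        accepted = 0
        for i, tok in enumerate(proposal):
            if target(out + proposal[:i]) != tok:
                break
            accepted += 1
        out.extend(proposal[:accepted])
        # 3. The target always contributes one token: the correction at the
        #    first mismatch, or a bonus token when all k drafts were accepted.
        out.append(target(out))
    return out[: len(prompt) + max_new]


# Hypothetical toy models over digits: the draft disagrees with the target after a 3.
def target(seq: List[Token]) -> Token:
    return (seq[-1] * 3 + 1) % 10


def draft(seq: List[Token]) -> Token:
    return 5 if seq[-1] == 3 else (seq[-1] * 3 + 1) % 10


print(speculative_greedy(target, draft, [1], 10))  # [1, 4, 3, 0, 1, 4, 3, 0, 1, 4, 3]
```

Each loop iteration costs one (batched) target pass but can emit up to k + 1 tokens, which is where the speedup comes from when the draft agrees with the target most of the time.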

arXiv cs.CL · 2d ago

Group-Graph Policy Optimization for Long-Horizon Agentic RL

Group-Graph Policy Optimization (G2PO) is a graph-based approach to long-horizon agentic reinforcement learning that converts interaction trajectories into state-transition graphs. The graphs support group-aggregated state-value estimation and edge-centric advantage calculation, which improve credit assignment and reduce variance. G2PO improves success rate by up to 22.2% over GRPO on the WebShop, ALFWorld, and AppWorld benchmarks.
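
To make "group-aggregated state values" and "edge-centric advantages" concrete, here is a minimal sketch of one plausible reading; the summary does not give G2PO's actual formulas, so everything below is an assumption. It assumes states are hashable and can be merged across a group of rollouts, estimates V(s) as the mean return of the group's rollouts that visit s, and scores each transition as A(s → s') = V(s') − V(s), with no discounting or normalization. The state names are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Hashable, List, Tuple

State = Hashable
Edge = Tuple[State, State]
# One rollout: the sequence of (hashable) states visited, plus its episode return.
Trajectory = Tuple[List[State], float]


def state_values(group: List[Trajectory]) -> Dict[State, float]:
    """Group-aggregated V(s): mean return of the group's rollouts that visit s."""
    returns: Dict[State, List[float]] = defaultdict(list)
    for states, ret in group:
        for s in set(states):  # count each rollout at most once per state
            returns[s].append(ret)
    return {s: mean(rs) for s, rs in returns.items()}


def edge_advantages(group: List[Trajectory]) -> Dict[Edge, float]:
    """Edge-centric advantage on the merged graph: A(s -> s') = V(s') - V(s)."""
    v = state_values(group)
    adv: Dict[Edge, float] = {}
    for states, _ in group:
        for s, s_next in zip(states, states[1:]):
            adv[(s, s_next)] = v[s_next] - v[s]
    return adv


# Hypothetical group of three rollouts with sparse 0/1 returns.
group = [
    (["start", "search", "buy"], 1.0),
    (["start", "search", "back"], 0.0),
    (["start", "browse", "back"], 0.0),
]
for edge, a in sorted(edge_advantages(group).items()):
    print(edge, round(a, 3))
```

In this toy group the second rollout fails, yet its start → search step still receives positive credit (1/6) because another rollout through "search" succeeded, while search → back gets −0.5. A GRPO-style trajectory-level advantage would instead give every step of that rollout the same negative value, which is the credit-assignment gap the graph view targets.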

arXiv cs.CL · 2d ago

SelfCompact: Self-Driving Context Compaction for Language Models

SelfCompact lets language models decide for themselves when and how to compact accumulated context during reasoning. It pairs a model-invoked summarization tool with a lightweight rubric that guides compaction according to trajectory structure, achieving adaptive compaction without fine-tuning. It matches or exceeds fixed-interval compaction methods on math and agentic search benchmarks, improving on baselines by up to 18.1 points on math and by 5 to 9 points on search, at 30 to 70% lower token cost.
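
The core mechanism, a summarization tool that the model itself invokes, can be sketched as a small context object. This is a hypothetical interface, not SelfCompact's code: the `RUBRIC` text, the `compact` signature, the `keep_last` parameter, and the word-count token proxy are all assumptions for illustration. In a real agent loop the model would emit the `compact` call at a point the rubric marks as a natural boundary; here it is invoked directly.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical rubric text shown to the model alongside the tool description.
RUBRIC = (
    "Call compact() only at a natural boundary: after a subgoal is solved "
    "or a search thread is abandoned. Keep every fact still needed later."
)


@dataclass
class Context:
    system: str
    turns: List[str] = field(default_factory=list)

    def size(self) -> int:
        """Crude token proxy: whitespace-separated words across all messages."""
        return sum(len(m.split()) for m in [self.system, *self.turns])

    def compact(self, summary: str, keep_last: int = 1) -> None:
        """Summarization tool the model invokes itself.

        Replaces all but the last `keep_last` turns with the model-written summary.
        """
        tail = self.turns[-keep_last:] if keep_last > 0 else []
        self.turns = [f"[summary] {summary}", *tail]


ctx = Context(system="Solve the problem. " + RUBRIC)
ctx.turns += [
    "step 1: expand (x+1)^2 = x^2 + 2x + 1",
    "step 2: substitute x = 3 gives 9 + 6 + 1 = 16",
    "subgoal done; next: compute the derivative",
]
before = ctx.size()
# A subgoal just finished, so the rubric permits compaction at this point.
ctx.compact("(x+1)^2 at x=3 equals 16.")
print(before, ctx.size(), ctx.turns)
```

Keeping the most recent turn verbatim preserves the model's immediate next step, while everything older collapses into a summary the model wrote itself; when to call the tool is left to the model, which is what distinguishes this from fixed-interval compaction.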

arXiv cs.CL · 2d ago

OpenBioRQ: Benchmark for Agentic Biomedical Research Faithfulness

OpenBioRQ is a benchmark of 12,553 unsolved biomedical research questions across 12 domains, designed to test the faithfulness and abstention behavior of agentic models. Models are evaluated in a tool-using setting without answer keys, using real follow-up evidence rather than parametric knowledge. The results reveal significant agentic collapse on the hardest questions: models stop using tools precisely where tool use is critical.
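
One simple way to surface the "agentic collapse" pattern described above in your own agent logs is to compare, per difficulty tier, how often episodes contain no tool calls at all, alongside the abstention rate. This is a hypothetical analysis sketch, not OpenBioRQ's metric: the `Episode` schema, the tier labels, and the sample data are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Episode:
    tier: str          # hypothetical difficulty label, e.g. "easy" or "hard"
    tool_calls: int    # number of tool invocations the agent made
    abstained: bool    # True if the agent declined to answer


def tier_report(episodes: List[Episode]) -> Dict[str, Dict[str, float]]:
    """Per-tier rates of zero-tool-call episodes and of abstentions."""
    counts: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"n": 0, "no_tool": 0, "abstain": 0})
    for ep in episodes:
        c = counts[ep.tier]
        c["n"] += 1
        c["no_tool"] += int(ep.tool_calls == 0)
        c["abstain"] += int(ep.abstained)
    return {
        tier: {"no_tool_rate": c["no_tool"] / c["n"],
               "abstain_rate": c["abstain"] / c["n"]}
        for tier, c in counts.items()
    }


# Hypothetical logs: tool use drops off on the hard tier.
episodes = [
    Episode("easy", 3, False), Episode("easy", 2, False),
    Episode("hard", 0, False), Episode("hard", 0, False), Episode("hard", 4, True),
]
print(tier_report(episodes))
```

A no-tool rate that rises with difficulty while abstention stays low is the worrying combination: the agent is answering hard questions from parametric knowledge instead of either gathering evidence or declining.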