Hugging Face — korshunov.ai

Lab · Hugging Face

UltraQuant: 4-bit KV Caching for Context-Heavy Agents

UltraQuant introduces a 4-bit KV caching method tailored to context-heavy agent workloads. Compared with FP8 KV caching, it achieves a 3.47x reduction in P50 time-to-first-token in late rounds and 1.63x higher output throughput, using FP8 queries, FP4 KV tensors, and native scaled-MFMA support on AMD CDNA4.
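As a rough illustration of what storing KV tensors in FP4 involves, the sketch below quantizes a KV row block by block onto the FP4 E2M1 grid with one shared scale per block. It is not UltraQuant's kernel: the block size of 32, the max-magnitude scaling rule, and the helper names are assumptions, the scales are kept as Python floats rather than FP8 values, and the FP8 queries and CDNA4 scaled-MFMA path are not modeled.

```python
# Magnitudes representable by an FP4 E2M1 number; the sign is handled separately.
FP4_E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
FP4_MAX = FP4_E2M1_GRID[-1]


def quantize_block_fp4(values):
    """Quantize one block to FP4 grid values that share a single scale.

    The scale maps the block's largest magnitude onto 6.0 (the largest E2M1
    magnitude); each element is then rounded to the nearest grid point.
    Codes are kept as signed grid values here; real storage would pack them
    into 4-bit fields.
    """
    amax = max((abs(v) for v in values), default=0.0)
    scale = amax / FP4_MAX if amax > 0 else 1.0
    codes = []
    for v in values:
        target = abs(v) / scale
        nearest = min(FP4_E2M1_GRID, key=lambda g: abs(g - target))
        codes.append(nearest if v >= 0 else -nearest)
    return codes, scale


def dequantize_block_fp4(codes, scale):
    """Recover approximate values by multiplying the codes by the block scale."""
    return [c * scale for c in codes]


def quantize_kv_row(row, block_size=32):
    """Split one row of a K or V tensor into blocks, each with its own scale."""
    return [quantize_block_fp4(row[i:i + block_size])
            for i in range(0, len(row), block_size)]


if __name__ == "__main__":
    row = [0.12, -0.5, 0.9, -1.8, 0.0, 2.4]
    blocks = quantize_kv_row(row, block_size=3)
    restored = [x for codes, s in blocks for x in dequantize_block_fp4(codes, s)]
    print(restored)
```

Per-block scales keep one outlier from flattening every other value in the row to zero, which is the usual motivation for block-scaled 4-bit formats.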

arXiv cs.CL · 6d ago

H-RePlan: Hierarchical Recovery for Cross-Device Agent Systems

H-RePlan introduces a hierarchical replanning framework that separates device-local strategy recovery from global orchestrator replanning. Through this scope-aware recovery, it outperforms existing baselines in multi-device agent systems, achieving higher task completion and instruction adherence at lower token cost.
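The split between device-local and global recovery can be pictured as an escalation loop. The sketch below is hypothetical (the `recover` function, the strategy callables, and the `Recovery` record are not from the paper): it tries each local strategy on the failing device and hands the failure to the orchestrator only when none of them succeeds.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# A local strategy returns a fix for the failure, or None if it cannot handle it.
LocalStrategy = Callable[[str], Optional[str]]


@dataclass
class Recovery:
    scope: str           # "local": fixed on the device; "global": orchestrator replanned
    plan: str
    local_attempts: int  # how many local strategies were tried


def recover(failure: str,
            local_strategies: List[LocalStrategy],
            global_replan: Callable[[str], str]) -> Recovery:
    """Try device-local recovery first and escalate only if every local strategy fails."""
    for attempt, strategy in enumerate(local_strategies, start=1):
        fix = strategy(failure)
        if fix is not None:
            return Recovery("local", fix, attempt)
    # No local strategy handled the failure, so the orchestrator replans globally.
    return Recovery("global", global_replan(failure), len(local_strategies))


if __name__ == "__main__":
    def retry_tap(failure):
        return "retry tap after scrolling" if failure == "button_not_found" else None

    def reassign(failure):
        return f"reassign subtask to another device ({failure})"

    print(recover("button_not_found", [retry_tap], reassign))
    print(recover("app_crashed", [retry_tap], reassign))
```

Keeping cheap local fixes first is what saves tokens in this picture: the global planner, which sees the whole multi-device plan, is only invoked when the failure cannot be contained on one device.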

arXiv cs.AI · 6d ago

See-and-Reach: Vision-Language Navigation for UAVs in Field of View

UAV-VLN-FOV isolates the see-and-reach stage to enable precise evaluation of UAV navigation. The proposed 3DG-VLN method strengthens visual grounding and spatial alignment with dynamic 3D direction cues, improves success rate by 13.82% over baselines, and is validated in real-world trials.

arXiv cs.CL · 6d ago

JAMER: Project-Level Code Framework Dataset and Benchmark

JAMER introduces JamSet and JamBench, the first project-level game code dataset and benchmark on a professional game engine. Built from 8,133 verified Game Jam projects, it enables deterministic evaluation and reveals a capability cliff in AI models as project scale increases, with runtime pass rates dropping from 80.4% to 5.7%.

arXiv cs.LG · 7d ago

OneCanvas: 3D Scene Understanding via Panoramic Reprojection

OneCanvas enables 3D scene understanding in Vision-Language Models by aggregating patch features onto a single panoramic canvas using 3D world coordinates. It achieves state-of-the-art performance on SQA3D and VSI-Bench, and generalizes to out-of-distribution data on SPBench, using significantly less training compute than existing methods.
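To make "aggregating patch features onto a single panoramic canvas using 3D world coordinates" concrete, here is a minimal sketch that projects each patch's 3D point into an equirectangular grid around a chosen center and averages the features that land in the same cell. The z-up convention, the canvas center and resolution, mean pooling, and the function names are all assumptions, not details taken from OneCanvas.

```python
import math
from collections import defaultdict


def to_panorama_cell(point, center, width, height):
    """Map a 3D world point to a (row, col) cell of an equirectangular canvas (z is up)."""
    dx, dy, dz = (p - c for p, c in zip(point, center))
    r = math.sqrt(dx * dx + dy * dy + dz * dz)
    if r == 0:
        raise ValueError("point coincides with the canvas center")
    azimuth = math.atan2(dy, dx)   # in [-pi, pi]
    elevation = math.asin(dz / r)  # in [-pi/2, pi/2]
    col = int((azimuth + math.pi) / (2 * math.pi) * width) % width
    row = min(int((math.pi / 2 - elevation) / math.pi * height), height - 1)
    return row, col


def aggregate_on_canvas(patches, center, width=64, height=32):
    """Average the feature vectors of all patches whose 3D points fall in the same cell."""
    sums = {}
    counts = defaultdict(int)
    for point, feature in patches:
        cell = to_panorama_cell(point, center, width, height)
        if cell in sums:
            sums[cell] = [a + b for a, b in zip(sums[cell], feature)]
        else:
            sums[cell] = list(feature)
        counts[cell] += 1
    return {cell: [v / counts[cell] for v in vec] for cell, vec in sums.items()}


if __name__ == "__main__":
    patches = [((1, 0, 0), [1.0, 0.0]), ((2, 0, 0), [3.0, 2.0]), ((0, 0, 1), [5.0, 5.0])]
    print(aggregate_on_canvas(patches, center=(0, 0, 0)))
```

Because the cell is computed from world coordinates rather than from the source image, patches from different views that see the same direction end up pooled together on one canvas.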

arXiv cs.AI · 7d ago

User as Engram: Local Parametric Edits for Personal Memory

User as Engram proposes storing per-user facts as surgical, hash-keyed edits to a memory table, leaving reasoning in a shared adapter. This design achieves 5.6x higher indirect-reasoning accuracy and maintains base-level reasoning performance, with a memory footprint 33,000x smaller than per-user LoRA. The approach enables disjoint user edits that compose losslessly, outperforming retrieval pipelines beyond 100 facts.
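The claim that disjoint edits "compose losslessly" follows from keying each edit by a hash: if two users' edits write to different rows, merging them cannot change what either user reads back. The toy below models the memory table as a Python dict of row deltas rather than learned parameters; the SHA-256 slot function, the table size, and all helper names are hypothetical.

```python
import hashlib


def slot(user_id: str, fact_key: str, table_size: int) -> int:
    """Hash (user, fact) to a row of the shared memory table."""
    digest = hashlib.sha256(f"{user_id}\x00{fact_key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % table_size


def make_edit(user_id, facts, table_size):
    """One user's edit: sparse row deltas written only at that user's hashed rows."""
    return {slot(user_id, key, table_size): value for key, value in facts.items()}


def compose(*edits):
    """Merge edits, refusing to merge if two of them touch the same row."""
    merged = {}
    for edit in edits:
        overlap = merged.keys() & edit.keys()
        if overlap:
            raise ValueError(f"edits collide on rows {sorted(overlap)}")
        merged.update(edit)
    return merged


def lookup(table, user_id, fact_key, table_size):
    """Read a user's fact back; None means no edit was written for it."""
    return table.get(slot(user_id, fact_key, table_size))


if __name__ == "__main__":
    size = 1 << 20
    alice = make_edit("alice", {"city": [0.1, 0.2]}, size)
    bob = make_edit("bob", {"city": [0.9, -0.4]}, size)
    table = compose(alice, bob)
    print(lookup(table, "alice", "city", size), lookup(table, "bob", "city", size))
```

Storage grows with the number of facts a user has rather than with the size of an adapter, which is the intuition behind the footprint comparison with per-user LoRA.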

arXiv cs.AI · 7d ago

Data Intelligence Agents Enable Autonomous Data Querying

Data Intelligence Agents (DIA) deploy autonomous coding agents to streamline enterprise data workflows. DIA's Query Generator matches or exceeds the best published results on seven SQL benchmarks across four dialects, generalizing through natural-language instructions and an execution-based architecture.
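One reading of an "execution-based" query generator is that candidate SQL is run against the database and execution errors are collected as feedback instead of being trusted blindly. The sketch below uses the standard-library `sqlite3` module to return the first candidate that executes; the candidate list stands in for model output, and `first_executable` is a hypothetical helper, not part of DIA.

```python
import sqlite3


def first_executable(conn, candidates):
    """Run candidate SQL strings in order and return the first one that executes.

    Returns (sql, rows, errors); errors lists (sql, message) pairs for failed
    candidates, which an agent could feed back into its next generation step.
    """
    errors = []
    for sql in candidates:
        try:
            rows = conn.execute(sql).fetchall()
            return sql, rows, errors
        except sqlite3.Error as exc:
            errors.append((sql, str(exc)))
    return None, None, errors


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 32.5)])
    print(first_executable(conn, ["SELECT SUM(revenue) FROM orders",
                                  "SELECT SUM(amount) FROM orders"]))
```

Executing a query only shows that it is valid SQL for the target dialect, not that it answers the question; a full agent would also inspect the returned rows.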

arXiv cs.LG · 8d ago

NoiseTilt: Noise-Tilted Reverse Kernels for Diffusion Reward Alignment

NoiseTilt introduces NTRK, a reward-guided diffusion sampler that injects reward gradients through the noise term without altering the reverse kernel. Using a whitening operator, NTRK safely biases the noise toward high reward, preserving sample quality while keeping guidance strong. On aesthetic generation, NTRK achieves higher reward with 25 NFEs (number of function evaluations), reducing compute by 20× compared with state-of-the-art baselines.

arXiv cs.AI · 9d ago

BinTrack: Open-Source Spatial QA with Binary Trajectory Search

BinTrack is a fully open-source spatial question answering agent that uses binary search over a robot's trajectory to locate answers. It achieves up to 22.8% higher accuracy than other open-source methods and matches closed-source model performance on the most challenging global category of the SpaceLocQA benchmark. The system also offers over 1.5x faster inference and introduces GangnamLoop, a real-world outdoor benchmark collected with a quadruped robot.
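Binary search over a trajectory works when the question can be phrased as a monotone test on frame indices, for example "has the target appeared by frame i?". A minimal sketch, assuming such a monotone predicate exists: here it is any Python callable standing in for whatever frame check the agent performs, and `earliest_evidence` is a hypothetical name. It finds the earliest qualifying frame in about log2(N) checks instead of N.

```python
from typing import Callable, Optional


def earliest_evidence(num_frames: int, has_evidence_by: Callable[[int], bool]) -> Optional[int]:
    """Smallest frame index i with has_evidence_by(i) True.

    Assumes the predicate is monotone over the trajectory (False ... False
    True ... True). Returns None if it is never True.
    """
    lo, hi = 0, num_frames - 1
    found = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if has_evidence_by(mid):
            found = mid      # evidence present: look for an earlier frame
            hi = mid - 1
        else:
            lo = mid + 1     # not yet visible: search later frames
    return found


if __name__ == "__main__":
    checked = []

    def seen_by(i):
        checked.append(i)
        return i >= 613  # stand-in: target first visible at frame 613

    print(earliest_evidence(1000, seen_by), len(checked))
```

Each check can be expensive when it involves a vision-language model, so cutting 1,000 checks to at most 10 is where a search of this kind saves inference time.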

arXiv cs.AI · 6d ago

Learnable Global Merging for Variable-Length Tokenization in Diffusion Transformers

A novel variable-length tokenizer uses learnable global merging to align representations across token lengths in diffusion transformers. This data-independent approach overcomes position-dependent semantics and improves the quality-compute trade-off over prior methods on ImageNet 256×256 generation.

arXiv cs.AI · 6d ago

Hidden Evolution of Disguised Visual Context in VLMs

Visual tokens enter large language models as raw, unstructured signals. How they are transformed and integrated internally depends on the architecture, which either feeds them in as in-context prompts or injects them into intermediate layers, and each choice leads to a distinct evolution of the visual representations and their frequency characteristics. The authors find that attention alone is insufficient: performance is driven by the quality of the visual representation at each layer, across integration paradigms.

arXiv cs.AI · 6d ago

Hybrid ANN-SNN Pipeline with Local Plasticity

A hybrid ANN-SNN pipeline takes activations from pretrained EfficientNet encoders and converts them to spike trains via rate coding. It then trains a CoLaNET spiking classifier with local plasticity rules, reaching 99.09% accuracy on a 64-class ImageNet benchmark and matching conventional deep networks.
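Rate coding turns each activation into a spike train whose firing frequency tracks the activation value. The sketch below uses Bernoulli sampling at each timestep after normalizing activations to [0, 1] by their maximum; the normalization rule, the number of timesteps, and the function names are assumptions, and the pipeline's actual encoder may differ.

```python
import random


def normalize(activations):
    """Clip negatives to zero and scale into [0, 1] by dividing by the largest activation."""
    peak = max(activations, default=0.0)
    return [max(a, 0.0) / peak if peak > 0 else 0.0 for a in activations]


def rate_encode(probabilities, timesteps, seed=0):
    """Bernoulli rate coding: each neuron spikes at each step with probability p."""
    rng = random.Random(seed)
    return [[1 if rng.random() < p else 0 for _ in range(timesteps)]
            for p in probabilities]


def firing_rates(spike_trains):
    """Empirical spikes per step for each train; approximates the encoded value."""
    return [sum(train) / len(train) for train in spike_trains]


if __name__ == "__main__":
    probs = normalize([0.0, 1.5, 3.0, 6.0])
    trains = rate_encode(probs, timesteps=1000, seed=1)
    print(probs, firing_rates(trains))
```

Longer spike trains approximate the activations more closely but cost more simulation steps, which is the main accuracy and latency knob in rate-coded pipelines.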

arXiv cs.LG · 6d ago

Training LLMs for Long-Lifecycle Agents via Cross-Domain Generalization

A new framework enables large language models to develop a 'Connect the Dots' capability, which lets long-lifecycle agents learn from experience and iteratively update their environment context. The framework uses reinforcement learning with long rollout sequences and custom tasks to promote cross-domain generalization, and it performs well out of distribution in both domain and transition settings.

arXiv cs.CL · 6d ago

Semantic Clusters Pre-Train Tsetlin Machine for Interpretability

A new framework pre-trains the Tsetlin Machine using semantic clusters from language models, avoiding embeddings. The method groups text samples into coherent clusters via K-means or Top2Vec, then uses cluster-sample pairs to train a non-negated TM with Type I feedback. Results show superior performance across five datasets, matching BERT-level accuracy while maintaining full interpretability.

arXiv cs.CL · 6d ago

Zero-Shot Agentic LLMs Extract Lung Pathology from Narratives

A zero-shot agentic workflow built on open-source LLMs extracts 13 College of American Pathologists (CAP) synoptic fields from lung resection pathology reports. The best model, GPT-OSS-20B, reached a Micro-F1 of 0.893, outperformed the baseline on recall, and accurately captured complex pathologic relations without task-specific training.
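Micro-F1 pools true positives, false positives, and false negatives across all extracted fields before computing a single F1, so frequently populated fields weigh more than rare ones. The helper below is a generic implementation of that definition; the per-field counts in the usage example are made up for illustration and do not come from the study.

```python
def micro_f1(counts):
    """Micro-averaged F1 from per-field (tp, fp, fn) counts.

    Counts are summed over fields first, then precision and recall are
    computed once on the pooled totals.
    """
    counts = list(counts)
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Illustrative (tp, fp, fn) counts for two hypothetical fields.
    per_field = [(90, 5, 10), (40, 10, 0)]
    print(round(micro_f1(per_field), 3))
```

The pooled form is equivalent to 2·TP / (2·TP + FP + FN), which is why a single number can summarize all 13 fields at once.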

arXiv cs.CL · 6d ago

Training LLMs for Long-Lifecycle Agents via Cross-Domain Generalization

A new framework teaches large language models a 'Connect the Dots' capability using reinforcement learning with long rollout sequences. It includes tailored tasks and environments that foster meta-capability development, and it shows strong cross-domain generalization and out-of-distribution performance. Implementations are available at https://github.com/agentscope-ai/Trinity-RFT/tree/research/cod/examples/research_cod.

arXiv cs.CL · 6d ago

Tool-Intent Stabilization in Streaming RAG

A study measures tool-intent stabilization in Streaming RAG, defining when speculative tool queries converge to correct answers. On the CRAG benchmark, 73.9% of queries allow substantial latency hiding, with early stabilization observed for questions whose evidence is retrievable verbatim. Question type significantly predicts early versus late stabilization, which indicates when speculative triggers are effective.
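Under one simple operationalization, the stabilization point is the earliest streaming step after which the speculative tool query matches the reference and never changes again. The sketch below assumes exact string equality and a single reference query; the study's actual matching criterion may differ, and `stabilization_step` is a hypothetical name.

```python
from typing import List, Optional


def stabilization_step(speculative_queries: List[str], reference: str) -> Optional[int]:
    """Earliest step from which every speculative query equals the reference.

    Returns None if the last speculative query does not match, i.e. the
    query never stabilized on the reference.
    """
    step = None
    # Walk backwards from the final step while the queries still match.
    for i in range(len(speculative_queries) - 1, -1, -1):
        if speculative_queries[i] == reference:
            step = i
        else:
            break
    return step


if __name__ == "__main__":
    stream = ["weather", "weather paris", "weather paris", "weather paris"]
    print(stabilization_step(stream, "weather paris"))
```

The earlier the returned step relative to the end of the stream, the more of the tool call's latency a speculative trigger could hide.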

r/LocalLLaMA · 7d ago

Laguna M.1: 225B Parameter MoE Model for Agentic Coding

Laguna M.1 is a 225B-parameter mixture-of-experts model with 23B activated parameters per token, designed for agentic coding and long-horizon tasks. It achieves competitive performance on SWE-bench Verified (74.6%), SWE-bench Multilingual (63.1%), and Terminal-Bench 2.0 (45.8%), outperforming models like Devstral 2 and GLM-4.7 on key benchmarks.
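With 23B of 225B parameters active, each token touches roughly 10.2% of the weights (23 / 225 ≈ 0.102). That sparsity comes from routing each token to only a few experts. The generic top-k router below illustrates the idea; Laguna M.1's expert count, its value of k, and its gating details are not given in the summary, so the toy experts and `k=2` are assumptions.

```python
import math


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(router_logits[i] - m) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def moe_layer(x, experts, router_logits, k=2):
    """Weighted sum of the outputs of the selected experts only; the rest are never run."""
    weights = top_k_route(router_logits, k)
    out = [0.0] * len(x)
    for i, w in weights.items():
        y = experts[i](x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out


if __name__ == "__main__":
    # Toy experts: each scales its input by a different constant.
    experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
    print(moe_layer([1.0, 2.0], experts, router_logits=[0.0, 5.0, 5.0, -1.0], k=2))
```

Only the selected experts' weights are used for a given token, which is how a 225B-parameter model can run with the per-token compute of a much smaller dense model.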

r/LocalLLaMA · 7d ago

Keye-VL-2.0-30B-A3B Launches with Advanced Video Understanding and Agent Capabilities

Keye-VL-2.0-30B-A3B is a 30B-parameter multimodal model designed for long-video understanding and agent functionality. It outperforms open-source rivals and matches Gemini-3-Flash in temporal grounding, supports up to 256K context with near-lossless reasoning, and includes built-in capabilities for code, tool, and web search agent workflows.

arXiv cs.LG · 7d ago

Act2Answer Evaluates Knowledge Retention in Vision-Language-Action Models

Act2Answer introduces a lightweight protocol for assessing commonsense and world-knowledge retention in VLA models: agents answer questions through object placement actions. A large-scale study of 7 VLA models and 9 VLM baselines shows that VLAs handle simple concepts well but, relative to their source VLMs, show larger gaps on semantically rich categories. VQA co-training improves knowledge retention, and answer-relevant signals peak in the middle VLA layers.
