Hugging Face — korshunov.ai

User as Engram: Local Parametric Edits for Personal Memory

User as Engram proposes storing per-user facts as surgical, hash-keyed edits to a memory table, leaving reasoning in a shared adapter. This design achieves 5.6x higher indirect-reasoning accuracy and maintains base-level reasoning performance, with a memory footprint 33,000x smaller than per-user LoRA. Disjoint user edits compose losslessly, and the approach outperforms retrieval pipelines beyond 100 facts.
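A toy sketch of why hash-keyed edits on disjoint keys can be merged without interference. The summary does not describe the paper's actual table layout or hashing scheme, so every name here (`slot_for`, `make_edit`, `compose`) and the use of a 64-bit SHA-256 prefix as the slot key are assumptions made for illustration only.

```python
import hashlib


def slot_for(user_id: str, fact_key: str) -> int:
    """Map a (user, fact) pair to a stable 64-bit slot key (hypothetical scheme)."""
    digest = hashlib.sha256(f"{user_id}\x00{fact_key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def make_edit(user_id: str, facts: dict) -> dict:
    """Represent one user's edit as a sparse mapping from slot key to value."""
    return {slot_for(user_id, key): value for key, value in facts.items()}


def compose(*edits: dict) -> dict:
    """Merge edits; refuse to merge if two edits write the same slot."""
    merged = {}
    for edit in edits:
        overlap = merged.keys() & edit.keys()
        if overlap:
            raise ValueError(f"edits collide on slots {sorted(overlap)}")
        merged.update(edit)
    return merged


# Illustrative users and facts, not taken from the paper.
alice = make_edit("alice", {"favorite_tea": "oolong", "city": "Lisbon"})
bob = make_edit("bob", {"favorite_tea": "mate"})
table = compose(alice, bob)

# Each user's fact is still readable after the merge.
print(table[slot_for("alice", "favorite_tea")])
print(table[slot_for("bob", "favorite_tea")])
```

Because each user's writes land on their own slots, the merged table is just the union of the individual edits, which is one plausible reading of "compose losslessly".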

arxiv arXiv cs.AI · 7d ago

Data Intelligence Agents Enable Autonomous Data Querying

Data Intelligence Agents (DIA) deploy autonomous coding agents to streamline enterprise data workflows. The Query Generator matches or exceeds the top published results on seven SQL benchmarks across four dialects, which suggests that its natural-language instructions and execution-based architecture generalize across dialects.

arxiv arXiv cs.LG · 8d ago

NoiseTilt: Noise-Tilted Reverse Kernels for Diffusion Reward Alignment

NoiseTilt introduces NTRK, a reward-guided diffusion sampler that injects reward gradients via the noise term without altering the reverse kernel. By using a whitening operator, NTRK safely biases noise toward high reward, preserving sample quality while maintaining strong guidance. On aesthetic generation, NTRK achieves superior reward performance with 25 NFEs, reducing compute by 20× compared to state-of-the-art baselines.

arxiv arXiv cs.AI · 9d ago

BinTrack: Open-Source Spatial QA with Binary Trajectory Search

BinTrack is a fully open-source spatial question answering agent that uses binary search over a robot's trajectory to locate answers. It achieves up to 22.8% higher accuracy than other open-source methods and matches closed-source model performance on the most challenging global category of the SpaceLocQA benchmark. The system also offers over 1.5x faster inference and introduces GangnamLoop, a real-world outdoor benchmark collected with a quadruped robot.
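A minimal sketch of binary search over a trajectory, assuming the question can be reduced to a monotone check over trajectory prefixes (once something has been observed, it stays observed). The function name `first_frame_where`, the `seen_by` predicate, and the toy trajectory are hypothetical; the summary does not say how BinTrack scores trajectory segments.

```python
from typing import Callable, Optional, Sequence, TypeVar

Frame = TypeVar("Frame")


def first_frame_where(
    trajectory: Sequence[Frame],
    seen_by: Callable[[Sequence[Frame]], bool],
) -> Optional[int]:
    """Return the smallest index i with seen_by(trajectory[: i + 1]) true.

    Assumes seen_by is monotone: true for a prefix implies true for every
    longer prefix. Returns None if it is never true.
    """
    lo, hi = 0, len(trajectory)  # the answer is in [lo, hi]; hi means "never"
    while lo < hi:
        mid = (lo + hi) // 2
        if seen_by(trajectory[: mid + 1]):
            hi = mid  # true at mid, so the first true index is <= mid
        else:
            lo = mid + 1  # false at mid, so the first true index is > mid
    return lo if lo < len(trajectory) else None


# Toy trajectory of place labels standing in for robot observations.
path = ["hall", "hall", "kitchen", "kitchen", "garage"]
kitchen_index = first_frame_where(path, lambda prefix: "kitchen" in prefix)
garden_index = first_frame_where(path, lambda prefix: "garden" in prefix)
print(kitchen_index, garden_index)
```

The search calls the predicate O(log n) times instead of once per frame, which matters when each call is an expensive model query.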

arxiv arXiv cs.LG · 7d ago

Sumi: Open Uniform Diffusion Language Model from Scratch

Sumi is a 7B-parameter uniform diffusion language model pretrained from scratch on 1.5T tokens. It competes with autoregressive models on knowledge, reasoning, and coding tasks but underperforms on commonsense benchmarks, likely due to its education-heavy data mixture. The model weights, checkpoints, and full training recipe are publicly released.

arxiv arXiv cs.CL · 7d ago

Morpheus: Neural Tokenizer and Embedder for Turkish

Morpheus is a morphology-aware neural tokenizer and word embedder for Turkish that preserves original text through lossless encoding and decoding. It achieves the lowest bits-per-character (1.425), improves morphological alignment (MorphScore macro-F1 0.61), and uses 19% less GPU memory than 64K-vocabulary subword tokenizers. Frozen Morpheus embeddings outperform BGE-M3 and BERTurk in lexical retrieval, with root-family MAP of 0.85 and ROC-AUC of 1.00.
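For readers unfamiliar with the metric, here is a sketch of how bits-per-character is commonly computed from token log-probabilities. It assumes natural-log probabilities and normalization by character count; whether the Morpheus evaluation uses exactly this normalization is not stated in the summary, and the function name is hypothetical.

```python
import math


def bits_per_character(token_log_probs, text):
    """Total negative log2-likelihood of the tokens divided by character count.

    token_log_probs: natural-log probabilities, one per token covering `text`.
    """
    if not text:
        raise ValueError("text must be non-empty")
    total_bits = -sum(lp / math.log(2) for lp in token_log_probs)
    return total_bits / len(text)


# Illustrative input: two tokens, each with probability 0.25, covering 4 characters.
example_bpc = bits_per_character([math.log(0.25), math.log(0.25)], "abcd")
print(example_bpc)
```

Normalizing by characters rather than tokens is what makes the number comparable across tokenizers with different vocabulary sizes.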

arxiv arXiv cs.CL · 7d ago

SAGE: Stochastic Prompt Optimization via Agent-Guided Exploration

SAGE is a multi-agent framework for prompt optimization that combines diagnostic code execution with quantitative validation. It improves mental-health chatbot retention by aggregating eight cycles of noisy A/B tests into statistically significant gains, demonstrating effectiveness in open-ended dialogue tasks through qualitative and quantitative feedback integration.
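The summary does not say how the eight A/B cycles are aggregated. As one standard possibility, the sketch below combines per-cycle z-scores with Stouffer's method; the choice of method, the function names, and the retention counts are all assumptions made for illustration, not figures from the paper.

```python
import math


def two_proportion_z(success_a, n_a, success_b, n_b):
    """z-score for the difference in proportions (variant B minus control A)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se


def stouffer(z_scores):
    """Combine independent z-scores with equal weights."""
    return sum(z_scores) / math.sqrt(len(z_scores))


def one_sided_p(z):
    """Upper-tail p-value under the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Synthetic counts: eight cycles, each 100/500 retained (control) vs 115/500 (variant).
cycle_z = [two_proportion_z(100, 500, 115, 500) for _ in range(8)]
single_p = one_sided_p(cycle_z[0])
combined_p = one_sided_p(stouffer(cycle_z))
print(single_p, combined_p)
```

With these synthetic counts no single cycle is significant at the 0.05 level, but the combined statistic is, which is the general effect the summary describes.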

arxiv arXiv cs.AI · 7d ago

CAPRA: Multi-Agent LLM System for Software Architecture Feedback

CAPRA is a multi-agent LLM system that generates personalized, template-compliant LaTeX feedback on software architecture deliverables. It uses specialized agents, PyMuPDF, and gpt-4o to extract and analyze text and UML diagrams, with evidence anchoring and consistency management to ensure reliability. A preliminary evaluation of 10 student reports shows CAPRA met 88.8% of eight criteria and achieved moderate inter-rater agreement (kappa = 0.582), with each report processed in under 4 minutes.
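The summary reports a kappa value without naming the variant. Assuming it is Cohen's kappa for two raters, the statistic can be computed as sketched below; the function name and the toy ratings are illustrative and are not the study's data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same label independently.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters always use the same single label
    return (observed - expected) / (1 - expected)


# Toy ratings (1 = criterion met, 0 = not met).
toy_kappa = cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0])
print(toy_kappa)
```

Values between roughly 0.41 and 0.60 are conventionally described as moderate agreement, which matches how the summary characterizes 0.582.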

arxiv arXiv cs.AI · 7d ago

ProductConsistency: Enhancing Product Identity in Image Editing

The ProductConsistency dataset introduces 87k SFT samples and 869 RL samples to improve product identity preservation in image editing. It includes a benchmark for standardized evaluation and uses a cyclic consistency reward to enforce semantic product identity through caption similarity. Fine-tuning Qwen-Image-Edit-2511 and Flux.1-Kontext-dev shows a 5x reduction in character error rate and improved text rendering and visual quality.
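Character error rate is usually defined as the edit distance between the recognized text and the reference, divided by the reference length. The sketch below uses that standard definition; how the benchmark extracts text from edited images is not described in the summary, and the function names are hypothetical.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the length of the reference text."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


print(character_error_rate("kitten", "sitting"))
```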

arxiv arXiv cs.AI · 7d ago

Technical Taxonomy of LLM Agent Communication Protocols

A new taxonomy classifies LLM agent communication protocols across five dimensions: counterparty, payload, interaction state, discovery mechanism, and schema flexibility. Analysis shows hybrid payloads, session-state persistence, and runtime schema negotiation are common, with decentralized discovery remaining rare. The study predicts short-term convergence toward unified agent-to-agent and agent-to-context protocols, and long-term evolution toward a federated, layered protocol stack.

media r/LocalLLaMA · 7d ago

I released Inflect-Nano, an ultra-extreme tiny 4.63m parameter TTS model

The Inflect-Nano-v1 model is the second-smallest publicly released TTS model after TinyTTS, with 4.63M total parameters. It performs surprisingly well for its size, runs locally on low-end devices, and offers a baseline for tiny speech synthesis in embedded or offline applications.

media r/LocalLLaMA · 8d ago

LoopCoder-V2: Two-Loop PLT Model Achieves Best Gain-Cost Trade-Off

LoopCoder-V2 is a 7B instruction-tuned code model based on Parallel Loop Transformer (PLT), trained on 18T tokens of mixed text and code data. The two-loop variant achieves the best gain-cost balance, improving SWE-bench Verified from 43.0 to 64.4, while three or more loops result in regression due to increasing positional mismatch and unstable updates.

arxiv arXiv cs.LG · 8d ago

LoopCoder-v2 Achieves Optimal Two-Loop Performance

LoopCoder-v2, a parallel loop Transformer model, achieves superior code generation and reasoning performance with two loops, improving SWE-bench Verified from 43.0 to 64.4 points and Multi-SWE from 14.0 to 31.0 points. Variants with three or more loops perform worse, indicating a non-monotonic loop-count effect due to growing positional mismatch and diminishing returns.

arxiv arXiv cs.AI · 8d ago

Agentic AI Framework Reduces Diagnostic Errors in Healthcare

A multi-agent AI framework addresses premature diagnostic handoff and silent hallucinations in healthcare by enforcing structured clinical protocol completion and epistemic uncertainty quantification. Evaluations on 150 simulated cases show 49.3% diagnostic precision, an 11.3 percentage point improvement over baseline, with a statistically significant negative correlation between OLDCARTS completeness and diagnostic uncertainty.

arxiv arXiv cs.AI · 8d ago

EAGG: Embodiment-Aligned Grasp Generation via Geometry-Aware Graph Conditioning

EAGG introduces a grasp generator that aligns embodiment structure within a shared model using topology-aware graphs and geometry-aware tokens. It achieves 56.17% average grasp success on MultiGripperGrasp, matching specialized models within 1.10 percentage points and reducing median contact distance from 0.239 cm to 0.189 cm.

arxiv arXiv cs.CL · 8d ago

SwiftTrans Improves LLM Code Translation Efficiency

SwiftTrans addresses runtime efficiency gaps in LLM-based code translation by introducing Multi-Perspective Exploration and Difference-Aware Selection. The framework extends CodeNet and F2SBench and introduces SwiftBench to evaluate runtime performance, showing consistent improvements in both correctness and efficiency across benchmarks.

arxiv arXiv cs.AI · 9d ago

CircuitLasso: Scalable Circuit Learning for LLM Interpretability

CircuitLasso proposes a scalable method for learning sparse circuits in large language models using sparse linear regression. It achieves structural accuracy comparable to state-of-the-art intervention-based methods at significantly lower computational cost, while enabling efficient discovery of semantic feature propagation and improving performance on domain-generalization tasks with reduced cost.
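The summary only says that circuits are learned with sparse linear regression. As a generic illustration of that building block, and not of the paper's actual regressors, targets, or solver, here is a small Lasso fit by coordinate descent that minimizes (1/(2n))·||y − Xw||² + λ·||w||₁. The function names and the toy data are assumptions.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; exactly zero inside [-lam, lam]."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iters=100):
    """Cyclic coordinate descent for the Lasso objective above."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iters):
        for j in range(d):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for row, target in zip(X, y):
                pred_without_j = sum(w[k] * row[k] for k in range(d) if k != j)
                rho += row[j] * (target - pred_without_j)
                z += row[j] ** 2
            w[j] = soft_threshold(rho / n, lam) / (z / n) if z > 0 else 0.0
    return w


# Toy data: y depends only on the first feature; the second is irrelevant.
X = [[1.0, 1.0], [2.0, -1.0], [3.0, 1.0], [4.0, -1.0]]
y = [2.0, 4.0, 6.0, 8.0]
weights = lasso_coordinate_descent(X, y, lam=0.5)
print(weights)
```

The L1 penalty drives the irrelevant feature's weight to exactly zero, which is the property that makes sparse regression attractive for picking out a small set of connections.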

arxiv arXiv cs.AI · 9d ago

Agentic LLM Framework for HTS Code Classification

A consensus-based agentic large language model framework is proposed for accurate 10-digit Harmonized Tariff Schedule code classification in Canadian maritime logistics. Evaluated on 3,300 expert-labeled product records, the framework shows that fine-grained HTS classification remains challenging for advanced LLMs, highlighting the need for evidence-grounded, uncertainty-aware, and human-in-the-loop workflows.

arxiv arXiv cs.LG · 9d ago

CircuitLasso: Scalable Circuit Learning for LLM Interpretability

CircuitLasso enables scalable circuit learning in large language models by using sparse linear regression. It recovers circuits with structural accuracy matching state-of-the-art methods at significantly lower computational cost, and demonstrates human-interpretable semantic propagation through model components. The learned circuits achieve comparable performance on a domain-generalization task with reduced cost.

media r/LocalLLaMA · 8d ago

PSA: unsloth/GLM-5.2-GGUF is uploading

A Reddit user noticed that the unsloth/GLM-5.2-GGUF repository was created just half an hour ago and currently contains only a README. They suspect that GGUF model files are being uploaded and have shared a link to the repository.
