All articles — korshunov.ai

Scaling the Horizon, Not the Parameters: Reaching Trillion-Parameter Performance with a 35B Agent

Researchers introduce Agents-A1, a 35B Mixture-of-Experts model that achieves performance comparable to trillion-parameter models by scaling the agent horizon rather than parameter count. The approach focuses on extending long-horizon trajectories and unifying heterogeneous agent abilities through a specialized training infrastructure.

arXiv cs.CL · 4h ago

Self-Evolving World Models for LLM Agent Planning

The paper introduces WorldEvolver, a framework that equips long-horizon LLM agents with reliable foresight by revising deployment-time context without modifying model parameters. It addresses the issue of unreliable predictions degrading decision-making through a self-evolving approach that enhances predictive fidelity and planning performance.

Hugging Face Forums · 4h ago

Trajlens: a validator for LeRobotDataset, audited 100 Hub datasets

The author introduces Trajlens, an open-source linter for datasets in the LeRobotDataset format on the Hugging Face Hub, and reports results from auditing 100 randomly selected public datasets tagged 'lerobot'. Only 19 datasets passed validation; 13 failed because of specific upstream bugs, and 47 hit load errors or timeouts.

Hugging Face Forums · 4h ago

Architectural Proposal: The Epistemological Adversarial Network (EAN) for Open-Source AI

A feature request proposes the Epistemological Adversarial Network (EAN), an architecture designed to transform AI from a system mirroring institutional consensus into a decentralized, multi-perspective verification engine. This approach aims to eliminate political and corporate power plays by removing any single "source of truth" model.

Hugging Face Forums · 4h ago

Community Discussion on Open-Source LLMs for Chatbot Development

A discussion thread on the Hugging Face forums asks which free or open-source models people currently use for chatbot development and why they prefer them.

Hugging Face Forums · 4h ago

Top 5 models I can run with my hardware? No AI lobotomization

A user on the Hugging Face forums seeks recommendations for uncensored AI models capable of reasoning about complex topics, citing a preference for earlier versions of GPT-4 over current iterations.

GitHub · llama.cpp · 4h ago

llama.cpp b9847 release fixes Gemma E4B MTP FlashAttention

The llama.cpp project has released version b9847, which includes a fix for Gemma E4B MTP FlashAttention on CUDA and the removal of an unused template declaration.

r/LocalLLaMA · 5h ago

How I'm using local models for real-world coding

The author shares a practical setup for using local large language models on modest hardware, specifically a laptop with 32GB of RAM and an NVIDIA RTX 4070 with 8GB VRAM. The core strategy involves running the Qwen3.6-35B-A3B model locally as a 'small coding agent' while offloading complex planning to a cloud-based GLM 5.2 instance.

arXiv cs.CL · 5h ago

A Diagnostic Framework and Multi-Evaluator Audit of Evaluator-Driven Preference Dynamics in Self-Adapting LLM Agents

The article documents how measurements from proprietary LLM evaluators can become invalid within weeks, introducing the EPC framework to detect such instability. It applies this diagnostic across eight experimental conditions, revealing that version-conditional instability makes single-snapshot evaluator studies unreliable.

arXiv cs.CL · 5h ago

The Hidden Cost of Resampling: How Imbalance Correction Degrades Probability Calibration in Tree Ensembles

This study evaluates the impact of resampling methods like SMOTE and random undersampling on probability calibration in tree ensembles, finding that while SMOTE's cost is small, undersampling severely degrades calibration.
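
The summary does not spell out the mechanism, but one well-known reason random undersampling hurts calibration is that it changes the class prior the model is trained on. The sketch below assumes each negative example is kept independently with probability `beta` and applies the standard prior-correction formula for that setting; it illustrates the effect and is not the study's method. The function names are hypothetical.

```python
def undersampled_posterior(p: float, beta: float) -> float:
    """P(+|x) as seen by a model trained after keeping each negative with probability beta."""
    # Negatives shrink by a factor of beta, so positives look more likely than they really are.
    return p / (p + beta * (1.0 - p))


def correct_undersampling(p_s: float, beta: float) -> float:
    """Invert the prior shift: map an undersampled-model probability back to the original prior."""
    return beta * p_s / (beta * p_s - p_s + 1.0)


if __name__ == "__main__":
    beta = 0.1  # illustrative: keep 10% of the negatives
    for p in (0.01, 0.1, 0.5):
        p_s = undersampled_posterior(p, beta)
        print(f"true={p:.2f} undersampled={p_s:.3f} corrected={correct_undersampling(p_s, beta):.3f}")
```

With only a tenth of the negatives kept, a true probability of 1% comes out near 9%, the kind of systematic overconfidence that calibration metrics penalize.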

arXiv cs.CL · 5h ago

How Far Do On-Prem Open LLMs Get on Text-to-SQL? A Cross-Family Size × Technique Frontier on BIRD

This study evaluates the performance of open-weight large language models running on-premises for text-to-SQL tasks using a reproducible benchmark on the BIRD development split. It compares three model families across two generations while ablating specific accuracy-enhancing techniques to determine their actual value.
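
BIRD is commonly scored by execution accuracy: a predicted query counts as correct if running it returns the same result as the gold query. The sketch below uses an in-memory SQLite database; the table, rows and queries are invented for illustration, and comparing rows as sets is an assumption about how results are matched, not a description of this study's harness.

```python
import sqlite3


def execution_match(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> bool:
    """True if the predicted query runs and returns the same rows as the gold query."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # a query that fails to execute is scored as wrong
    gold_rows = conn.execute(gold_sql).fetchall()
    # Assumption: rows are compared as sets, so row order and duplicates are ignored.
    return set(predicted_rows) == set(gold_rows)


def execution_accuracy(conn: sqlite3.Connection, pairs: list[tuple[str, str]]) -> float:
    """Fraction of (predicted, gold) pairs whose results match."""
    if not pairs:
        return 0.0
    return sum(execution_match(conn, pred, gold) for pred, gold in pairs) / len(pairs)


def demo_connection() -> sqlite3.Connection:
    """Build a toy database (hypothetical schema) for trying the metric."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE schools (name TEXT, county TEXT, enrollment INTEGER)")
    conn.executemany(
        "INSERT INTO schools VALUES (?, ?, ?)",
        [("A", "Alameda", 500), ("B", "Alameda", 1200), ("C", "Fresno", 800)],
    )
    return conn


if __name__ == "__main__":
    pairs = [
        # Different SQL, same result set: counts as correct.
        ("SELECT name FROM schools WHERE enrollment > 700",
         "SELECT name FROM schools WHERE enrollment >= 800"),
        # Runs, but returns the wrong rows.
        ("SELECT name FROM schools WHERE county = 'Fresno'",
         "SELECT name FROM schools WHERE county = 'Alameda'"),
        # Fails to execute (misspelled column).
        ("SELECT nme FROM schools", "SELECT name FROM schools"),
    ]
    print(execution_accuracy(demo_connection(), pairs))
```

The first pair shows why execution-based scoring is more forgiving than string matching: two different queries that return the same rows are both accepted.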

arXiv cs.CL · 5h ago

Fast Numbers, Slow Language: Bridging Quantitative and Qualitative Earnings Signals

The article introduces EarningsInOne, a new corpus aligning earnings news, conference call transcripts, and prices for the S&P 1500 universe from 2022 to 2025. This resource bridges the gap between financial economists and NLP researchers by providing unified trading setups and evaluation metrics for both quantitative and qualitative signals.

arXiv cs.CL · 5h ago

Managing Map Cardinality in Automatic Disease Classification Mapping

The article introduces a novel method for automatic mapping between disease classification systems, such as ICD-9-CM and ICD-10-CM, that addresses the limitations of existing embedding-based approaches which often overlook complex one-to-many scenarios. By employing a blocking-and-matching pipeline inspired by entity resolution, the authors utilize large language models to identify valid mappings within candidate blocks.
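
A blocking-and-matching pipeline can be sketched in two steps: blocking cheaply narrows each source code to a few candidate targets, and matching keeps every candidate that a judge accepts, so one source code may map to several targets. The paper uses a large language model as the judge; the sketch below substitutes a token-overlap stand-in, and the codes and descriptions are hypothetical.

```python
from collections import defaultdict


def tokens(text: str) -> set[str]:
    """Lowercased words longer than three characters (a crude filter for informative terms)."""
    return {t for t in text.lower().replace(",", " ").split() if len(t) > 3}


def block(source: dict[str, str], target: dict[str, str]) -> dict[str, list[str]]:
    """Blocking: a target is a candidate if it shares at least one informative token."""
    index: dict[str, set[str]] = defaultdict(set)
    for code, desc in target.items():
        for tok in tokens(desc):
            index[tok].add(code)
    candidates = {}
    for code, desc in source.items():
        hits: set[str] = set()
        for tok in tokens(desc):
            hits |= index.get(tok, set())
        candidates[code] = sorted(hits)
    return candidates


def overlap_judge(a: str, b: str, threshold: float = 0.5) -> bool:
    """Stand-in for the LLM matcher: accept if Jaccard token overlap reaches the threshold."""
    ta, tb = tokens(a), tokens(b)
    return bool(ta | tb) and len(ta & tb) / len(ta | tb) >= threshold


def match(source, target, candidates, judge=overlap_judge) -> dict[str, list[str]]:
    """Matching: keep every accepted candidate, which allows one-to-many mappings."""
    return {code: [t for t in cands if judge(source[code], target[t])]
            for code, cands in candidates.items()}


if __name__ == "__main__":
    source = {"S1": "Diabetes mellitus without complication", "S2": "Hypertension, essential"}
    target = {
        "T1": "Type 1 diabetes mellitus without complications",
        "T2": "Type 2 diabetes mellitus without complications",
        "T3": "Essential hypertension",
    }
    print(match(source, target, block(source, target)))
```

Here S1 maps to both T1 and T2, the kind of one-to-many case the summary says existing embedding-based approaches often overlook.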

arXiv cs.CL · 5h ago

Mandol: An Agglomerative Agent Memory System for Long-Term Conversations

Researchers propose Mandol, an agglomerative memory system designed to consolidate fragmented memory representations into a unified architecture for long-term conversational agents. This approach addresses the high latency and noise issues inherent in existing systems that rely on heterogeneous vector and graph databases.
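
The summary does not describe how Mandol consolidates memories. As a generic illustration of agglomerative consolidation, the sketch below repeatedly merges the two most similar memory clusters by cosine similarity until no pair clears a threshold. The toy two-dimensional embeddings, the threshold and the function names are all hypothetical.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def agglomerate(memories, threshold: float = 0.9):
    """Greedily merge the most similar pair of clusters until no pair exceeds the threshold.

    memories: list of (text, embedding) pairs. Returns a list of (texts, centroid) clusters.
    """
    clusters = [([text], list(vec)) for text, vec in memories]
    while len(clusters) > 1:
        best, pair = threshold, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = cosine(clusters[i][1], clusters[j][1])
                if sim > best:
                    best, pair = sim, (i, j)
        if pair is None:
            break  # no remaining pair is similar enough to merge
        i, j = pair
        texts_i, vec_i = clusters[i]
        texts_j, vec_j = clusters.pop(j)  # j > i, so index i is unaffected
        n_i, n_j = len(texts_i), len(texts_j)
        # Size-weighted centroid: the merged vector stays the mean of all member embeddings.
        centroid = [(a * n_i + b * n_j) / (n_i + n_j) for a, b in zip(vec_i, vec_j)]
        clusters[i] = (texts_i + texts_j, centroid)
    return clusters


if __name__ == "__main__":
    memories = [("likes tea", [1.0, 0.0]), ("drinks green tea", [0.95, 0.05]), ("lives in Oslo", [0.0, 1.0])]
    for texts, _ in agglomerate(memories):
        print(texts)
```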

arXiv cs.CL · 5h ago

Are Humans Evolved Instruction Followers? An Underlying Inductive Bias Enables Rapid Instructed Task Learning

This position paper argues that humans possess an evolved instruction-following bias, an innate inductive bias shaped by evolution to interpret and execute linguistic instructions. This cognitive feature enables rapid instructed task learning (RITL) and allows for the fast generalization of behavior from language.

arXiv cs.CL · 5h ago

Fund2Persona: Building Financial Advisor Personas from Fund Data

The authors propose Fund2Persona, a framework that grounds financial advisor personas in fund disclosures, holdings transitions, and manager commentary to address the difficulty of scaling consistent expertise in LLM systems. The system refines these personas through an agentic actor-scorer-patcher loop, moving beyond simple persona prompts that often drift toward generic recommendations.

arXiv cs.CL · 6h ago

Systematic Benchmark of Lightweight Hallucination Detection Across QA, Dialogue, and Summarisation

This paper benchmarks five lightweight, CPU-feasible hallucination detection methods to provide practical alternatives for resource-constrained researchers who cannot use GPU-intensive or proprietary solutions. The study evaluates ROUGE-L, semantic similarity, BERTScore, a FEVER-trained DeBERTa NLI detector, and an ensemble of similarity and NLI across the HaluEval benchmark's question answering, dialogue, and summarisation tasks.
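
Of the five methods, ROUGE-L is the simplest to reproduce on a CPU: it scores the longest common subsequence (LCS) of tokens shared by a reference and a generated text. The sketch below computes a ROUGE-L F1 with whitespace tokenization and flags an output as a possible hallucination when its overlap with the source falls below a threshold; the tokenization, the use of F1 and the 0.3 threshold are assumptions, not the paper's configuration.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, via O(len(a) * len(b)) dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1: harmonic mean of LCS-based precision and recall."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def flag_hallucination(source: str, output: str, threshold: float = 0.3) -> bool:
    """Flag an output whose lexical overlap with the source falls below a hypothetical threshold."""
    return rouge_l_f1(source, output) < threshold


if __name__ == "__main__":
    print(flag_hallucination("the cat sat on the mat", "the cat sat on a mat"))
    print(flag_hallucination("the cat sat on the mat", "dogs fly planes"))
```

Low lexical overlap is only a proxy: a faithful paraphrase can score low, and a fabrication that reuses source wording can score high.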

arXiv cs.CL · 6h ago

SrDetection: A Self-Referential Framework for Data Leakage Detection in Code LLMs

The authors introduce SrDetection, a unified framework for detecting data leakage in code large language models that operates in both gray-box and black-box settings. The method generates semantically equivalent variants of benchmark samples to identify cases where the original data is disproportionately easier for the model due to pre-training exposure.
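
The core comparison can be sketched as a gap between how hard a model finds an original benchmark sample and how hard it finds the semantically equivalent variants. The sketch assumes a per-sample loss is available (a gray-box signal); the function names, the averaging and the 0.5 margin are hypothetical, and the paper's actual scoring, particularly in the black-box setting, may differ.

```python
from statistics import mean


def leakage_gap(original_loss: float, variant_losses: list[float]) -> float:
    """Mean variant loss minus original loss; a large positive gap means the original is suspiciously easy."""
    return mean(variant_losses) - original_loss


def flag_leaked(original_loss: float, variant_losses: list[float], margin: float = 0.5) -> bool:
    """Flag a sample whose original form is easier than its variants by more than the margin."""
    return leakage_gap(original_loss, variant_losses) > margin


if __name__ == "__main__":
    # Toy losses: the first sample looks memorized, the second does not.
    print(flag_leaked(0.4, [1.2, 1.0, 1.1]))
    print(flag_leaked(1.0, [1.05, 0.95, 1.1]))
```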

arXiv cs.CL · 6h ago

Neural Procedural Memory: Empowering LLM Agents with Implicit Activation Steering

The paper introduces Neural Procedural Memory (NPM), a training-free framework that lets large language model (LLM) agents use implicit activation steering for procedural memory instead of relying on explicit textual instructions. By distilling skills from past experience into steering vectors, NPM directly activates task-relevant neural mechanisms to guide execution.
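
The summary does not say how NPM derives its steering vectors. A common construction in activation-steering work is the difference between mean hidden activations collected with and without a behavior, added back to the hidden state at inference time with a scaling factor. The sketch below shows that construction on plain Python lists; the toy activations, the `alpha` value and the function names are hypothetical.

```python
def mean_vector(vectors: list[list[float]]) -> list[float]:
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def steering_vector(with_skill: list[list[float]], without_skill: list[list[float]]) -> list[float]:
    """Difference of mean hidden activations between runs that used a skill and runs that did not."""
    mu_pos, mu_neg = mean_vector(with_skill), mean_vector(without_skill)
    return [a - b for a, b in zip(mu_pos, mu_neg)]


def steer(hidden: list[float], vector: list[float], alpha: float = 1.0) -> list[float]:
    """Add the scaled steering vector to one hidden state."""
    return [h + alpha * v for h, v in zip(hidden, vector)]


if __name__ == "__main__":
    v = steering_vector([[1.0, 2.0], [3.0, 4.0]], [[0.0, 1.0], [2.0, 1.0]])
    print(v, steer([0.0, 0.0], v, alpha=0.5))
```

In a real model the vector would be added at a chosen layer, for example through a forward hook, with `alpha` controlling how strongly the behavior is pushed.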

arXiv cs.CL · 6h ago

Revealing the Technology Development of Natural Language Processing: A Scientific Entity-Centric Perspective

This study analyzes the development of technologies in Natural Language Processing (NLP) from an entity-centric perspective, extracting methods, datasets, metrics, and tools and measuring their impact through co-occurrence networks. It finds that pre-trained language models such as BERT, along with the Transformer architecture, have become mainstream, and that the average number of entities per paper is increasing, which points to a growing knowledge burden on researchers.
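
Entity co-occurrence networks of this kind are typically built by linking two entities whenever they appear in the same paper, with the edge weight counting such papers. The sketch below does that for hypothetical entity lists and also computes the average number of distinct entities per paper; it illustrates the measures named in the summary and is not the study's extraction pipeline.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(papers: list[list[str]]) -> Counter:
    """Count how many papers mention each unordered pair of entities (the network's edge weights)."""
    edges: Counter = Counter()
    for entities in papers:
        # Sorting makes each pair canonical, so (a, b) and (b, a) share one edge.
        for a, b in combinations(sorted(set(entities)), 2):
            edges[(a, b)] += 1
    return edges


def mean_entities_per_paper(papers: list[list[str]]) -> float:
    """Average number of distinct entities per paper."""
    return sum(len(set(e)) for e in papers) / len(papers) if papers else 0.0


if __name__ == "__main__":
    papers = [["BERT", "SQuAD", "F1"], ["BERT", "GLUE"], ["Transformer", "BERT", "GLUE"]]
    print(cooccurrence_edges(papers).most_common(3))
    print(mean_entities_per_paper(papers))
```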