All articles — korshunov.ai

The Context-Ready Transformer

The authors introduce the context-ready transformer, a recurrent neural network architecture that uses a correction network to pre-contextualize each token before it enters a D-layer transformer block.

arxiv arXiv cs.CL · 8h ago

EntMTP: Accelerating LLM Inference with Entropy Guided Multi Token Prediction

The authors propose Entropy-guided Multi-Token Prediction (EntMTP), a training-free scheduler that dynamically adjusts speculation depth during LLM inference based on local generation entropy. This approach addresses the inefficiency of static tree-based attention topologies by matching compute requirements to context predictability.

arxiv arXiv cs.CL · 8h ago

Ko-WideSearch: A Korean Breadth-Search Benchmark for Exhaustive Set Enumeration by Web Agents

The article introduces Ko-WideSearch, a new benchmark designed to evaluate the breadth-search capabilities of web agents in Korean, addressing the lack of exhaustive set enumeration metrics outside English.

arxiv arXiv cs.CL · 8h ago

Narrative-UFET: Narrative Generation for Ultra-Fine Entity Typing

The authors introduce Narrative-UFET, a controlled extension of ultra-fine entity typing that pairs entity mentions with automatically generated short narratives to address limitations in long-tail type disambiguation. The study demonstrates that narrative context yields consistent improvements over sentence-level baselines, particularly when the entity's type shifts within the text.

arxiv arXiv cs.CL · 8h ago

Masked Language Flow Models

The authors introduce Masked Language Flow Models (MLFMs), which combine masked diffusion with continuous flows to enable efficient, multi-step reasoning in language generation. This approach bridges the gap between parallel generation efficiency and complex task performance by allowing pretrained models to be adapted into MLFMs.

arxiv arXiv cs.CL · 8h ago

DysLexLens: A Low-Resource LLM Framework for Analysing Dyslexic Learners Insights from Online Forums

This paper introduces DysLexLens, a low-resource LLM framework designed to analyze the experiences of dyslexic learners with AI tools through online forum discussions. The system provides an end-to-end, evidence-traceable architecture that transforms noisy social media posts into focused corpora and generates verifiable query responses.

arxiv arXiv cs.CL · 8h ago

Cross-Platform Chinese Offensive Comment Detection via Dual-Threshold Hard Example Mining

This paper addresses the performance degradation of offensive comment detection models when deployed across different Chinese social media platforms by proposing a dual-threshold hard example mining method.

media Hugging Face Forums · 8h ago

The Generational Context Architecture: Solving LLM Context Rot

The Generational Context Architecture (GCA) proposes treating an LLM's context window as a finite lifespan rather than infinite storage to solve "context rot" and attention dilution in multi-agent systems. By enforcing artificial mortality, agents are terminated before performance degrades, passing their state to new generations via a flat-file Markdown vault.

arxiv arXiv cs.CL · 9h ago

Yuvion LLM: An Adversarially-Aware Large Language Model for Content And AI Safety

The Yuvion LLM is a new large language model designed to address safety failures by treating adversarial robustness and agentic capability as primary objectives. It utilizes a pipeline combining adversarially aware data construction, knowledge-enhanced continued pretraining, and policy-grounded multi-task safety post-training.

arxiv arXiv cs.CL · 9h ago

DiscoBench: A Benchmark for Clarification-Aware Deep Search

The authors introduce DiscoBench, a benchmark designed to evaluate whether search agents powered by large language models can proactively identify ambiguity and ask effective clarification questions during deep search tasks. Unlike existing benchmarks that assume complete user queries, this framework addresses the reality of vague or underspecified requests in real-world scenarios.

arxiv arXiv cs.CL · 9h ago

Factorised Study of Probe-Based Uncertainty Estimation in LLMs

This study conducts a factorised analysis of probe-based uncertainty estimation to determine what drives performance in detecting hallucinations within Large Language Models. The research isolates variables across feature design, training data, and evaluation settings to provide clear insights into effective methodologies.

arxiv arXiv cs.CL · 9h ago

Textual Belief States for World Models: Identifiable Representation Learning Under Strict Mediation

This article addresses the issue of unidentifiable latent states in LLM-based world models caused by history bypass, proposing strict latent state mediation to resolve this. The authors introduce textual latent states and factorized GRPO (fGRPO), a tree-structured reinforcement learning method that enforces strict mediation during training.

media Hugging Face Forums · 9h ago

Analysis of hidden-state dynamics across 7 open-weight LLMs reveals recurring functional patterns

An independent researcher analyzed the evolution of hidden representations during inference across seven open-weight models, including GPT-2, OPT-125M, and Llama-3.2-1B, to identify internal dynamical regimes beyond standard output benchmarks.

media Hugging Face Forums · 9h ago

Exploring Functional Regimes Inside Small Language Models

This independent research project characterizes the internal dynamics of seven small and medium-sized language models by analyzing how hidden representations evolve during inference rather than relying on standard output benchmarks. The study investigates dynamic behavior, functional organization, and representation geometry to identify reproducible patterns across different architectures.

media Hugging Face Forums · 9h ago

World Cup 2026 predictor

A developer has created a World Cup 2026 prediction tool that uses historical data to simulate tournament outcomes. The application provides win probabilities and score predictions for any two national teams based on patterns learned from approximately 50,000 international matches spanning over a century.

media Hugging Face Forums · 9h ago

A comprehensive, bilingual guide to Transformers: From foundations to KV-cache compression & attention dynamics

Carles Marin has released an open-source, bilingual (English and Spanish) guide that bridges the mathematical foundations of Transformer architectures with their practical implementation. The resource focuses on low-level mechanics, providing reproducible code and interactive elements to explain complex topics.

media Hugging Face Forums · 9h ago

Open-source bilingual guide on Transformer mechanics published

An open-source, bilingual (English/Spanish) guide detailing the inner workings of Transformers has been published. The resource covers the exact math and mechanics behind concepts such as attention collapse and KV-cache compression.

arxiv arXiv cs.CL · 9h ago

Mitigating LLM-based p-Hacking by Preregistering for the Next LLM

Researchers propose a protocol to mitigate p-hacking in large language model (LLM) research by preregistering experiments and running confirmatory analyses on the first eligible LLM released after the commitment. This approach prevents researchers from tuning prompts or parameters to achieve desired results, as the target model does not exist at the time of preregistration.

arxiv arXiv cs.CL · 9h ago

Joint Transcription and Decryption of Images of Encrypted Handwritten Documents: A Comparison with the Traditional Pipeline

Researchers propose Direct Image Decryption, an end-to-end approach that maps encrypted manuscript images directly to plaintext, bypassing the intermediate transcription stage used in traditional pipelines. Using the Copiale cipher as a case study, the authors compare this joint architecture against the conventional two-stage method of transcription followed by decryption.

arxiv arXiv cs.CL · 9h ago

Mitigating Position Bias in Transformers via Layer-Specific Positional Embedding Scaling

Researchers introduce layer-specific positional embedding scaling (LPES) to address the "lost-in-the-middle" problem in large language models, where critical information in long-context inputs is often underrepresented. The method assigns a distinct scaling factor to each transformer layer to achieve a more balanced attention distribution, without fine-tuning model parameters or increasing inference latency.