All articles — korshunov.ai


Concordia: JIT-Compiled Persistent-Kernel Checkpointing for Fault-Tolerant LLM Inference

This paper introduces Concordia, a runtime that provides fault tolerance for long-running LLM agents by preserving their GPU-resident state without restarting the serving stack. The system uses a device-resident persistent kernel that interposes on GPU module loading to support PTX- and SASS-level instrumentation.

media r/LocalLLaMA · 4h ago

GLM 5.2 Q1_S vs Qwen 27B Q8: A Local LLM Comparison

An amateur comparison on consumer hardware finds that the heavily quantized GLM-5.2 (Q1_S) outperformed the higher-bit Qwen 3.6 27B (Q8) on a complex coding task, though with significantly slower inference.

media r/LocalLLaMA · 4h ago

Reddit user seeks flashy, feature-rich AI chat interface over minimalist options

A Reddit user is asking for recommendations on "flashy" and feature-heavy chat interfaces, specifically comparing LibreChat and OpenWebUI, for a technically inclined but AI-illiterate friend.

media r/LocalLLaMA · 4h ago

MiCA is now part of Hugging Face PEFT

The MiCA (Minor Component Adaptation) method has been merged into the main branch of the Hugging Face PEFT library, allowing users to install it directly from source. It is exposed through the existing LoRA interface by setting `init_lora_weights="mica"`.

media r/LocalLLaMA · 4h ago

AMD MI210 64GB vs DCU K100 64GB

A Reddit user compares the pricing and specifications of the AMD Instinct MI210 64GB and the Chinese DCU K100 64GB GPUs available on the Chinese eBay market. The discussion highlights that while both cards offer similar memory capacities, they differ significantly in price, bandwidth, and architectural details.

media r/LocalLLaMA · 4h ago

Update: First Manual Results from Testing Procedural Skill Transfer in Small Models

A manual experiment tested whether a procedural scaffold generated by a large model can transfer planning discipline to smaller models without fine-tuning or revealing the target answer. The results indicate that this approach significantly improves structural readability and composition in small models when applied across different Three.js domains.

arxiv arXiv cs.CL · 4h ago

Developmental approach reveals the statistical learning of Neural Language Models: Transformers generalize from the most abstract statistical patterns

This study investigates the statistical learning and mental representation of neural language models by training Generative Transformer models on a synthetic grammar and analyzing their internal representations at various stages.

arxiv arXiv cs.CL · 4h ago

Supersede: Diagnosing and Training the Memory-Update Gap in LLM Agents

This paper identifies a distinct failure mode in large language model agents: they struggle to discard outdated facts in favor of current ones, even when comprehension is intact. The authors show that this "supersession gap" persists across model scales and memory sizes, indicating that it is a trainable bottleneck rather than a limitation of context window or model strength.
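The paper studies whether the model itself resolves superseded facts. For contrast, here is a minimal sketch of the target behavior written as an explicit rule in an external memory store: a newer observation about the same subject replaces the older one, and an out-of-order older write is ignored. The `Fact` and `SupersedingMemory` names and the integer timestamps are hypothetical illustrations, not part of the paper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Fact:
    subject: str    # what the fact is about, e.g. "user.city"
    value: str      # the asserted value
    timestamp: int  # when it was observed (hypothetical monotonic counter)


class SupersedingMemory:
    """Stores only the most recent value per subject."""

    def __init__(self) -> None:
        self._facts: dict[str, Fact] = {}

    def write(self, fact: Fact) -> None:
        current = self._facts.get(fact.subject)
        # A newer observation supersedes the stored one; an older one is ignored.
        if current is None or fact.timestamp >= current.timestamp:
            self._facts[fact.subject] = fact

    def read(self, subject: str) -> str | None:
        fact = self._facts.get(subject)
        return fact.value if fact is not None else None


memory = SupersedingMemory()
memory.write(Fact("user.city", "Berlin", timestamp=1))
memory.write(Fact("user.city", "Lisbon", timestamp=2))  # supersedes Berlin
memory.write(Fact("user.city", "Paris", timestamp=0))   # stale write, ignored
print(memory.read("user.city"))  # Lisbon
```

The paper's point is that agents fail at this step even when both facts are in front of them, so an external rule like this only describes the desired outcome.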

github llama.cpp · 4h ago

llama.cpp b9838 Release: Builds for macOS, Linux, Windows, Android

The llama.cpp project has released version b9838, providing pre-built binaries for a wide range of operating systems and hardware accelerators. This release includes support for CPU, GPU (CUDA, Vulkan, ROCm, OpenCL), and specialized AI accelerators across macOS, Linux, Windows, Android, and openEuler.

arxiv arXiv cs.CL · 5h ago

Aloe-Vision: Robust Vision-Language Models for Healthcare

This work introduces Aloe-Vision, a family of open-source large vision-language models (7B and 72B) trained on the newly released Aloe-Vision-Data dataset to address data scarcity and robustness issues in healthcare AI. The authors demonstrate that their high-quality training mixture yields significant performance gains over baselines while maintaining general capabilities.

arxiv arXiv cs.CL · 5h ago

The Curse of Multiple Mediators: Hidden Interaction Effects in Activation Patching

A re-derivation of the activation patching estimand from causal mediation analysis reveals that the natural indirect effect (NIE) captures not only the causal effect through a specific component but also interaction effects (INT). These INT terms measure how much a component's causal effect depends on the state of other components in the model, challenging the assumption that NIE isolates individual contributions.
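A toy calculation makes the interaction term concrete. Assume, hypothetically, a model whose output depends on two mediators as f(m1, m2) = m1 + m2 + m1*m2, and measure each effect by patching clean mediator values into a corrupted run. This is a simplified illustration, not the paper's derivation of the estimand.

```python
from __future__ import annotations


def model(m1: float, m2: float) -> float:
    # Two additive paths plus an interaction term m1 * m2.
    return m1 + m2 + m1 * m2


CLEAN = {"m1": 1.0, "m2": 1.0}    # mediator values on the clean input
CORRUPT = {"m1": 0.0, "m2": 0.0}  # mediator values on the corrupted input


def run(patched: set[str]) -> float:
    """Run on the corrupted input with the named mediators patched to clean values."""
    values = {k: CLEAN[k] if k in patched else CORRUPT[k] for k in CLEAN}
    return model(values["m1"], values["m2"])


baseline = run(set())
total = run({"m1", "m2"}) - baseline
effect_m1 = run({"m1"}) - baseline
effect_m2 = run({"m2"}) - baseline
interaction = total - effect_m1 - effect_m2
# Effect of m1 when m2 is already in its clean state.
effect_m1_given_m2 = run({"m1", "m2"}) - run({"m2"})

print(total, effect_m1, effect_m2, interaction, effect_m1_given_m2)  # 3.0 1.0 1.0 1.0 2.0
```

Patching each mediator alone recovers 1.0 of the 3.0 total effect, and the remaining 1.0 is the interaction. Patching m1 when m2 is already clean yields 2.0 rather than 1.0, so a single-component patching score depends on the state of the other component.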

arxiv arXiv cs.CL · 5h ago

The Context-Ready Transformer

The authors introduce the context-ready transformer, a recurrent neural network architecture that uses a correction network to pre-contextualize each token before it enters a D-layer transformer block.

arxiv arXiv cs.CL · 5h ago

EntMTP: Accelerating LLM Inference with Entropy Guided Multi Token Prediction

The authors propose Entropy-guided Multi-Token Prediction (EntMTP), a training-free scheduler that dynamically adjusts speculation depth during LLM inference based on local generation entropy. This approach addresses the inefficiency of static tree-based attention topologies by matching compute requirements to context predictability.
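A minimal sketch of the idea, assuming the scheduler sees the next-token distribution and maps its normalized Shannon entropy linearly onto a draft depth between `min_depth` and `max_depth`. The function names, the linear mapping, and the depth bounds are illustrative assumptions; EntMTP's actual scheduling rule and its interaction with tree attention are not reproduced here.

```python
from __future__ import annotations

import math


def entropy(probs: list[float]) -> float:
    """Shannon entropy (nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def speculation_depth(probs: list[float], min_depth: int = 1, max_depth: int = 8) -> int:
    """Confident (low-entropy) steps draft deeper; uncertain steps draft shallower."""
    h_max = math.log(len(probs))  # entropy of a uniform distribution of the same size
    normalized = entropy(probs) / h_max if h_max > 0 else 0.0
    depth = max_depth - normalized * (max_depth - min_depth)
    return int(round(depth))


peaked = [0.97, 0.01, 0.01, 0.01]  # predictable context
flat = [0.25, 0.25, 0.25, 0.25]    # unpredictable context
print(speculation_depth(peaked), speculation_depth(flat))  # 7 1
```

Normalizing by the uniform-distribution entropy keeps the mapping independent of vocabulary size; a production scheduler would more likely use calibrated thresholds than a straight line.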

arxiv arXiv cs.CL · 5h ago

Ko-WideSearch: A Korean Breadth-Search Benchmark for Exhaustive Set Enumeration by Web Agents

The article introduces Ko-WideSearch, a new benchmark designed to evaluate the breadth-search capabilities of web agents in Korean, addressing the lack of exhaustive set enumeration metrics outside English.
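Exhaustive enumeration is naturally scored at the item level. As a hedged sketch, assuming a generic set-based metric rather than Ko-WideSearch's official one, predicted and gold sets can be compared by precision, recall, F1, and an exact-match flag. The normalization step and the example data are hypothetical.

```python
from __future__ import annotations


def normalize(item: str) -> str:
    # Whitespace/case normalization only; real matching would need entity resolution.
    return " ".join(item.lower().split())


def set_scores(predicted: list[str], gold: list[str]) -> dict[str, float]:
    pred = {normalize(x) for x in predicted}
    ref = {normalize(x) for x in gold}
    hits = len(pred & ref)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "exact": float(pred == ref)}


gold = ["서울", "부산", "대구", "인천"]  # hypothetical complete answer set
predicted = ["서울", "부산", "광주"]      # agent found 2 of 4, plus 1 wrong item
scores = set_scores(predicted, gold)
# Expected: precision = 2/3, recall = 1/2, F1 = 4/7, exact = 0.0
print(scores)
```

Recall is the number that captures breadth: an agent that stops after a few confident answers can score high precision while missing most of the set.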

arxiv arXiv cs.CL · 5h ago

Narrative-UFET: Narrative Generation for Ultra-Fine Entity Typing

The authors introduce Narrative-UFET, a controlled extension of ultra-fine entity typing that pairs entity mentions with automatically generated short narratives to address limitations in long-tail type disambiguation. The study demonstrates that narrative context yields consistent improvements over sentence-level baselines, particularly when the entity's type shifts within the text.

arxiv arXiv cs.CL · 5h ago

Masked Language Flow Models

The authors introduce Masked Language Flow Models (MLFMs), which combine masked diffusion with continuous flows to enable efficient, multi-step reasoning in language generation. This approach bridges the gap between parallel generation efficiency and complex task performance by allowing pretrained models to be adapted into MLFMs.

arxiv arXiv cs.CL · 5h ago

DysLexLens: A Low-Resource LLM Framework for Analysing Dyslexic Learners' Insights from Online Forums

This paper introduces DysLexLens, a low-resource LLM framework designed to analyze the experiences of dyslexic learners with AI tools through online forum discussions. The system provides an end-to-end, evidence-traceable architecture that transforms noisy social media posts into focused corpora and generates verifiable query responses.

arxiv arXiv cs.CL · 5h ago

Cross-Platform Chinese Offensive Comment Detection via Dual-Threshold Hard Example Mining

This paper addresses the performance degradation of offensive comment detection models when deployed across different Chinese social media platforms by proposing a dual-threshold hard example mining method.
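The summary does not define the two thresholds, so this sketch assumes one plausible reading: one threshold per class on the model's predicted probability of "offensive". Offensive comments scoring below `high` and benign comments scoring above `low` are kept as hard examples. The thresholds, the `Example` type, and the sample data are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Example:
    text: str
    label: int    # 1 = offensive, 0 = not offensive
    score: float  # model's predicted probability of "offensive"


def mine_hard_examples(examples: list[Example], low: float = 0.3,
                       high: float = 0.7) -> list[Example]:
    """Hypothetical dual-threshold rule, one threshold per class."""
    hard = []
    for ex in examples:
        if ex.label == 1 and ex.score < high:
            hard.append(ex)  # offensive but missed or only weakly detected
        elif ex.label == 0 and ex.score > low:
            hard.append(ex)  # benign but flagged or close to being flagged
    return hard


batch = [
    Example("obvious insult", 1, 0.95),
    Example("veiled insult", 1, 0.55),
    Example("neutral remark", 0, 0.10),
    Example("heated but benign", 0, 0.45),
]
hard = mine_hard_examples(batch)
print([ex.text for ex in hard])  # ['veiled insult', 'heated but benign']
```

Under this reading, the mined set would typically feed a further training round on data from the target platform, where the source model's errors are concentrated.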

media Hugging Face Forums · 5h ago

The Generational Context Architecture: Solving LLM Context Rot

The Generational Context Architecture (GCA) proposes treating an LLM's context window as a finite lifespan rather than infinite storage to solve "context rot" and attention dilution in multi-agent systems. By enforcing artificial mortality, GCA terminates agents before their performance degrades and passes their state to a new generation through a flat-file Markdown vault.
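A minimal sketch of the lifecycle, assuming a token counter stands in for context usage: each agent retires once it has used a fixed fraction of its window, writes its notes to a Markdown file in the vault, and its successor starts with a fresh context seeded only by that file. The class name, the 60% retirement point, and the file naming scheme are hypothetical, not taken from the GCA proposal.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


class GenerationalAgent:
    """Hypothetical agent that retires before its context window fills up."""

    def __init__(self, generation: int, vault: Path, context_limit: int,
                 retire_at: float = 0.6, inherited: str = "") -> None:
        self.generation = generation
        self.vault = vault
        self.context_limit = context_limit
        self.retire_at = retire_at
        self.budget = int(context_limit * retire_at)  # "lifespan" in tokens
        self.inherited = inherited                    # predecessor's Markdown handoff
        self.tokens_used = 0
        self.notes: list[str] = []

    def observe(self, note: str, tokens: int) -> None:
        self.tokens_used += tokens
        self.notes.append(note)

    def should_retire(self) -> bool:
        return self.tokens_used >= self.budget

    def hand_off(self) -> GenerationalAgent:
        # Persist the distilled state as a flat Markdown file in the vault...
        path = self.vault / f"generation-{self.generation:03d}.md"
        lines = [f"# Generation {self.generation} handoff", ""]
        lines += [f"- {note}" for note in self.notes]
        path.write_text("\n".join(lines) + "\n", encoding="utf-8")
        # ...and start a successor with a fresh context seeded only by that file.
        return GenerationalAgent(self.generation + 1, self.vault, self.context_limit,
                                 self.retire_at, inherited=path.read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    vault = Path(tmp)
    agent = GenerationalAgent(0, vault, context_limit=1000)
    for step in range(10):
        agent.observe(f"step {step}: subtask done", tokens=150)
        if agent.should_retire():
            agent = agent.hand_off()
    files = sorted(p.name for p in vault.glob("*.md"))

print(files, agent.generation)  # ['generation-000.md', 'generation-001.md'] 2
```

Retiring at a fixed fraction rather than at the hard limit is the "artificial mortality" choice: the agent hands off while it is still reliable. A fuller version would also count the inherited summary against the successor's budget.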

arxiv arXiv cs.CL · 6h ago

Yuvion LLM: An Adversarially-Aware Large Language Model for Content And AI Safety

The Yuvion LLM is a new large language model designed to address safety failures by treating adversarial robustness and agentic capability as primary objectives. It utilizes a pipeline combining adversarially aware data construction, knowledge-enhanced continued pretraining, and policy-grounded multi-task safety post-training.