All articles — korshunov.ai


Slow performance Unsloth Gemma 12B Q8

A user reports a significant drop in inference speed when switching from GPT-OSS 20B Q4 to Gemma 4 12B Q8 using llama.cpp, with throughput falling from approximately 70 tokens per second to 10 tokens per second. The slowdown persists with a Q5 variant of the model and with the thinking feature disabled, changes that yielded only about two additional tokens per second.
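One possible reading of the reported drop, offered as a rough sketch and not as a diagnosis: token generation is often limited by how many weight bytes must be read per token. The figures below are assumptions, not values from the report: GPT-OSS 20B is treated as a mixture-of-experts model with about 3.6B active parameters at roughly 4.25 bits per weight, and the Gemma model as a dense 12B model at roughly 8.5 bits per weight.

```python
def bytes_per_token(active_params: float, bits_per_weight: float) -> float:
    """Approximate weight bytes read from memory to generate one token."""
    return active_params * bits_per_weight / 8


# Assumption: only the active experts' weights are read per token.
moe_bytes = bytes_per_token(active_params=3.6e9, bits_per_weight=4.25)

# Assumption: a dense model reads all of its weights for every token.
dense_bytes = bytes_per_token(active_params=12e9, bits_per_weight=8.5)

# If decoding is memory-bandwidth bound, speed scales inversely with bytes read.
estimated_slowdown = dense_bytes / moe_bytes

# Throughput figures taken from the report: about 70 tok/s before, 10 tok/s after.
observed_slowdown = 70 / 10

print(f"estimated slowdown: {estimated_slowdown:.1f}x")
print(f"observed slowdown:  {observed_slowdown:.1f}x")
```

Under these assumptions the estimate comes out at roughly 6.7x, close to the observed 7x, which would suggest the gap follows from the models' sizes and not from a misconfiguration. Offloading, context length, and backend choice can all change the picture.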

github llama.cpp · 8h ago

llama.cpp b9839 release with Tailwind scanning fix and multi-platform binaries

The llama.cpp project has released version b9839, which includes a fix to restore Tailwind scanning in ignored worktrees. This update provides pre-built binaries for macOS, Linux, Android, Windows, and openEuler across various architectures and hardware acceleration backends.

lab OpenAI News · 8h ago

Mapping Europe’s AI Workforce Opportunity

OpenAI Economic Research has extended its AI Jobs Transition Framework to the European Union, using the ESCO taxonomy and Eurostat data to analyze how AI capabilities may reshape labor markets across member states.

arxiv arXiv cs.LG · 9h ago

Selective Time Series Forecasting via Metalearning

This article introduces a selective forecasting framework that allows models to abstain from high-risk predictions by modeling the empirical percentile of forecasting errors through metalearning. By using scale-invariant statistics derived from recent lags, the method decouples rejection decisions from forecasts to enable transfer across heterogeneous time series.

arxiv arXiv cs.LG · 9h ago

Do Location Encoders Capture Spatial Effects? A GeoShapley Benchmark Across Scales

This study benchmarks whether GeoShapley, a game-theoretic explainer, can recover spatially varying coefficients from machine learning models using location encoder embeddings. Eleven encoders from the TorchSpatial framework were evaluated against a synthetic process with known coefficients across grid, county, and global scales.

arxiv arXiv cs.LG · 9h ago

Time Series Classification through Diffeomorphic Time Warping (DiffTW)

The article introduces Diffeomorphic Time Warping (DiffTW), a theoretical framework for time series classification that learns mappings between real-valued functions to overcome the discrete point matching limitations of Dynamic Time Warping (DTW). DiffTW approximates diffeomorphic transformations using the method of characteristics to solve linear transport equations, providing a theoretically grounded dissimilarity measure.
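For context, the discrete point matching that the summary attributes to DTW can be sketched with the classic dynamic-programming recurrence. This is the textbook baseline, not the DiffTW method; the function name `dtw_distance` and the absolute-difference local cost are illustrative choices.

```python
def dtw_distance(a: list[float], b: list[float]) -> float:
    """Classic DTW: minimum cumulative cost of aligning a to b point by point."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best alignment cost of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Each sample is matched to one or more samples of the other series.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b[j-1] is reused
                cost[i][j - 1],      # b advances, a[i-1] is reused
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# A repeated sample is absorbed by the warping, so the distance stays zero.
print(dtw_distance([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0]))
```

The alignment here is restricted to index pairs on a grid, which is the limitation the summary says DiffTW addresses by learning mappings between real-valued functions.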

arxiv arXiv cs.LG · 9h ago

Sublinearly Structured Deep Neural Networks Achieve Feature Learning Consistency for Compositional Functions

This study establishes feature-learning consistency guarantees for a broad subclass of deep neural networks characterized by sublinear growth in input/output dimensions and hidden neurons relative to sample size. The authors prove that these architectures achieve universal approximation for hierarchically compositional functions, even within the conventional over-parameterized regime where parameters exceed training samples.

arxiv arXiv cs.LG · 9h ago

TROPT: An Open Framework for Unifying and Advancing Discrete Text Optimization

TROPT is introduced as the first open-source framework that unifies discrete text-trigger optimization by standardizing execution and development under a single interface. It addresses current fragmentation by allowing users to customize end-to-end optimization recipes through interchangeable models, objectives, and optimizers.

arxiv arXiv cs.LG · 9h ago

FLKit: A Structured Onboarding Toolkit for Federated Learning in Health

FLKit is an open, community-maintained onboarding toolkit designed to help multidisciplinary teams navigate the federated learning lifecycle in health and life sciences research. It provides role-aware entry points for clinical, legal, governance, and technical contributors, addressing the practical barriers of scattered frameworks and governance obligations.

arxiv arXiv cs.LG · 9h ago

FairBED: A Bayesian Experimental Design Approach to Gathering Fairer Data

The article introduces FairBED, a framework that modifies the data acquisition process itself to gather inherently fairer data, addressing biases present in existing datasets. It provides novel formulations for quantifying dataset fairness based on the principle that fair datasets should be uninformative about sensitive attributes.

media r/LocalLLaMA · 9h ago

DeepSeek V4 by am17an · Pull Request #24162 · ggml-org/llama.cpp

A pull request submitted to the ggml-org/llama.cpp repository enables local execution of the DeepSeek V4 model.

arxiv arXiv cs.CL · 9h ago

DMV-Bench: Diagnosing Long-Horizon Multimodal Agents' Visual Memory with Incidental Cue Injection

Researchers introduce DMV-Bench, the first interactive benchmark designed to evaluate visual memory in multimodal agents within controlled environments. The study proposes DualMem, a parallel visual and verbal memory architecture that significantly outperforms existing systems on this new diagnostic tool.

arxiv arXiv cs.LG · 10h ago

Concordia: JIT-Compiled Persistent-Kernel Checkpointing for Fault-Tolerant LLM Inference

This paper introduces Concordia, a runtime designed to provide fault tolerance for long-running LLM agents by maintaining valuable state on GPUs without restarting the serving stack. The system utilizes a device-resident persistent kernel that interposes on GPU module loading to support PTX- and SASS-level instrumentation.

media r/LocalLLaMA · 10h ago

GLM 5.2 Q1_S vs Qwen 27B Q8: A Local LLM Comparison

An amateur comparison on consumer hardware demonstrates that the heavily quantized GLM-5.2 (Q1_S) outperforms the higher-bit Qwen 3.6 27B (Q8) in a complex coding task, despite significantly slower inference speeds.

media r/LocalLLaMA · 10h ago

Reddit user seeks flashy, feature-rich AI chat interface over minimalist options

A Reddit user is asking for recommendations on "flashy" and feature-heavy chat interfaces, specifically comparing LibreChat and OpenWebUI, for a technically inclined but AI-illiterate friend.

media r/LocalLLaMA · 10h ago

MiCA is now part of Hugging Face PEFT

The MiCA (Minor Component Adaptation) method has been merged into the main branch of the Hugging Face PEFT library, allowing users to install it directly from source. It is exposed through the existing LoRA interface by setting `init_lora_weights="mica"`.
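A minimal sketch of what that setting might look like in practice. Only `init_lora_weights="mica"` comes from the summary; the rank, the target module names, and the helper `mica_lora_kwargs` are hypothetical placeholders, and the config is built only if a PEFT installation is available.

```python
def mica_lora_kwargs(rank: int = 8, target_modules=("q_proj", "v_proj")) -> dict:
    """Keyword arguments for a LoRA config that requests MiCA initialization.

    rank and target_modules are illustrative values, not recommendations.
    """
    return {
        "r": rank,
        "target_modules": list(target_modules),
        "init_lora_weights": "mica",  # the setting named in the summary
    }


try:
    # Per the summary, MiCA requires PEFT installed from source (main branch).
    from peft import LoraConfig

    config = LoraConfig(**mica_lora_kwargs())
except Exception:
    # PEFT is missing, or the installed version rejects these arguments.
    config = None

print(mica_lora_kwargs())
```

A released PEFT version that predates the merge may accept the config and only fail later, when the adapter is attached to a model, so checking the installed version first is advisable.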

media r/LocalLLaMA · 10h ago

AMD MI210 64GB vs DCU K100 64GB

A Reddit user compares the pricing and specifications of the AMD Instinct MI210 64GB and the Chinese DCU K100 64GB GPUs available on the Chinese eBay market. The discussion highlights that while both cards offer similar memory capacities, they differ significantly in price, bandwidth, and architectural details.

media r/LocalLLaMA · 10h ago

Update: First Manual Results from Testing Procedural Skill Transfer in Small Models

A manual experiment tested whether a procedural scaffold generated by a large model can transfer planning discipline to smaller models without fine-tuning or revealing the target answer. The results indicate that this approach significantly improves structural readability and composition in small models when applied across different Three.js domains.

arxiv arXiv cs.CL · 10h ago

Developmental approach reveals the statistical learning of Neural Language Models: Transformers generalize from the most abstract statistical patterns

This study investigates the statistical learning and mental representation of neural language models by training Generative Transformer models on a synthetic grammar and analyzing their internal representations at various stages.

arxiv arXiv cs.CL · 10h ago

Supersede: Diagnosing and Training the Memory-Update Gap in LLM Agents

This article identifies a distinct failure mode in large language model agents where they struggle to discard outdated facts in favor of current ones, even when comprehension is intact. The authors demonstrate that this "supersession gap" persists across model scales and memory sizes, indicating it is a trainable bottleneck rather than a limitation of context window or model strength.