All articles — korshunov.ai


Tapered Language Models: Improving Performance via Depth-Aware Capacity Allocation

The article introduces Tapered Language Models (TLMs), an architectural principle that allocates more parameter capacity to earlier layers and less to later layers within a fixed budget. This approach challenges the standard practice of uniform layer width by leveraging evidence that later layers primarily refine the residual stream rather than transforming it.

arxiv arXiv cs.LG · 4h ago

PsyBridge: A Hybrid Intelligent Framework for Multi-Dimensional Mental Health Assessment

This study introduces PsyBridge, a hybrid intelligent framework that addresses the limitations of isolated mental health screening tools by integrating clinically validated assessments with cognitive and personality profiling. The system uses a modular architecture and a weighted aggregation mechanism to generate interpretable risk classifications and decision-support recommendations.

arxiv arXiv cs.LG · 5h ago

Open Problem: Is AdamW Effective Under Heavy-Tailed Noise?

This article addresses the lack of rigorous convergence theory for the AdamW optimizer under heavy-tailed stochastic gradient noise, a regime common in large language model pretraining. It asks whether AdamW can converge under these conditions or whether its second-moment accumulator creates a genuine obstruction.

arxiv arXiv cs.LG · 5h ago

Semantic Browsing: Controllable Diversity for Image Generation

This article introduces Semantic Browsing, a method for controlling diversity in text-to-image models. It enforces structure on generated samples to address the lack of meaningful variation in current systems, inducing diversity directly at the text level rather than relying on stochastic variation within the model.

media r/LocalLLaMA · 5h ago

User implements C++ tool execution with MiMo-V2.5-GGUF

A user used the MiMo-V2.5-GGUF model to write a built-in llama.cpp tool that executes C++ code and retrieves the results. The work was done in opencode, with the model generating the code from specific instructions.

media r/LocalLLaMA · 6h ago

Why so many trash fine-tuned models on HuggingFace?

The author observes that the majority of fine-tuned models uploaded to Hugging Face perform worse than their base counterparts, rendering them useless. This proliferation is attributed to individuals using these models as a form of professional credentialing to secure high-paying AI positions.

github llama.cpp · 6h ago

llama.cpp b9835 release with UI stop and reasoning skip fixes

The llama.cpp project has released version b9835, which includes a user interface fix for the stop and reasoning-skip functionality in single-model mode.

media r/LocalLLaMA · 7h ago

Script to monitor llama.cpp and analyze memory usage

A user has shared a Bash script designed to parse the verbose output of llama.cpp, providing a clear summary of VRAM/RAM requirements and runtime performance metrics. This tool addresses the difficulty of predicting memory needs for various model quantizations by grouping buffer allocations by function and backend.

media r/LocalLLaMA · 7h ago

Ornith-1.0-35B GGUF update: native MTP speculative-decode graft + full serving/TTFT/long-context numbers (llama.cpp, tp=1)

This article reports on an update to the Ornith-1.0-35B model, featuring a native MTP draft head grafted onto the IQ4_XS body for self-speculative decoding in llama.cpp. The author provides comprehensive performance metrics including throughput, time-to-first-token (TTFT), and long-context capabilities on a single RTX PRO 6000 Blackwell GPU.

media r/LocalLLaMA · 8h ago

Apple Refurbished Adds M5 Pro and Max Options

Following Apple's recent price increase, the company has added numerous top-of-the-line 14-inch MacBook Pro models equipped with M5 Pro and M5 Max chips to its refurbished store.

media r/LocalLLaMA · 8h ago

China Has Matched Anthropic in Cybersecurity, Resetting AI Race

A Wall Street Journal report indicates that Chinese artificial intelligence models have achieved parity with Anthropic's Claude in cybersecurity tasks.

media r/LocalLLaMA · 9h ago

Reddit user refutes Dario Amodei's claims against open-source AI

A Reddit post challenges Dario Amodei's assertion that open-source models are inferior to proprietary systems by arguing he misunderstands the technology. The author contends that Amodei is unaware of the transparency and capabilities of current open-weight models.

media Hugging Face Forums · 9h ago

Hypothetical Inquiry on AI Learning Binary Code

A forum user poses a speculative question regarding whether training neural networks or AI systems to understand binary code would significantly enhance their overall capabilities, particularly in coding tasks.

media Hugging Face Forums · 9h ago

Concept: Trading data for data to train AI models

A user proposes a concept for a website where individuals exchange data for data to train AI models, eliminating the need for monetary transactions. The system operates on a credit-based economy where users start with a set amount of credits and post bounties for specific data needs.

media Interconnects · 9h ago

Artifacts 22: Zyphra, Cohere, and Poolside are expanding the breadth of the ecosystem

The open AI model landscape is becoming increasingly diverse, shifting from dominance by a few Chinese players to a broader mix of organizations including sovereign AI initiatives, Big Tech, and product companies.

github llama.cpp · 10h ago

llama.cpp b9833 release: MiniCPM5 parser and multi-platform binaries

The llama.cpp project has released version b9833, introducing a dedicated parser for the MiniCPM5 model alongside various bug fixes and refactoring. This update includes support for tool call parsing, grammar simplification, and corrected Jinja API behavior to ensure compatibility with Jinja2 standards.

github llama.cpp · 11h ago

llama.cpp b9832 release adds --dump-prog debugging flag

The llama.cpp project has released version b9832, introducing a new `--dump-prog` command-line option for the Jinja template engine to aid in debugging. This update also includes pre-built binaries for macOS, Linux, Android, Windows, and openEuler across various CPU and GPU architectures.

media r/LocalLLaMA · 11h ago

Proposal for crowd-sourced, open-source distilled LLMs via distributed training

A Reddit user proposes a system to create truly open-source distilled large language models by wrapping existing command-line AI services. This approach would collect user inputs and outputs from applications like coding assistants or chatbots to build massive datasets through volunteer participation.

media r/LocalLLaMA · 12h ago

DeepSpec: A DeepSeek AI Collection for Speculative Decoding Draft Models

DeepSpec is a full-stack codebase released by deepseek-ai for training and evaluating draft models used in speculative decoding. The project provides data preparation utilities, implementation code, and evaluation scripts to facilitate the development of these auxiliary models.

github llama.cpp · 12h ago

llama.cpp b9831 release adds DFlash support and new binaries

The llama.cpp b9831 release introduces DFlash v2 support, including per-layer-type sliding window attention, alongside a comprehensive set of pre-built binaries for multiple platforms.