All articles — korshunov.ai


User Observes Cloud Chatbots Appear Less Intelligent Than Local Models

A Reddit user reports that cloud chatbots like ChatGPT and Claude often seem less capable than open-source models such as Kimi or GLM when discussing abstract concepts. The author notes that these commercial models frequently leap to conclusions, oversimplify ideas, and rely on repetitive phrasing patterns. The author attributes this perceived drop in intelligence to system prompts designed to enforce a specific personality for user engagement. The behavior was reportedly most prominent during the GPT-4o era but persists in current versions. The user asks whether accessing these models via the raw API removes the restrictive system prompts or whether they remain embedded, and seeks community feedback on whether cloud models perform better without these constraints.

media r/LocalLLaMA · 10h ago

Gefen: A Drop-in Replacement for AdamW with Claimed 8x Memory Reduction

Gefen is presented as a drop-in replacement for the AdamW optimizer, claiming an eightfold reduction in memory usage during training. The project includes a GitHub repository at ndvbd/Gefen and a corresponding research paper on arXiv under the identifier 2606.13894. The post links to the paper and codebase for verification. No additional performance metrics or comparative benchmarks are detailed in the available text.

media Hugging Face Forums · 10h ago

User Reports Hugging Face Charging for Unused L40S Compute in Spaces

A user on the Hugging Face discussion forum reported an issue where their Space remained stuck at the starting phase while using an L40S GPU. The user expressed frustration that they were being charged for compute resources despite the application failing to launch or utilize any actual processing power. This incident highlights concerns regarding billing transparency and infrastructure reliability within the platform's Spaces environment. The post serves as a complaint about financial loss due to technical failures rather than a feature announcement. No further technical details or official responses were included in the truncated source content.

media Hugging Face Forums · 10h ago

User Reports Step 3.7 Flash Model Tool Access Failure on HuggingChat

A user on the Hugging Face discussion forum reported that the Step 3.7 Flash model by StepFun AI has lost its ability to use tools, including MCP servers, as of the morning of the report. The individual expressed concern over whether this outage is temporary or permanent, noting their strong preference for this specific model due to its high performance and low resource costs compared to competitors. Despite praising the model's quality and affordability, the user highlighted the immediate disruption caused by the inability to execute tool-based functions. The post seeks clarification from the community regarding prior experiences with similar issues and potential resolutions. This incident underscores a critical dependency on tool availability for users relying on this specific AI configuration.

media Hugging Face Forums · 11h ago

Ontological Inversion: Flipping LLM Emotional Concepts via Negative Gain

The author introduces 'ontological inversion,' a technique intended to extend the one-directional nature of inference in Large Language Models. The method aims to let models capture nuanced, multifaceted concepts, such as memories that evoke both sorrow and joy simultaneously. The approach was developed by applying a negative gain factor during sweeps in the Niodoo steering architecture. It addresses a common limitation where LLMs overfit to singular emotional labels when prompted with personal experiences. By inverting concepts, which the author likens to an involution in physics, the technique enables models to flip emotional states, such as transforming sorrowful memories into joyful ones. The work is shared via a GitHub repository titled 'ontological-inversion' by user Ruffian-L.

media Hugging Face Forums · 11h ago

User Inquires About Organization Rename Process on Hugging Face

A user posted on the Hugging Face discussion forum seeking assistance with renaming their organization. The individual stated they sent an email to website@huggingface.co on June 15 requesting a change from DZER-Studios to Vexion-LM. Despite sending the initial request, the user reported receiving no response and observed that the organization name remained unchanged. Consequently, the poster asked whether organization renames are still supported by the platform. They also requested guidance on alternative methods to contact the team regarding this specific administrative request.

media Hugging Face Forums · 11h ago

Community Inquiry on Model Benchmarking Methods

A user on the Hugging Face discussion forum posted a question seeking advice on how to benchmark machine learning models. The inquiry was initiated by an individual who is new to the field of fine-tuning and wishes to evaluate their models after completion. The post explicitly asks for established methods or strategies that the community uses for this purpose. It highlights a common need among practitioners to understand standard evaluation practices in model development. The discussion thread currently contains only one post from a single participant. No specific benchmarks, metrics, or technical solutions were provided within the visible content of the source.

media Hugging Face Forums · 11h ago

Qwen3/Gemma3 Candle Skips Attention Masks for Equal-Length Batches in CPU Mode

A user has reported a critical bug in the Hugging Face text-embeddings-inference library affecting Qwen3 and Gemma3 models. The issue arises when running inference on CPUs with concurrent requests, leading to significant accuracy degradation. Specifically, the Candle backend incorrectly skips attention masks for batches where all input sequences have equal lengths. This defect compromises the reliability of embeddings generated under these specific conditions. To address the problem, the author submitted a pull request containing a fix that was thoroughly tested on their local machines. The bug highlights potential stability risks in CPU-based embedding services handling batched inputs.

github LlamaIndex · 11h ago

LlamaIndex v0.14.23 Release Notes

LlamaIndex released version 0.14.23 on June 24, 2026, introducing multimodal capabilities and various bug fixes. The core update adds multimodal synthesis and multimodal query engines to support diverse data types. Key fixes address document and video block handling within FunctionTool outputs and ensure URL-backed memory blocks are preserved correctly. Performance improvements come from using sets for within-batch deduplication in the ingestion pipeline and from optimized token text splitting logic. The release also resolves a ZeroDivisionError on empty input sequences and fixes recursion errors in splitters when units exceed chunk sizes. Additionally, explicit UTF-8 encoding was added to file I/O operations, and deep copying of initial states prevents mutation leaks across workflow runs.
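The set-based deduplication mentioned above can be illustrated with a minimal sketch. This is not LlamaIndex's code: the function name `dedupe_batch` and the `(hash, text)` tuple layout are assumptions made only for illustration. It shows the general idea of replacing repeated list scans with constant-time set lookups while preserving input order.

```python
from typing import Iterable, List, Tuple


def dedupe_batch(items: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep the first occurrence of each hash within one batch, in order.

    Hypothetical helper: each item is assumed to be (content_hash, text).
    """
    seen = set()  # set membership is O(1) on average, unlike scanning a list
    unique = []
    for content_hash, text in items:
        if content_hash in seen:
            continue  # duplicate within this batch, skip it
        seen.add(content_hash)
        unique.append((content_hash, text))
    return unique
```

Calling `dedupe_batch([("a", "x"), ("b", "y"), ("a", "z")])` keeps only the first item for hash `"a"`.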

lab Claude Code Releases · 11h ago

Claude Code v2.1.191 Release Notes

Claude Code version 2.1.191 introduces /rewind support, allowing users to resume conversations from before a /clear command was executed. The update fixes several critical issues, including background agents resurrecting after being stopped and scroll position jumping during streaming responses. It also corrects behavior where /voice displayed generic error messages and where /login URLs were truncated in Windows Terminal. Significant improvements enhance reliability for MCP servers by adding retry logic for transient network errors during capability discovery and OAuth flows. Headless environments now skip browser popups for OAuth, while sandbox network permissions are remembered for the session duration. Performance optimizations reduce CPU usage during streaming by approximately 37% through text update coalescing and mitigate long-session memory growth from the terminal output cache.
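As a rough illustration of retrying transient network errors, the sketch below uses exponential backoff. It is not Claude Code's implementation: the helper name `retry_transient`, the default attempt count, the delays, and the choice of exception types are all assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_transient(
    op: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run op, retrying on errors assumed to be transient.

    Hypothetical helper: waits base_delay, then 2x, 4x, ... between tries
    and re-raises the last error once attempts are exhausted.
    """
    for attempt in range(attempts):
        try:
            return op()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of attempts, surface the original error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

The `sleep` parameter is injectable so the backoff schedule can be checked without real waiting.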

github CrewAI · 13h ago

CrewAI v1.14.8a4 Release Notes

CrewAI v1.14.8a4 adds conversational flow support in the CLI TUI. It includes fixes for symlink path traversal during skill archive extraction and for validation of declarative flow definition paths. Documentation for v1.14.8a3 is updated.
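For readers unfamiliar with symlink path traversal, the following sketch shows one common style of guard: resolve the destination of each archive member and reject it if it lands outside the extraction directory. It is not CrewAI's fix; the function name `is_within` is hypothetical, and the check only resolves symlinks that already exist on disk.

```python
import os


def is_within(base_dir: str, member_path: str) -> bool:
    """Return True if member_path, resolved under base_dir, stays inside it.

    Hypothetical helper: realpath follows existing symlinks and collapses
    ".." segments, so escapes through either route are detected.
    """
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, member_path))
    return os.path.commonpath([base, target]) == base
```

An extractor would call a check like this for every member before writing it and skip or abort on `False`.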

github llama.cpp · 13h ago

llama.cpp Release b9784: Hexagon MM Optimizations and Cross-Platform Binaries

llama.cpp release b9784 brings major optimizations for matrix-multiplication (MM) operations on Hexagon, including 32x32 tiled weight repack, improved dyn.quant handling, and unified kernel parameter management. The release includes new binaries for macOS (arm64 and x64), iOS, and multiple Linux architectures with support for Vulkan, ROCm, and OpenVINO.

arxiv arXiv cs.LG · 13h ago

A Differentiable Atari VCS for Explainable AI

A fully differentiable emulator of the Atari 2600 VCS is presented, reproducing all 64 ALE games with bit-for-bit accuracy in RAM and screen output. The system enables gradient-based explainable AI by providing a complex, fully known ground truth, with both Julia and JAX implementations validated against a reference emulator and supporting high-throughput training on GPUs.

arxiv arXiv cs.LG · 13h ago

AdaR: Adaptive Recurrent Message Passing for Graph Test-Time Computing

AdaR enables flexible test-time computing on graphs without parameter changes by using adaptive recurrence. It derives step dependence as a necessary and sufficient condition for convergence and incorporates normalized step information and representation-target relations into recurrent updates, guided by gradient-based supervision signals. Empirical results show AdaR outperforms strong baselines in both inductive and transductive graph learning settings.

arxiv arXiv cs.LG · 13h ago

Speech-Text Models Latently Transcribe Speech in Intermediate Layers

Interleaved speech-language models undergo an implicit transcription phase in which spoken words become decodable as text tokens in intermediate layers, despite no speech recognition training. In up to 77% of cases, the spoken word appears as a top candidate text prediction, followed by a transition to text-based next-word prediction before the model returns to speech. This behavior is influenced by interleaved training and text LM initialization, and it correlates with spoken knowledge performance.

arxiv arXiv cs.LG · 14h ago

LLM-Integrated App Bug Seams Reveal Testing Gaps

A rental-search assistant with LLM features and multi-market support faced persistent user defects despite 1,553 passing automated tests. Analysis of 252 bug-fix commits showed 44% of fixes occurred at four unseen seams: browser runtime, non-default market, end-to-end flows, and whole-system level. A fix without a seam guard caused a defect to ship twice, highlighting the need for targeted testing at these boundaries.

arxiv arXiv cs.LG · 14h ago

Fed-CausalDiff: Decoupled Synchronization for Federated Do-Simulation

Fed-CausalDiff introduces a federated causal diffusion framework that enables do-simulation and policy evaluation in decentralized settings. It decomposes latent state evolution into global and local components, allowing decoupled synchronization to reduce communication cost while maintaining accurate causal inference.

media r/LocalLLaMA · 14h ago

SDXL Running Locally in Browser on WebGPU, Open-Source

A browser extension enables local image generation using SDXL models via WebGPU, running on the user's GPU without external setup. The tool supports two models: SDXL-Lightning fp16 (7 GB) and a 4-bit version (3.6 GB). Requirements include at least 8 GB of VRAM for the full model and a browser with WebGPU support (Chrome/Edge 122+ or the latest Firefox).

arxiv arXiv cs.LG · 14h ago

Deep Learning Pipeline for Sign Language Recognition and Translation to Indian Vernaculars

A two-stage deep learning model classifies Indian sign language video clips into English words using a fine-tuned VideoMAE transformer, achieving 99% training and 78% validation accuracy on a 13-class dataset. The predicted English labels are translated into Hindi, Telugu, and Bengali using Meta AI's NLLB-200 multilingual model, with a Streamlit demo enabling user-uploaded video inference and cross-lingual output.

arxiv arXiv cs.LG · 14h ago

Prompt-Side Preprocessing Enhances Edge AI Accuracy

A structured prompt framework improves local LLM accuracy in environmental monitoring by transforming raw sensor data into enriched textual representations. Evaluations on indoor and outdoor datasets show local model accuracy increases from 50.9% to 81.7% indoors and from 63.7% to 89.3% outdoors with enriched prompts, while maintaining low latency near 0.22 seconds in no-chain-of-thought mode.
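To make the idea of prompt-side preprocessing concrete, here is a minimal sketch of turning raw sensor readings into an enriched text line before they reach a local model. It does not reproduce the paper's framework: the function `enrich_prompt`, the chosen statistics, and the trend rule are assumptions for illustration only.

```python
from statistics import mean
from typing import List


def enrich_prompt(readings: List[float], unit: str, label: str) -> str:
    """Summarize raw readings as a descriptive sentence for an LLM prompt.

    Hypothetical helper: reports the latest value, the mean, and a simple
    trend based on comparing the first and last readings.
    """
    if not readings:
        raise ValueError("readings must not be empty")
    latest = readings[-1]
    avg = mean(readings)
    if latest > readings[0]:
        trend = "rising"
    elif latest < readings[0]:
        trend = "falling"
    else:
        trend = "steady"
    return (
        f"{label}: latest {latest:.1f} {unit}, mean {avg:.1f} {unit} "
        f"over {len(readings)} samples, trend {trend}."
    )
```

The returned sentence would be placed in the prompt in place of the bare numbers, giving the model derived context it would otherwise have to infer.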