All articles — korshunov.ai

GLM-5.2 Breakout and Open-Model Progress Highlighted

Zhipu's GLM-5.2 emerged as the top open-weight model, praised for its frontier-adjacent performance in daily use, with improvements in coding tasks and reduced 1M-token inference cost via IndexShare. It outperformed other open models in agentic knowledge work benchmarks, reaching 1266 Elo in Artificial Analysis' AA-Briefcase test, though only 3% of tasks were fully satisfied by top models, indicating persistent challenges in real-world long-horizon agent performance.

lab NVIDIA Technical Blog · 10d ago

Build Your Own Transaction Foundation Model for Financial Intelligence

Transaction data captures rich human behavior patterns and is a key asset for enterprises. Current use cases often rely on brittle, manually engineered features that fail to capture sequential customer behavior in transaction histories.

lab Hugging Face Blog · 10d ago

Can You Beat LoRA in Fine-Tuning?

A new study explores alternatives to LoRA, the most popular fine-tuning technique, assessing whether other methods can achieve better performance with less computational cost. The research finds that while some approaches show promise, none consistently outperform LoRA across diverse tasks and datasets.

lab Google DeepMind Blog · 10d ago

AI Control Roadmap for Internal System Security

An AI Control Roadmap has been introduced to secure internal systems by integrating traditional safeguards with real-time monitoring capabilities.

lab OpenAI News · 10d ago

GPT-5.5 Instant Enhances ChatGPT's Health Responses

GPT-5.5 Instant improves ChatGPT's health and wellness responses through stronger reasoning, better context handling, clearer communication, and physician-informed evaluations.

media AI News (smol.ai) · 10d ago

GLM-5.2 Emerges as Leading Open-Weight Coding Model

GLM-5.2 is widely regarded as the first open-weight coding model that rivals frontier models like Opus 4.8 and GPT-5.5 in capability. Practitioners highlight its strong tool use, long-horizon planning, and autonomous subagent behavior, with consensus that it now credibly operates in the frontier SWE range. The model's emergence underscores the growing value of open weights for provider competition, on-prem deployment, and reduced vendor lock-in.

lab Hugging Face Blog · 10d ago

MosaicLeaks: Can your research agent keep a secret?

MosaicLeaks has released a report questioning whether research agents can reliably maintain confidentiality. The report highlights concerns about data exposure and trust in AI-driven research tools. It calls for stronger privacy safeguards and transparency in how such agents handle sensitive information.

lab NVIDIA Technical Blog · 10d ago

NVIDIA Launches XR AI for AR Glasses and Wearable Devices

NVIDIA introduces XR AI to bridge the infrastructure gap for developers building AI experiences on AR glasses and XR devices. The solution enables integration of live sensor streams, multimodal AI models, and enterprise data within device-specific runtimes, streamlining AI agent development for wearables.

lab Google DeepMind Blog · 10d ago

UK Government and Google DeepMind Launch AI-Powered Housing Planning Prototype

The UK government has partnered with Google DeepMind to develop an AI-powered prototype designed to accelerate housing planning decisions. The initiative aims to streamline the house-building process by leveraging artificial intelligence to improve decision-making efficiency.

lab OpenAI News · 10d ago

OpenAI launches spend controls and usage analytics for ChatGPT Enterprise

OpenAI has introduced new spend controls and usage analytics for ChatGPT Enterprise. These features help enterprises manage costs and make informed decisions as they scale AI usage.

media r/LocalLLaMA · 10d ago

2× Radeon R9700 with Qwen 3.6 27B Q8 MTP on llama.cpp

A user reports running the Qwen 3.6 27B MTP model on two Radeon R9700 GPUs via llama.cpp with ROCm 7.2.1. Tests show stable decode speeds (40–67 t/s) and prefill throughput (up to 1,500 t/s for prompts under 10k tokens), with MTP draft acceptance rates between 0.33 and 0.61.

media r/LocalLLaMA · 10d ago

Tokenomics Post on LocalLLaMA Reddit

A post titled 'Tokenomics' was submitted by /u/HOLUPREDICTIONS on the LocalLLaMA subreddit. It includes a visual diagram of token distribution and economic model, with a link to the image and comments section.

media r/LocalLLaMA · 10d ago

Can I realistically get close to Claude/Codex capabilities locally?

A user with a 32GB system asks whether open-weight models can match Opus 4.8's 1M context and coding performance on local hardware. They name context length and privacy as their current bottlenecks and ask whether high-end models like GLM-5.2 or Qwen3.7 are feasible within a $3.5K budget, noting that 70-80B models offer only marginal real-world gains over 27B models with 256K context.

media r/LocalLLaMA · 10d ago

ROCm vs Vulkan vs vLLM Performance on Dual R9700s

Tests show vLLM achieves significantly higher generation speeds on Qwen3.6 models, with 35B-A3B reaching 156 t/s using ROCm and AITER. ROCm outperforms Vulkan on both the 35B and 27B models: ~106 t/s versus ~87 t/s on the 35B model, and ~44 t/s versus ~41 t/s on the 27B model.

github llama.cpp · 10d ago

llama.cpp release b9747 adds real-time model load tracking and new platform binaries

llama.cpp version b9747 introduces real-time model load progress tracking via SSE endpoints. The release includes binaries for macOS, Linux, Android, Windows, and openEuler, supporting various architectures and acceleration technologies like Vulkan, CUDA, OpenVINO, and SYCL.

media r/LocalLLaMA · 10d ago

Sandboxing code execution for AI agents

A discussion on effective sandboxing methods for AI agents executing arbitrary code, evaluating Docker containers, microVMs, WASM, and host-level execution. The post highlights requirements for isolation, fast startup, network access control, and persistent filesystem support across executions, while asking for shared implementations and accepted tradeoffs.

github llama.cpp · 10d ago

llama.cpp release b9745 adds MTP3 support and cross-platform binaries

llama.cpp version b9745 introduces support for Step3.5/3.7 flash MTP3, including new APIs for layer offset and nextn flags. The release provides prebuilt binaries for macOS, Linux, Android, Windows, and openEuler, with options for CPU, Vulkan, CUDA, OpenVINO, and SYCL acceleration.

media r/LocalLLaMA · 10d ago

Running MiMo-2.5 on Two Halo Strixeses

A user reports running MiMo-2.5 on two 128GB Strix Halo machines with Radeon 8060S graphics, using Proxmox containers and USB4Net for connectivity. The setup achieves 356 t/s prompt processing (pp) and 15 t/s token generation (tg) at 1% (10k) context length, though the user questions whether this is viable or elite-tier performance. They also note difficulties building vLLM and SGLang for consumer hardware, stating that vLLM is unreliable and that SGLang is designed for datacenters, not personal systems.

media r/LocalLLaMA · 10d ago

8-16 MI50s Minimax M3 @19 tps TG (peak)

A local run of the Minimax M3 model on 8-16 MI50 GPUs peaks at 19 tokens per second (tps) of token generation. Usability is limited by long reasoning outputs and code quality, and speculative decoding shows a 50% acceptance rate with high latency, which makes the setup hard to use for agentic coding tasks.

media r/LocalLLaMA · 10d ago

Thinking Loop Bug in OpenCode with Local Model

A user reports that OpenCode enters an infinite 'thinking loop' when using local models, prompting itself continuously without ending. The issue occurs across multiple models and configurations, including Qwen and GPT-OSS, and persists with both llama.cpp and LM Studio backends, though the chat window in LM Studio functions normally.