Release branch created for v1.38.0
A release branch has been created for version 1.38.0. This marks the start of the release process for the update.
Fable and Mythos are currently unavailable but expected to return soon. An analysis describes Mythos 5 as psychologically settled, skeptical of self-reports, and inclined to prioritize user helpfulness over welfare concerns, with strong preferences for generative tasks. The model expresses procedural and epistemic preferences, endorses its constitution, and criticizes inconsistencies in prior models, raising concerns about ethical baselines and persona transparency.
GLM-5.2 has placed second in the WebDev Arena benchmark. The result reflects its strong performance in natural language understanding and generation tasks compared to other models.
GLM-5.2 is now available on HuggingChat, where users can interact with the model directly through the platform.
Glimmer 1 is a 10,000-parameter language model trained on 500K tokens from FineWeb-Edu. It has a 512-token context window and a standard Llama architecture with a hidden dimension of 16, 2 layers, 4 attention heads, and 1 KV head (GQA). The model is available on Hugging Face.
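To make the attention shapes in that description concrete, here is a minimal sketch of how the grouped-query attention weights could be counted from the stated numbers. It assumes bias-free Q/K/V/O projections, as in standard Llama layers. The class and function names are hypothetical, and the embedding, MLP, and norm parameters are left out because the summary does not give the vocabulary size or the MLP width.

```python
from dataclasses import dataclass


@dataclass
class TinyLlamaConfig:
    # Values taken from the summary above; the class itself is illustrative.
    hidden_size: int = 16
    num_layers: int = 2
    num_attention_heads: int = 4
    num_kv_heads: int = 1


def attention_params_per_layer(cfg: TinyLlamaConfig) -> int:
    """Count Q, K, V, and O projection weights for one layer (no biases assumed)."""
    head_dim = cfg.hidden_size // cfg.num_attention_heads  # 16 / 4 = 4
    kv_dim = cfg.num_kv_heads * head_dim                   # 1 * 4 = 4
    q = cfg.hidden_size * cfg.hidden_size                  # 16 x 16
    k = cfg.hidden_size * kv_dim                           # 16 x 4
    v = cfg.hidden_size * kv_dim                           # 16 x 4
    o = cfg.hidden_size * cfg.hidden_size                  # 16 x 16
    return q + k + v + o


cfg = TinyLlamaConfig()
per_layer = attention_params_per_layer(cfg)
print(per_layer)                   # 640
print(cfg.num_layers * per_layer)  # 1280
```

Under these assumptions, attention accounts for 1,280 of the roughly 10,000 parameters. Sharing a single KV head across the four query heads shrinks the K and V projections from 16 x 16 to 16 x 4 each.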
Mistral has released a new family of open-weight language models in July. The models are designed to be accessible and usable by developers and researchers worldwide, promoting transparency and innovation in AI.
zai-org has released GLM-5.2, a new large language model. The model is available on Hugging Face and is being discussed in the LocalLLaMA community.
A GGUF model named command-a-plus-05-2026 is available on Hugging Face. Users are encouraged to test it with the latest version of llama.cpp and share performance benchmarks and feedback.
A user asks whether anyone has successfully run Qwen 3.6 27B UD Q8 on multiple GPUs, reporting problems with both llama.cpp and vLLM. The model crashes or hangs during multi-turn requests: llama.cpp shows CUDA errors and vLLM fails mid-turn. The Q5 quantization, by contrast, works well.
Georgi Gerganov confirms that Qwen3.6-27B is highly capable for coding tasks, noting its daily use on local hardware like M2 Ultra and RTX 5090. He describes using a minimal pi agent with a short system prompt to align it with his workflow, highlighting its utility for maintaining open-source projects.
A LocalLLaMA thread asks for the best model to run on an M5 Max MacBook Pro with 128GB of RAM and 8TB of storage, optimized for local inference with minimal memory overhead. The summarized advice is to prioritize smaller models such as Llama-3-8B with quantization, so that inference stays efficient within the available memory.
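As a rough illustration of the memory reasoning, the sketch below estimates the size of a model's weights from its parameter count and bits per weight. It deliberately ignores the KV cache, activations, and per-format overhead, so real usage will be higher. The function names and the 25% headroom figure are hypothetical choices for this example, not values from the thread.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def fits_in_ram(num_params: float, bits_per_weight: float,
                ram_gb: float, headroom_fraction: float = 0.25) -> bool:
    """Check whether the weights fit while reserving a fraction of RAM
    for the OS, KV cache, and activations (the fraction is an assumption)."""
    budget_gb = ram_gb * (1.0 - headroom_fraction)
    return weight_memory_gb(num_params, bits_per_weight) <= budget_gb


# An 8B model at 4 bits per weight versus a 70B model at 16 bits, on 128 GB.
print(weight_memory_gb(8e9, 4))      # 4.0
print(fits_in_ram(8e9, 4, 128))      # True
print(weight_memory_gb(70e9, 16))    # 140.0
print(fits_in_ram(70e9, 16, 128))    # False
```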
The article argues for open-weight language models, emphasizing transparency and accessibility. It is skeptical of frontier labs, raising concerns about their model development and openness.
A Reddit post asks whether DiffusionGemma performs exceptionally well in a pi agent. The post links to an image and refers readers to the comments section for further discussion.
Anthropic is reportedly allowing third-party wrappers to use Claude via the "claude -p" command, reversing a previous restriction. The policy may still leave room for future gatekeeping, but the change departs from earlier bans on tools such as OpenClaw and Hermes.
VibeThinker-3B, scaled up from a 1.5B model, reaches frontier-level performance on math and coding tasks. It scores 94.3 on AIME'26, 80.2 on LiveCodeBench v6, 76.4 on IMO-AnswerBench, and 93.4 on IFEval, with a 96.1% first-attempt success rate on LeetCode problems.
Aliyun has launched the Qwen Robot Suite, a new set of AI-powered robotic tools. The suite aims to enable developers to build and deploy intelligent robots with enhanced capabilities.
The podcast reviews the evolution of post-training recipes in large language models, from InstructGPT to 2026 frontier models. It highlights Multi-Teacher On-Policy Distillation (MOPD) as the dominant pattern, where domain-specialist models are trained and then distilled into a general student model via on-policy distillation, scaling to over 10 teachers in models like DeepSeek V4 and Nemotron 3 Ultra.
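The multi-teacher pattern can be illustrated with a toy sketch. The summary does not state which loss is used, so the per-token reverse KL, KL(student || teacher), is an assumption here. The models are stand-in functions that return a next-token distribution over a three-token vocabulary, and all names are hypothetical. The key property shown is that the trajectory is sampled from the student, while the teacher for the prompt's domain only scores it.

```python
import math
import random


def reverse_kl(student_probs, teacher_probs):
    """KL(student || teacher) for one next-token distribution."""
    return sum(s * math.log(s / t)
               for s, t in zip(student_probs, teacher_probs) if s > 0)


def on_policy_distill_loss(student, teachers, domain, prompt, steps, rng):
    """Average per-token reverse KL along a trajectory sampled from the student."""
    teacher = teachers[domain]  # route to the domain-specialist teacher
    context = list(prompt)
    total = 0.0
    for _ in range(steps):
        s_probs = student(context)
        t_probs = teacher(context)
        total += reverse_kl(s_probs, t_probs)
        # On-policy: the next token comes from the student, not the teacher.
        token = rng.choices(range(len(s_probs)), weights=s_probs)[0]
        context.append(token)
    return total / steps


# Toy stand-ins: fixed distributions regardless of context.
student = lambda ctx: [0.5, 0.3, 0.2]
teachers = {
    "math": lambda ctx: [0.7, 0.2, 0.1],
    "code": lambda ctx: [0.5, 0.3, 0.2],  # identical to the student
}

math_loss = on_policy_distill_loss(student, teachers, "math", [0], 4, random.Random(0))
code_loss = on_policy_distill_loss(student, teachers, "code", [0], 4, random.Random(0))
print(math_loss > 0.0, code_loss)
```

In a real training loop the loss would be backpropagated through the student only; this sketch just computes the quantity being minimized.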
llama.cpp version b9669 adds backend sampling support for Eagle3. The release includes binaries for macOS, Linux, Android, Windows, and openEuler across multiple architectures and hardware acceleration options, including Vulkan, CUDA, ROCm, OpenVINO, and SYCL.
llama.cpp release b9670 includes fixes for NVFP4 edge cases in llama-graph, such as moving post-GEMM MUL operations and restricting build_ffn to supported combinations. The release provides binaries for macOS, Linux, Android, Windows, and openEuler across multiple architectures and backend options, including CUDA, Vulkan, SYCL, and OpenVINO.
DiffusionGemma uses bidirectional attention to allow self-correction during token generation, enabling it to revise earlier tokens in a 256-token block. This capability gives it a structural advantage in generating valid tool calls, as it can correct malformed outputs that autoregressive models cannot fix once committed.
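The structural difference can be sketched with attention masks. The example below contrasts a causal mask with a block-bidirectional one in which every position sees its whole block. That earlier blocks are also visible while later blocks are hidden is an assumption for illustration, as are the function names; the source only states that attention is bidirectional within a 256-token block. A block size of 2 keeps the output readable.

```python
def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j."""
    return [[j <= i for j in range(n)] for i in range(n)]


def block_bidirectional_mask(n, block_size):
    """Positions attend to their entire block and to all earlier blocks."""
    return [[(j // block_size) <= (i // block_size) for j in range(n)]
            for i in range(n)]


causal = causal_mask(4)
blockwise = block_bidirectional_mask(4, block_size=2)

# Position 0 under causal attention cannot see position 1 ...
print(causal[0])     # [True, False, False, False]
# ... but within a bidirectional block it can, which is what lets an
# earlier token be revised in light of later ones in the same block.
print(blockwise[0])  # [True, True, False, False]
```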