Reasoning models — korshunov.ai


GPT-5.5 Instant Enhances ChatGPT's Health Responses

GPT-5.5 Instant improves ChatGPT's health and wellness responses through stronger reasoning, better context handling, clearer communication, and physician-informed evaluations.

media r/LocalLLaMA · 4d ago

AllenAI releases MolmoMotion vision models for future motion prediction

AllenAI has released two MolmoMotion models that predict 3D point trajectories based on short video histories and natural-language instructions. One model uses a three-frame history, the other a one-frame history, enabling future motion forecasting for objects in 3D space.

media r/LocalLLaMA · 5d ago

Free 15-Part Series on LLM Internals Grounded in Gemma 4 12B

The author wrote a free 15-part series on LLM internals, using Gemma 4 12B as the running example. Each part covers a technical aspect, from tokenization to serving, with real math, tensor shapes, and hardware constraints. The series includes a companion vLLM Deep Dive and is fully accessible without paywalls or email sign-up.

media r/LocalLLaMA · 5d ago

Gemma 4 26b a4b excels in language and scientific queries

A user claims Gemma 4 26b a4b is the best model they've tried for language learning and scientific queries, outperforming Qwen 3.5/3.6 in these domains. The post highlights a gap in available small MoE models between 20b and 30b, suggesting a need for more options beyond coding and agentic tasks.

media r/LocalLLaMA · 5d ago

Any opinion about Qwen3.6-27B@BF16 vs Step3.7@IQ4_XS?

The user asks which model—Qwen3.6-27B at BF16 precision or Step3.7 with IQ4_XS quantization—would make saner, more autonomous decisions with less need for human guidance. The query compares a dense, high-precision model with a larger, lower-precision MoE model, noting trade-offs in memory and performance.

media r/LocalLLaMA · 5d ago

Research Project: Injecting Natural-Language Tactical Intent into Multi-Agent Football Policies

A research project explores using natural-language tactical instructions from humans to guide autonomous AI agents in a football simulation. The system enables human coaches to issue high-level directives like 'press aggressively' or 'exploit the left side', which the AI agents then adapt to in real time within a dynamic, team-based environment.

media r/LocalLLaMA · 5d ago

Best local LLM for English story summarization

A user asks which local LLM currently performs best at summarizing long English stories. The query highlights the need for accurate, local LLMs capable of handling multi-page narratives in English.

media r/LocalLLaMA · 5d ago

GLM 5.2 Achieves 98% Max Intelligence with Less Than Half Tokens

According to a technical report by z_ai, GLM 5.2 reaches 98% of its maximum intelligence in coding tasks using less than half of its total token budget. The model's reasoning efficiency is described as significantly improved, although token usage rose from 16.7k to 36.7k between GLM 5.1 and GLM 5.2, and high-level settings may strain local hardware performance.

media r/LocalLLaMA · 5d ago

Local AI for Local Office Files

A Reddit user asks which AI agent is best for handling local office files like Excel, PDF, Word, and JSON. The post seeks user experiences and implemented workflows for such tasks.

media r/LocalLLaMA · 5d ago

What is the best book for learning ML/Deep Learning maths?

A user asks for book recommendations to build a strong mathematical foundation for understanding and contributing to machine learning and deep learning, especially given their interest in AI architectures and large language models. They acknowledge that intuitive understanding is limited without proper mathematical background and seek structured resources to complement their current learning through channels like 3b1b.

media r/LocalLLaMA · 5d ago

Attention Algebra — a grammar that translates natural language into spectrograms

Attention Algebra is a prototype that translates natural language into algebraic expressions, maps them to mathematical dynamics, and visualizes the result as a spectrogram. It treats language as a lossy projection of high-dimensional states, proposing that raw attention patterns grouped into functions serve as the 'DNA' of text, enabling efficient reasoning chains by reducing token usage from 20k to 4k.

media r/LocalLLaMA · 5d ago

$1800 GPU cost runs Qwen3.6-27B with 262K context and 55 tok/s

A setup using four 5060 Ti GPUs (totaling $1800) runs Qwen3.6-27B-FP8 at 55 tokens per second, with a 262K context length and a bfloat16 KV cache. The configuration uses P2P and FlashInfer, and the benchmark reports an output throughput of 55.67 tokens per second and a 65.25% speculative decoding acceptance rate.
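As a rough back-of-the-envelope reading of those figures, the sketch below derives the cost per GPU, the hardware cost per unit of throughput, and the time to generate one response. The totals and throughput come from the post; the 1000-token response length is a hypothetical value chosen only for illustration.

```python
# Figures quoted in the post.
TOTAL_GPU_COST_USD = 1800.0
GPU_COUNT = 4
OUTPUT_TOKENS_PER_SECOND = 55.67

# Hypothetical response length, not from the post.
RESPONSE_TOKENS = 1000

# Simple ratios derived from the quoted numbers.
cost_per_gpu = TOTAL_GPU_COST_USD / GPU_COUNT
cost_per_token_per_second = TOTAL_GPU_COST_USD / OUTPUT_TOKENS_PER_SECOND
seconds_per_response = RESPONSE_TOKENS / OUTPUT_TOKENS_PER_SECOND

print(f"Cost per GPU: ${cost_per_gpu:.2f}")
print(f"Hardware cost per tok/s: ${cost_per_token_per_second:.2f}")
print(f"Time for {RESPONSE_TOKENS} output tokens: {seconds_per_response:.1f} s")
```

These ratios ignore the rest of the machine, power draw, and prompt-processing time, so they only indicate the order of magnitude.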

media Don't Worry About the Vase · 6d ago

Claude Fable 5 and Mythos 5: Capabilities

Anthropic launched Claude Fable 5, a Mythos-class model that it claims delivers state-of-the-art performance across software engineering, scientific research, and knowledge work. It was quickly taken down by the U.S. government after a jailbreak was reported, though Anthropic asserts it is now available again. Fable 5 shows exceptional capabilities and a more nuanced, thoughtful reasoning style than prior models.

media r/LocalLLaMA · 6d ago

What's more impressive, GLM 5.1 to 5.2 or Qwen 3.5 to 3.6?

A Reddit post compares the performance improvements of GLM 5.1 to 5.2 and Qwen 3.5 to 3.6. The post notes that mentioning 'Döner' activates GLM 5.2's German-specific weights, while Qwen 3.6 is evaluated with 35B parameters using Unsloth Q8 K XL quantization via llama.cpp.

media r/LocalLLaMA · 6d ago

Watching a Local AI Voice Assistant Get Dumber

A test on an RTX 5060 Ti showed that reducing a local AI voice assistant's model size from 9B to 0.8B leads to a sharp decline in capability. The 9B model handles tool orchestration well, while smaller models show increasing failures: the 4B model skips tool calls and guesses facts, the 2B model suffers semantic drift, and the 0.8B model fails to operate agent functions, triggering wrong APIs or infinite loops.

media r/LocalLLaMA · 6d ago

Has anyone used VibeThinker-3B outside benchmarks?

A Reddit user asks about real-world performance of VibeThinker-3B beyond benchmark scores, focusing on debugging, coding, reasoning, latency, and usability. The model is available on Hugging Face and described in a paper on arXiv.

arxiv arXiv cs.AI · 6d ago

DataMagic Turns Tabular Data into Interactive Insight Videos

DataMagic transforms raw tabular data and natural language queries into narrative data-insight videos. It uses DVSpec to ensure data fidelity by linking visual elements to data fields via semantic references, and employs a multi-agent architecture to generate and orchestrate coherent video scenes. The system supports interactive exploration and provenance-based data Q&A, enabling users to engage with data beyond static views.

arxiv arXiv cs.AI · 6d ago

Multi-View Decompilation Improves LLM-Based Malware Classification

A benchmark of benign and malicious binaries compiled and decompiled with Ghidra and RetDec reveals that providing both decompiler views to large language models improves malicious-class F1, primarily by increasing recall. Analysis shows Ghidra and RetDec make distinct errors, indicating their outputs offer complementary evidence for malware classification.
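To see why a recall gain alone can lift malicious-class F1, recall that F1 is the harmonic mean of precision and recall. The sketch below uses hypothetical precision and recall values, not figures from the paper, and holds precision fixed while recall rises.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical malicious-class metrics, for illustration only.
single_view_f1 = f1_score(precision=0.90, recall=0.60)
multi_view_f1 = f1_score(precision=0.90, recall=0.80)

print(f"Single decompiler view F1: {single_view_f1:.3f}")
print(f"Both decompiler views F1:  {multi_view_f1:.3f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, raising a lagging recall improves F1 even when precision does not change.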

arxiv arXiv cs.AI · 6d ago

Attention-Guided Deep Learning for Interpretable Sperm Morphology Classification

A new deep learning framework combines EfficientNet-B0 with CBAM to improve accuracy and interpretability in sperm morphology classification. Evaluated on the SMIDS and HuSHeM datasets, it achieves 90.2% and 93.9% accuracy with macro F1 scores of 0.913 and 0.948, respectively, outperforming baseline models. Grad-CAM++ visualizations enable transparent feature analysis, supporting clinical adoption in fertility clinics.

arxiv arXiv cs.AI · 6d ago

Optimal Order in Multi-Agent Systems Framework

A new framework analyzes multi-agent systems by modeling agent influence and response functions. It derives macroscopic properties like power, entropy, and order, and identifies an optimal level of synchronization that balances productivity, stability, and adaptability. The study shows that order and system properties are task-dependent and context-relative.