All articles — korshunov.ai

Mellum2 local deployments

JetBrains has open-sourced the Mellum2 models, a series of 12B-2.5A LLMs trained from scratch to target fast inference on H100/H200 hardware as well as local deployments.

arxiv arXiv cs.AI · 8h ago

CineCap: Structured Reasoning with Spatio-Temporal Anchors for Cinematographic Video Captioning

Researchers propose CineCap, a framework that combines structured reasoning with spatio-temporal anchors and reinforcement learning to improve cinematographic video captioning. The method grounds professional film-language descriptions in explicit visual evidence while balancing descriptive completeness and factual correctness.

media AI News (smol.ai) · 8h ago

Anthropic launches Claude Tag, a Slack-native async delegation tool

Anthropic has launched Claude Tag, a new workflow feature that allows teams to delegate work to Claude asynchronously within Slack. Positioned as a shift from one-user chat to teamwide collaboration, the tool enables Claude to join as a team member with access to selected channels, tools, and codebases.

lab NVIDIA Technical Blog · 8h ago

Maximize AI Factory Energy Efficiency Through Full-Stack Inference and Training Optimizations

Power consumption represents 40% of the operating expenses for running an AI factory, with performance per watt becoming a critical efficiency metric that directly impacts token costs.

media r/LocalLLaMA · 8h ago

Building a web access layer for local AI agents

A developer shares their experience of creating a centralized web access layer to manage interactions between local AI models and external services. This approach addresses the maintenance burden of building individual integrations for every new agent project.

media r/LocalLLaMA · 8h ago

NASA tests local LLM inference for future space missions

Red Hat and NASA researchers are developing the Crew Medical Officer Digital Assistant (CMO-DA), a medical AI system that runs large language models on local hardware with zero cloud dependency. This initiative addresses the impracticality of Earth-based telehealth for astronauts on Moon or Mars missions due to light delay and communication blackouts.

media r/LocalLLaMA · 8h ago

Setup an H200 NVL on consumer(ish) hardware

A user successfully configured an NVIDIA H200 NVL GPU on a workstation built with an ASUS WRX90E-SAGE SE motherboard and a 64-core Threadripper processor, demonstrating that high-end AI accelerators can run on non-server hardware.

media r/LocalLLaMA · 8h ago

CPU-only GLM 5.2: Epyc and 512GB RAM

A user tested the 4-bit version of GLM-5.2 (GLM-5.2-UD-Q4_K_XL) on a server equipped with an Epyc Rome 7452 processor and 512GB of RAM. The model was evaluated using a complex coding prompt requiring the creation of a self-contained 3D arena game in HTML, CSS, and JavaScript.

media Hugging Face Forums · 8h ago

We all start somewhere

A developer with over 25 years of experience in web technologies is transitioning into AI engineering to move beyond using tools and understand how to build with them.

media Hugging Face Forums · 8h ago

User unable to restart private Hugging Face Space due to 503 error

A user reports that their private Hugging Face Space, specifically 'Ark-kun/tangent', stopped working abruptly and cannot be restarted. Attempts to restart or perform a factory rebuild both fail with a "503. Something went wrong when restarting this Space" error.

lab NVIDIA Technical Blog · 9h ago

Boost Inference Performance up to 15x on NVIDIA Blackwell Using DFlash Speculative Decoding

NVIDIA introduces DFlash speculative decoding to significantly boost inference performance on its Blackwell architecture, addressing the latency challenges inherent in autoregressive LLMs.

lab NVIDIA Technical Blog · 9h ago

Build an AI Scientist for Life Science Discovery with NVIDIA BioNeMo Agent Toolkit

NVIDIA introduces the BioNeMo Agent Toolkit to facilitate the creation of AI scientists capable of reading papers, writing code, and generating hypotheses for life science discovery.

lab NVIDIA Technical Blog · 9h ago

How Telcos Build Autonomous Networks with Agentic AI

Telecom operators are adopting AI across network operations, customer care, and back-office workflows, but most remain early in their journey toward full autonomy. Current automation efforts typically operate at Level 2–3 of TM Forum’s taxonomy, focusing on streamlining predefined solutions within selective domains.

media Latent Space · 9h ago

SpaceX Neocloud Revenue Hits $28B/Year Amidst OpenAI and Sakana Updates

SpaceX has secured its third GPU rental deal with Reflection AI, bringing its annualized revenue to approximately $28 billion based on a calculated rate of over $10 per hour for Blackwell GPUs. That revenue figure is roughly twice CoreWeave's, highlighting the rapid growth and high pricing power in the AI infrastructure market.

media r/LocalLLaMA · 9h ago

Kimi and GLM on frontier code

This Reddit post by user Charuru shares an image titled "Kimi and GLM on frontier code." The content serves as a visual reference or discussion starter regarding the performance of Kimi and GLM models in coding tasks.

media Hugging Face Forums · 9h ago

Ainara: Local-first AI assistant with persistent memory and LLM switching

Ainara is a local-first desktop application from a Dublin-based developer that functions as an AI companion with persistent memory across sessions. It lets users switch between cloud models such as Grok, Claude, and Gemini and local Ollama models while preserving context.

media Hugging Face Forums · 9h ago

Practical experience with ML surrogates for CFD and FEA simulations?

An engineering simulation professional seeks real-world deployment experiences of machine learning surrogates to reduce the cost of expensive Computational Fluid Dynamics (CFD) and Finite Element Analysis (FEA) solver runs.

lab Meta AI / FAIR Blog · 9h ago

Brain2Qwerty v2 Achieves 61% Word Accuracy in Non-Invasive Brain-to-Text Decoding

Researchers have released Brain2Qwerty v2, a non-invasive AI pipeline that decodes real-time sentences from magnetoencephalography (MEG) recordings without surgical implants. The system achieves a 61% word accuracy rate overall and up to 78% for top performers, significantly outperforming previous non-invasive methods.

media AI News (smol.ai) · 10h ago

OpenAI expands Daybreak, Sakana releases Fugu, GLM-5.2 gains traction

This week's AI news highlights OpenAI's expansion of its cybersecurity initiatives, Sakana AI's release of an orchestration model called Fugu, and the growing adoption of the open-weight GLM-5.2 model.

arxiv arXiv cs.LG · 10h ago

Leveraging Similarities in Multi-Armed Bandits

This study investigates online learning with similarity-structured action sets encoded by rooted trees, demonstrating that standard one-point feedback cannot exploit these similarities. The authors propose unified algorithms for richer feedback models that replace the number of actions with a similarity-aware effective count to improve regret bounds.