All articles — korshunov.ai

Themis: An explainable AI-enabled framework for Reinforcement Learning with Human Feedback

The authors introduce Themis, an XAI-enabled testing and evaluation framework that combines transparency through explainability with alignment via human feedback for safe Reinforcement Learning systems.

arxiv arXiv cs.AI · 6h ago

Privacy-Preserving RAG via Multi-Agent Semantic Rewriting

The authors propose a multi-agent framework that sanitizes retrieved content in Retrieval-Augmented Generation (RAG) systems through semantic rewriting to prevent privacy leakage from malicious prompts. By employing three specialized agents for privacy extraction, semantic analysis, and reconstruction, the approach removes sensitive identifiers while preserving the core meaning of the text.

arxiv arXiv cs.AI · 6h ago

SAFARI: Scaling Long Horizon Agentic Fault Attribution via Active Investigation

The article introduces SAFARI, a framework designed to diagnose failures in autonomous agents by replacing linear context loading with a tool-augmented diagnostic loop. This approach decouples diagnostic accuracy from architectural context limits by using specialized tools and short-term memory to analyze trajectory segments.

arxiv arXiv cs.AI · 6h ago

Visualizing 'We the People': Bridging the Perception Gap through Pluralistic Data Storytelling

This article examines how intentional, pluralistic design choices in AI-enabled digital platforms can produce visualizations that emphasize nuance and intergroup commonalities, thereby reducing political polarization. It highlights a specific deliberative technology initiative that maps high-dimensional opinion spaces to reveal areas of both consensus and dissensus among diverse populations.

media r/LocalLLaMA · 6h ago

Mellum2 local deployments

JetBrains has open-sourced the Mellum2 models, a series of 12B-2.5A LLMs trained from scratch to target fast inference on H100/H200 hardware as well as local deployments.

arxiv arXiv cs.AI · 7h ago

CineCap: Structured Reasoning with Spatio-Temporal Anchors for Cinematographic Video Captioning

Researchers propose CineCap, a framework that combines structured reasoning with spatio-temporal anchors and reinforcement learning to improve cinematographic video captioning. The method grounds professional film-language descriptions in explicit visual evidence while balancing descriptive completeness and factual correctness.

media AI News (smol.ai) · 7h ago

Anthropic launches Claude Tag, a Slack-native async delegation tool

Anthropic has launched Claude Tag, a new workflow feature that allows teams to delegate work to Claude asynchronously within Slack. Positioned as a shift from one-user chat to teamwide collaboration, the tool enables Claude to join as a team member with access to selected channels, tools, and codebases.

lab NVIDIA Technical Blog · 7h ago

Maximize AI Factory Energy Efficiency Through Full-Stack Inference and Training Optimizations

Power consumption represents 40% of the operating expenses for running an AI factory, with performance per watt becoming a critical efficiency metric that directly impacts token costs.

media r/LocalLLaMA · 7h ago

Building a web access layer for local AI agents

A developer shares their experience of creating a centralized web access layer to manage interactions between local AI models and external services. This approach addresses the maintenance burden of building individual integrations for every new agent project.

media r/LocalLLaMA · 7h ago

NASA tests local LLM inference for future space missions

Red Hat and NASA researchers are developing the Crew Medical Officer Digital Assistant (CMO-DA), a medical AI system that runs large language models on local hardware with zero cloud dependency. This initiative addresses the impracticality of Earth-based telehealth for astronauts on Moon or Mars missions due to light delay and communication blackouts.

media r/LocalLLaMA · 7h ago

Setup an H200 NVL on consumer(ish) hardware

A user successfully configured an NVIDIA H200 NVL GPU on a workstation built with an ASUS WRX90E-SAGE SE motherboard and a 64-core Threadripper processor, demonstrating that high-end AI accelerators can run on non-server hardware.

media r/LocalLLaMA · 7h ago

CPU-only GLM 5.2: Epyc and 512GB RAM

A user tested the 4-bit version of GLM-5.2 (GLM-5.2-UD-Q4_K_XL) on a server equipped with an Epyc Rome 7452 processor and 512GB of RAM. The model was evaluated using a complex coding prompt requiring the creation of a self-contained 3D arena game in HTML, CSS, and JavaScript.

media Hugging Face Forums · 7h ago

We all start somewhere

A developer with over 25 years of experience in web technologies is transitioning into AI engineering to move beyond using tools and understand how to build with them.

media Hugging Face Forums · 7h ago

User unable to restart private Hugging Face Space due to 503 error

A user reports that their private Hugging Face Space, specifically 'Ark-kun/tangent', stopped working abruptly and cannot be restarted. Attempts to restart or perform a factory rebuild both fail with a "503. Something went wrong when restarting this Space" error.

lab NVIDIA Technical Blog · 8h ago

Boost Inference Performance up to 15x on NVIDIA Blackwell Using DFlash Speculative Decoding

NVIDIA introduces DFlash speculative decoding to significantly boost inference performance on its Blackwell architecture, addressing the latency challenges inherent in autoregressive LLMs.

lab NVIDIA Technical Blog · 8h ago

Build an AI Scientist for Life Science Discovery with NVIDIA BioNeMo Agent Toolkit

NVIDIA introduces the BioNeMo Agent Toolkit to facilitate the creation of AI scientists capable of reading papers, writing code, and generating hypotheses for life science discovery.

lab NVIDIA Technical Blog · 8h ago

How Telcos Build Autonomous Networks with Agentic AI

Telecom operators are adopting AI across network operations, customer care, and back-office workflows, but most remain early in their journey toward full autonomy. Current automation efforts typically operate at Level 2–3 of TM Forum’s taxonomy, focusing on streamlining predefined solutions within selective domains.

media Latent Space · 8h ago

SpaceX Neocloud Revenue Hits $28B/Year Amidst OpenAI and Sakana Updates

SpaceX has secured its third GPU rental deal with Reflection AI, bringing its annualized revenue to approximately $28 billion, based on a calculated rate of over $10 per hour for Blackwell GPUs. That figure is roughly twice that of CoreWeave, highlighting the rapid growth and high pricing power in the AI infrastructure market.

media r/LocalLLaMA · 8h ago

Kimi and GLM on frontier code

This Reddit post by user Charuru shares an image titled "Kimi and GLM on frontier code." The content serves as a visual reference or discussion starter regarding the performance of Kimi and GLM models in coding tasks.

media Hugging Face Forums · 8h ago

Ainara: Local-first AI assistant with persistent memory and LLM switching

Ainara is a local-first desktop application by a Dublin-based developer that functions as an AI companion with persistent memory across sessions. It lets users switch between cloud models such as Grok, Claude, and Gemini and local Ollama models while preserving context.