All articles — korshunov.ai

All articles Page 1 / 89

Deep material network for homogenization of piezoelectric composites

A piezoelectric deep material network (PDMN) is proposed to efficiently homogenize two-phase piezoelectric composites. The framework embeds electromechanical homogenization relations into its architecture, enabling physics-informed, semi-analytical predictions with over three orders of magnitude lower computational cost than direct numerical simulation, validated on PVDF-LiNbO3 and viscoelastic-piezoelectric composites under nonlinear loading.

arxiv arXiv cs.LG · 11h ago

Concept-Constrained Prompt Learning for Few-Shot CLIP Adaptation

CCPL introduces a lightweight framework that anchors class prompts to frozen concept prototypes, improving few-shot CLIP adaptation. It achieves better base-to-new performance on DTD and EuroSAT compared to CoOp, with consistent gains from text-space concept regularization, though results vary by dataset and protocol.

arxiv arXiv cs.LG · 11h ago

Stationary Robust Mean-Field Games under Model Mismatches

This paper introduces a stationary mean-field game framework that directly incorporates distributional model uncertainty into population-coupled dynamics. It establishes a robust dynamic programming principle, proves existence of a stationary robust equilibrium, and presents the first algorithm with convergence guarantees. The mean-field solution approximates finite-population equilibria and provides explicit non-asymptotic error bounds under model uncertainty.

arxiv arXiv cs.LG · 11h ago

Training-Free Task Classification for Multi-Task Model Merging

SiM enables dynamic routing in multi-task model merging without additional training or task ID access. It uses SVD-based manifold approximations and projects test inputs onto precomputed task manifolds to route inputs to relevant experts, improving performance and reducing the gap to individual expert levels.

arxiv arXiv cs.LG · 11h ago

Importance-Weighted On-Policy Distillation Addresses Position Bias

On-Policy Distillation (OPD) suffers from position bias where later tokens provide poor supervision. We introduce Importance-Weighted On-Policy Distillation (IW-OPD), which assigns weights based on distribution discrepancy, prioritizing early tokens. IW-OPD converges faster and achieves up to 6.9 point performance gains on AIME-2025.

arxiv arXiv cs.LG · 11h ago

Scalable Bayesian Models for Stellar Flare Detection

A generative surrogate framework using a Variational Autoencoder approximates Gaussian Process priors, bypassing costly covariance operations. The VAE+Hidden Markov Model architecture enables fast, scalable stellar flare detection in large astronomical time series, matching exact models in structural fidelity while reducing computational time significantly.

arxiv arXiv cs.LG · 11h ago

Small Language Models Outperform Frontier LLMs in Relation Extraction

A fine-tuned 0.5B-parameter Qwen2.5 model achieves 0.83 micro-F1 in general-domain relation extraction, surpassing zero-shot GPT-5.4 and Claude Sonnet 4.6. On literary benchmarks, it reaches 0.92 on the Biographical dataset, outperforming GPT-5.4 and exceeding frontier models in accuracy, demonstrating that task-adapted small models can deliver high performance with minimal hardware and privacy overhead.

media r/LocalLLaMA · 11h ago

I reverse engineered Windows Copilot into a free OpenAI-compatible API

A user has created a local API that replicates OpenAI-compatible GPT-4 functionality using Microsoft's free Copilot service. The tool logs into a Microsoft account once, runs locally on a Windows device, and exposes a server at http://localhost:8000/v1 that supports streaming and multi-turn conversations without requiring an API key or billing. It is designed for personal and educational use, and available via GitHub at https://github.com/sums001/Windows-Copilot-API.

blog Simon Willison · 11h ago

Tom MacWright on Accidental Anonymity in Job Applications

Tom MacWright observes that job applications increasingly feature LLM-generated content, including portfolios and GitHub projects with fabricated commit messages. He notes that such applications reveal little about the applicants, as they lack personal authenticity and genuine self-expression.

arxiv arXiv cs.AI · 12h ago

Geometry-Aware Online Scheduling for LLM Serving

A new scheduling algorithm, Smallest Volume First (SVF), reduces LLM inference latency by optimizing key-value cache management. Theoretical analysis shows a worst-case competitive ratio reduced from 48 to 5, with 1-bit SVF achieving strong performance using minimal information. Evaluations on Llama-3.1 models confirm improvements in both average and tail latency, with the approach integrated into vLLM.

arxiv arXiv cs.AI · 12h ago

BabelJudge: Measuring LLM-as-a-Judge Reliability Across Languages and Agent Trajectories

BabelJudge introduces an open-source framework to measure four key bias modes in LLM judges across languages and agent trajectories. It reveals a significant reliability drop from Hindi to Swahili—0.714 to 0's 0.550—highlighting cross-lingual degradation invisible to raw accuracy. The framework enables bias-aware evaluation without human labels, using controlled perturbations to create known gold labels, and extends to agentic workflows with new metrics on tool accuracy and hallucination detection.

arxiv arXiv cs.AI · 12h ago

Hypothesis-Driven Skill Optimization for LLM Agents

HDSO enables safe, auditable skill updates for LLM agents without training, using falsifiable hypotheses and validation. On ALFWorld, it improves Qwen3-8B by +6.9 Avg. SR points and maintains a +7.1-point gain under noisy feedback, with validated skills transferable across runs and models when diagnostic alignment is achieved.

arxiv arXiv cs.AI · 12h ago

RoboMME-Interference: Benchmarking Robot Memory Under Interference

RoboMME-Interference introduces a cross-session benchmark to evaluate robot memory under interference. It adds unrelated sessions to prior demonstrations, revealing that perceptual memory variants degrade significantly as distractions increase, highlighting current systems' lack of robustness to interference and the need for long-context memory.

github llama.cpp · 12h ago

llama.cpp releases b9782 with new binaries and support

llama.cpp releases version b9782, including binaries for macOS, Linux, Android, Windows, and openEuler. The release adds support for Vulkan, OpenVINO, SYCL, ROCm, and CUDA across multiple architectures, with updated UI and disabled features such as KleidiAI and openEuler support.

lab Google DeepMind Blog · 12h ago

Gemini 3.5 Flash Adds Computer Use Capability

Google has introduced computer use in Gemini 3.5 Flash, enabling the model to execute code and interact with external tools. This feature allows users to run programming tasks and access real-time information through integrated computing functions.

arxiv arXiv cs.AI · 12h ago

Flow Annealing Posterior Sampling for Function-Space Regression and Inverse Problems

FAPS is the first function-space posterior sampling framework that unifies stochastic-process regression and PDE inverse problems. It uses pretrained flow-matching priors and Langevin correction with low-rank covariance preconditioning to enable efficient, accurate posterior inference from sparse, noisy data with coherent uncertainty quantification.

media r/LocalLLaMA · 12h ago

Has anyone else found vLLM outputs worse than llama.cpp?

A user reports noticing less reliable outputs from vLLM compared to llama.cpp, including formatting errors, context forgetting, and lower code quality. They ask whether such differences stem from quantization, chat templates, parser issues, or configuration errors, and seek confirmation if others have observed similar quality discrepancies between inference backends.

media r/LocalLLaMA · 12h ago

Sipp: Open-source library for in-browser inference built on llama.cpp

Sipp is an open-source library that enables in-browser inference using llama.cpp. It allows users to run local language model inference directly in web browsers without relying on cloud services. The project is available on GitHub at https://github.com/noumena-labs/Sipp.

arxiv arXiv cs.AI · 12h ago

Select-to-Act: Hierarchical RL with Adaptive Language Guidance

HRLLI introduces a hierarchical reinforcement learning framework that adapts natural-language instructions dynamically during decision-making. It decomposes instructions into stage-specific guidance elements and uses a select-to-act paradigm to enable real-time selection of relevant instruction pieces, improving sample efficiency and performance in complex environments.

arxiv arXiv cs.AI · 12h ago

SAFER: Reliable Test-Time Adaptation under Adversarial Streams

SAFER is a training-free framework that enhances robustness of test-time adaptation by using reliability-guided augmentation. It generates stochastic augmentations, pools predictions via correlation-weighted aggregation with outlier detection, and includes adaptive mixing to preserve clean performance under adversarial attacks. Evaluations on PACS, VLCS, and OfficeHome show improved resilience without sacrificing clean accuracy.