All articles — korshunov.ai

SupraLabs Releases SupraVL-Nano-900k Vision-Language Model

SupraLabs has launched SupraVL-Nano-900k, a fully transparent, 900k-parameter vision-language model trained from scratch on Flickr8k. It features a CNN visual encoder, a GPT-2-style decoder, and prefix-concatenation fusion, with all components openly documented and designed for educational clarity.

arXiv cs.AI · 12d ago

FreeStyle: Scalable Dual-Reference Generation via Community LoRA Mining

FreeStyle proposes a framework that mines community LoRAs to generate large-scale style-content dual-reference image triplets. It employs a two-stage curriculum with disentanglement mechanisms to suppress style leakage and introduces a benchmark with style-invariant and VLM-based scores to evaluate content preservation and leakage rejection.

arXiv cs.AI · 12d ago

How Safety-Aligned LLMs Interpret Mixed Compliance Demonstrations

The study finds that benign and harmful compliance demonstrations are not interchangeable in safety-aligned LLMs. Benign demonstrations can either reduce or increase harmful compliance depending on the model, and preference optimization plays a key role in preventing harmful compliance. Demonstration ordering shows a strong recency bias, and models vary in how they handle refusal during in-context learning.

arXiv cs.AI · 12d ago

Efficient and Sound Probabilistic Verification for AI Agents

A new framework enables secure, probabilistic policy enforcement for AI agents in ambiguous environments. It uses distributionally robust optimization to compute rigorous upper bounds on policy-violation probabilities without assuming predicate independence. The method outperforms prior approaches on terminal and tool-calling agent benchmarks, improving the security-utility trade-off.

arXiv cs.AI · 12d ago

Multi-LCB: Extending LiveCodeBench to 12 Programming Languages

Multi-LCB extends LiveCodeBench to twelve programming languages, preserving its contamination controls and evaluation protocol. It reveals Python overfitting, language-specific biases, and significant performance gaps among LLMs across languages, establishing a rigorous benchmark for cross-language code generation.

arXiv cs.AI · 12d ago

FlowEdit: Lifelong Pronunciation Adaptation in Flow-Matching TTS

FlowEdit enables frozen flow-matching TTS models to adapt pronunciation corrections over time using latent edits in text embeddings. It stores corrections in a Modern Hopfield Network and retrieves them via soft attention with similarity gating, reducing phoneme error rates by 92.7% on 312 multilingual proper nouns while preserving general-speech quality. Corrections take about 15 seconds to complete on a single GPU.
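The retrieval step in this summary can be illustrated with a toy sketch. It is not FlowEdit's implementation: the memory layout, the inverse temperature `beta`, the gate threshold `tau`, the use of cosine similarity, and the name `retrieve_edit` are all assumptions made for illustration. Stored keys stand in for text-embedding contexts and stored values for latent edits. A softmax over scaled similarities plays the role of the Hopfield-style soft attention, and the gate returns a zero edit when nothing in memory is similar enough, so unrelated text is left unchanged.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_edit(query, keys, values, beta=8.0, tau=0.8):
    """Hypothetical helper: return a latent edit for `query`.

    keys/values are the stored (context, edit) pairs. beta and tau are
    illustrative hyperparameters, not values from the paper.
    """
    dim = len(values[0]) if values else len(query)
    if not keys:
        return [0.0] * dim

    sims = [cosine(query, k) for k in keys]

    # Similarity gate: if no stored key is close enough, apply no edit.
    if max(sims) < tau:
        return [0.0] * dim

    # Soft attention: softmax(beta * similarity), shifted for stability.
    top = max(sims)
    weights = [math.exp(beta * (s - top)) for s in sims]
    total = sum(weights)
    weights = [w / total for w in weights]

    # The retrieved edit is the attention-weighted mix of stored edits.
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
```

A query close to one stored key retrieves almost exactly that key's edit, while a query that falls below the gate threshold for every key gets a zero edit.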

arXiv cs.AI · 12d ago

Sovereign Execution Broker for Certificate-Bound Agentic Control

The Sovereign Execution Broker (SEB) introduces a runtime enforcement boundary that verifies and executes certified authority in agentic systems. It validates execution contracts, checks validity periods, and ensures policy compliance before invoking infrastructure APIs, providing a short-lived, auditable, and revocable execution capability. The prototype was evaluated on AWS and Kubernetes, measuring latency, revocation propagation, and fault injection resistance.
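The checks listed in this summary can be sketched as a small gate in front of an API call. This is a minimal illustration under stated assumptions, not the SEB prototype: `ExecutionContract`, `Broker`, the field names, and the action strings are hypothetical, and certificate or signature verification is omitted entirely. The sketch only shows the order of checks (revocation, validity period, policy) and that every decision is written to an audit log before anything is invoked.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ExecutionContract:
    """Hypothetical short-lived grant for one action."""
    contract_id: str
    action: str          # illustrative label, e.g. "k8s:scale"
    not_before: datetime
    not_after: datetime


class Broker:
    """Toy enforcement boundary: check first, then invoke."""

    def __init__(self, allowed_actions):
        self.allowed_actions = set(allowed_actions)  # stand-in for a policy
        self.revoked = set()
        self.audit_log = []

    def revoke(self, contract_id):
        self.revoked.add(contract_id)

    def execute(self, contract, api_call, now=None):
        now = now or datetime.now(timezone.utc)
        if contract.contract_id in self.revoked:
            decision = "denied: revoked"
        elif not (contract.not_before <= now <= contract.not_after):
            decision = "denied: outside validity period"
        elif contract.action not in self.allowed_actions:
            decision = "denied: policy"
        else:
            decision = "allowed"

        # Every decision is recorded, whether or not the call goes ahead.
        self.audit_log.append((now.isoformat(), contract.contract_id, decision))

        if decision != "allowed":
            return decision, None
        return decision, api_call()
```

Passing `now` explicitly keeps the sketch deterministic; a real broker would also need a trusted clock and a way to propagate revocations, which are outside the scope of this example.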

arXiv cs.AI · 12d ago

SARLO-80: VHR SAR-Optical-Text Dataset Released

SARLO-80 is a large-scale dataset combining very-high-resolution SAR SLC, aligned optical imagery, and natural-language descriptions. It includes 119,566 triplets from 2,500 global scenes across 72 countries, standardized to an 80 cm slant-range grid with pixel-level alignment and three caption variants. The dataset is publicly available on Hugging Face for multimodal learning benchmarks in native SAR geometry.

arXiv cs.AI · 12d ago

DeepSWIP: Counterfactual Reasoning in Neural Probabilistic Logic

DeepSWIP introduces a single-world counterfactual semantics for DeepProbLog, enabling causal reasoning through neural materialization and weighted model counting. It achieves exact inference under finite grounding and unique-supported-model assumptions, with experiments showing a 2.14× speedup and improved calibration over DeepTwin and AIPW estimators.

arXiv cs.AI · 12d ago

LedgerAgent: Structured State for Policy-Adherent Tool-Calling Agents

LedgerAgent introduces a structured ledger to maintain task states separately in tool-calling agents. It renders states into prompts and enforces policy constraints before tool execution, reducing policy violations and improving performance across customer-service domains.

arXiv cs.AI · 12d ago

Cross-Attention Attribution for Style-Captioned Text-to-Speech

A new method adapts DAAM to speech diffusion models to analyze how style captions influence TTS waveforms. It finds that style tokens have lower temporal variance than content tokens and that style attention correlates with pitch and energy. Style conditioning peaks in early layers, where attention entropy is minimized, indicating maximal selectivity.

arXiv cs.AI · 12d ago

Calibration in MoE Models Under Distribution Shift

This paper examines how mixture-of-experts models maintain calibration under distribution shift. It finds that expert-level calibration ensures overall model calibration in hard-routed models but is insufficient for soft-routed models. The authors propose adversarial reweighting to penalize calibration errors in routed aggregates, improving the accuracy-calibration trade-off across tasks and shifts.

arXiv cs.AI · 13d ago

G2Rec: Unified Framework for Generative Recommendation

G2Rec introduces a scalable framework that combines holistic graph-based user co-engagement modeling with semantic tokenization. It enables generative recommendation models to capture comprehensive, semantically grounded user interest prototypes without ground-truth user interests, outperforming existing methods in industrial-scale sequential recommendation.

arXiv cs.AI · 13d ago

How Transparent is DiffusionGemma?

DiffusionGemma has poor variable transparency because of its high opaque serial depth, but an interpretable token bottleneck can mitigate this, reducing serial depth to 1.1× that of Gemma 4. Algorithmic transparency is more challenging in diffusion models because token predictions are dynamic, with early evidence of non-chronological reasoning, token smearing, and intermediate-context reasoning. DiffusionGemma is found to be similarly monitorable to Gemma 4.

GitHub · CrewAI · 13d ago

v1.14.8a2 Release Notes

v1.14.8a2 adds a single agent action to Flow definitions and validates CEL expressions at load time. It includes a new Datadog integration guide with an importable operations dashboard and updated snapshot and changelog for v1.14.8a1.

arXiv cs.LG · 13d ago

FedMGS: Federated Modality-aware Graph Synthesis for Imbalanced MultiModal Learning

FedMGS addresses client- and node-level modality imbalance in federated graph learning by synthesizing latent semantic representations. It integrates an availability-aware graph encoder, prototype-guided semantic synthesizer, and reliability-calibrated fusion mechanism to recover missing modalities while preserving semantic alignment. Experiments show FedMGS achieves up to 17.41% performance gains over baselines across four tasks.

arXiv cs.LG · 13d ago

Style Diversity Outperforms Topic Diversity in Annotation-Free Synthetic Data

A new framework generates synthetic dialogue without human-annotated data, using only intent definitions. It incorporates topic and style attributes, with the post-hoc stylization models Univ and Exam and an LLM-as-a-judge filtering process. Results reach up to 93.3% of the performance achieved with human-annotated data, confirming that style diversity is more critical than topic diversity for data utility.

arXiv cs.LG · 13d ago

Direct Advantage Estimation for Partially Observable Domains

Direct Advantage Estimation (DAE) is extended to partially observable domains with minimal modifications. A discrete latent dynamics model reduces computational overhead by efficiently approximating transition probabilities, enabling scalable and sample-efficient deep reinforcement learning in high-dimensional observation spaces.

arXiv cs.LG · 13d ago

Lightweight Defense Against False Data Injection in Power Grids

A new defense framework makes deep neural networks more resilient to false data injection attacks in power grids by adding a padding layer with pseudofeatures derived from the statistical distributions of the inputs. This lightweight, model-agnostic approach increases input dimensionality in a randomized, data-aware way, which makes adversarial perturbations non-transferable and unpredictable. It counters attacks without performance degradation.
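One way to picture such a padding layer is the toy sketch below. It is an assumption-laden illustration, not the paper's method: the class name `PaddingLayer`, the choice of a single Gaussian fitted to all training values, and the use of a secret seed to fix the pseudofeature positions are all hypothetical. The idea shown is only that the input grows by a few extra slots whose positions are unknown to an attacker and whose values are sampled to look like real measurements.

```python
import random
import statistics


class PaddingLayer:
    """Hypothetical padding layer that interleaves pseudofeatures with real ones."""

    def __init__(self, train_rows, n_pseudo, seed):
        # Data-aware part: fit simple statistics on the training inputs.
        flat = [x for row in train_rows for x in row]
        self.mu = statistics.fmean(flat)
        self.sigma = statistics.pstdev(flat)

        # Randomized part: a secret seed picks which output slots are fake.
        self.n_out = len(train_rows[0]) + n_pseudo
        self._rng = random.Random(seed)
        self.pseudo_slots = set(self._rng.sample(range(self.n_out), n_pseudo))

    def __call__(self, row):
        real = iter(row)
        # Real features keep their relative order; fake slots get samples
        # drawn from the fitted distribution.
        return [
            self._rng.gauss(self.mu, self.sigma) if i in self.pseudo_slots else next(real)
            for i in range(self.n_out)
        ]
```

The downstream network would be trained on the padded inputs, so a perturbation crafted against the original input layout no longer lines up with the features the model actually reads.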

arXiv cs.LG · 13d ago

Timestep Embeddings Unnecessary in Diffusion Models

A study shows that diffusion models can attain global minimizers without explicit timestep embeddings. Ablation studies on CelebA and CIFAR-10 reveal that time-agnostic models maintain high fidelity and outperform timestep-conditioned ones on FID, precision, and recall.