Image generation — korshunov.ai

STAR: SpatioTemporal Adaptive Reward Allocation for Text-to-Image RL Post-Training

STAR introduces a spatio-temporal reward allocation method for text-to-image RL post-training that uses attention maps to dynamically assign advantages across denoising steps. On Stable Diffusion 3.5 Medium it improves semantic alignment, text rendering, and human-preference alignment, reaching 0.9759 on GenEval, 0.9757 on OCR, and 23.60 on PickScore.
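The summary does not spell out STAR's allocation rule. As a rough illustration of the general idea only, the sketch below computes a group-relative advantage per sample (GRPO-style standardization, an assumption here) and then spreads one sample's scalar advantage over spatial tokens in proportion to an attention map and over denoising steps with per-step weights. The function names, the rescaling scheme, and the step weights are hypothetical, not STAR's formula.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one prompt's group of samples."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def allocate_advantage(advantage, attn_maps, step_weights):
    """Hypothetical spatio-temporal split of one scalar advantage.

    attn_maps[t][i]: non-negative attention mass on spatial token i at step t.
    step_weights[t]: relative importance of denoising step t.
    Step weights and each attention map are rescaled to mean 1, so the mean
    over all (step, token) entries equals `advantage`.
    """
    num_steps = len(attn_maps)
    w_total = sum(step_weights)
    per_step = []
    for attn, w in zip(attn_maps, step_weights):
        a_total = sum(attn)
        n_tokens = len(attn)
        w_t = w * num_steps / w_total  # step weight rescaled to mean 1
        # tokens with more attention receive a larger share of the advantage
        per_step.append([advantage * w_t * a * n_tokens / a_total for a in attn])
    return per_step
```

The mean-preserving rescaling is one simple design choice: it redistributes credit toward attended regions and chosen steps without changing the overall size of the policy-gradient signal.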

arxiv arXiv cs.AI · 8d ago

Volterra Generative Models Introduce Fractional Noise for Score-Based Generation

Volterra generative models propose a continuous-time score-based framework that uses fractional kernels to inject path-dependent noise, avoiding the memoryless noising of standard diffusion models. The approach employs finite-dimensional Markovian lifts and shows improved generation on MNIST and CIFAR-10, and a bridge sampler improves stability for larger models.
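The summary does not give the paper's kernel or lift. A standard way to build a finite-dimensional Markovian lift of a fractional (Riemann-Liouville) kernel, assumed here purely for illustration, writes the kernel t^(alpha-1)/Gamma(alpha) as a mixture of exponentials and keeps finitely many terms. Each term exp(-g_i t) is the memory of one Ornstein-Uhlenbeck factor driven by the same Brownian motion, so the factor vector is Markovian even though the resulting noise path is not. The grid bounds, factor count, and helper names are hypothetical choices.

```python
import math
import random


def lift_factors(hurst, n_factors=40, gamma_min=1e-4, gamma_max=1e4):
    """Nodes g_i and weights c_i with sum_i c_i * exp(-g_i t) ~ t**(alpha-1) / Gamma(alpha).

    Uses t**(alpha-1)/Gamma(alpha) = int_0^inf exp(-g t) mu(dg) with
    mu(dg) = g**(-alpha) dg / (Gamma(alpha) Gamma(1-alpha)), alpha = hurst + 1/2,
    discretized on a geometric grid (one node per cell, at the cell's mean under mu).
    Assumes 0 < hurst < 1/2.
    """
    alpha = hurst + 0.5
    norm = math.gamma(alpha) * math.gamma(1.0 - alpha)
    ratio = (gamma_max / gamma_min) ** (1.0 / (n_factors - 1))
    edges = [0.0] + [gamma_min * ratio**i for i in range(n_factors)]
    nodes, weights = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        m0 = (hi ** (1 - alpha) - lo ** (1 - alpha)) / (1 - alpha)  # int g^(-alpha) dg
        m1 = (hi ** (2 - alpha) - lo ** (2 - alpha)) / (2 - alpha)  # int g^(1-alpha) dg
        weights.append(m0 / norm)
        nodes.append(m1 / m0)
    return nodes, weights


def kernel(t, hurst):
    """Exact Riemann-Liouville kernel."""
    alpha = hurst + 0.5
    return t ** (alpha - 1) / math.gamma(alpha)


def lifted_kernel(t, nodes, weights):
    """Finite sum-of-exponentials approximation of the kernel."""
    return sum(c * math.exp(-g * t) for g, c in zip(nodes, weights))


def simulate_lifted_noise(hurst, n_steps=200, horizon=1.0, seed=0):
    """Path X(t) = sum_i c_i Y_i(t); each Y_i is an OU factor sharing one Brownian increment."""
    nodes, weights = lift_factors(hurst)
    rng = random.Random(seed)
    dt = horizon / n_steps
    decay = [math.exp(-g * dt) for g in nodes]
    ys = [0.0] * len(nodes)
    path = [0.0]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        ys = [d * y + dw for d, y in zip(decay, ys)]  # simple exponential-Euler step
        path.append(sum(c * y for c, y in zip(weights, ys)))
    return path
```

Only the factor vector `ys` needs to be carried from step to step, which is what makes a fixed-size (Markovian) state possible for a path-dependent noise process.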

arxiv arXiv cs.AI · 8d ago

ReAge3D: Realistic 3D Face Re-Aging with View Consistency

ReAge3D introduces a framework for realistic and identity-preserving 3D face re-aging. It uses a 2D diffusion model and center-out editing to ensure multi-view consistency, preserving fine age-related details through masked diffusion and view reconstruction.
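The summary does not define center-out editing precisely. One plausible reading, assumed here for illustration only, is to edit the frontal view first and then move outward by angular distance, conditioning each new view on the nearest view already edited. The helper below only produces that schedule; the function name, the azimuth-only camera model, and the nearest-view rule are hypothetical.

```python
def center_out_order(azimuths_deg, center_deg=0.0):
    """Return (view_index, reference_index) pairs in center-out order.

    Views are sorted by angular distance from `center_deg` (ties broken by azimuth);
    each view is paired with the closest view already edited (None for the first).
    """
    def dist(a, b):
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)

    order = sorted(
        range(len(azimuths_deg)),
        key=lambda i: (dist(azimuths_deg[i], center_deg), azimuths_deg[i]),
    )
    done, plan = [], []
    for i in order:
        ref = (
            min(done, key=lambda j: dist(azimuths_deg[i], azimuths_deg[j]))
            if done
            else None
        )
        plan.append((i, ref))
        done.append(i)
    return plan
```

For five views at -60, -30, 0, 30, and 60 degrees, the frontal view is edited first and each side view then borrows from its already-edited inner neighbor.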

media r/LocalLLaMA · 9d ago

Is DiffusionGemma really that good in a PI agent?

A Reddit post asks whether DiffusionGemma performs exceptionally well in a PI agent. The post links to an image and points readers to the comments section for further discussion.

media r/LocalLLaMA · 9d ago

Are Quantized Image Generation Models Still WIP?

Users report inconsistent results with quantized image generation models: SD 1.5 works well, but SDXL fails. Even when conversion and quantization succeed with tools such as convert.py and llama-quantize, some users get poor outputs while others do not, raising questions about how mature and reliable quantized image generation currently is.
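The post does not say which quantization types were used. As a generic illustration, assuming symmetric per-block 8-bit quantization in the spirit of GGUF's Q8_0 (this is not llama-quantize's actual code), the sketch shows one plausible failure mode: a single large weight in a block inflates that block's scale and rounds small weights to zero, so a conversion can "succeed" while still damaging the model.

```python
def quantize_blocks(values, block_size=32):
    """Symmetric 8-bit block quantization: one float scale per block plus int8-range codes."""
    blocks = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        amax = max(abs(v) for v in block)
        scale = amax / 127.0 if amax > 0 else 1.0
        # round to the nearest code and clamp to the symmetric int8 range
        codes = [max(-127, min(127, round(v / scale))) for v in block]
        blocks.append((scale, codes))
    return blocks


def dequantize_blocks(blocks):
    """Reconstruct approximate float values from (scale, codes) blocks."""
    return [scale * q for scale, codes in blocks for q in codes]
```

Per-element error is bounded by half the block's scale, so how well a model survives quantization depends on how its weight magnitudes are distributed within each block.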

arxiv arXiv cs.CL · 9d ago

LESS Is More: Adaptive Sampling for Diffusion Language Models

LESS introduces a training-free, model-agnostic adaptive sampler for diffusion language models that cuts reverse denoising steps by 72.1% compared with fixed-budget decoding. It achieves higher accuracy than existing training-free samplers and lowers inference compute and latency through mutual-stability rules that commit a token only when its prediction is confident, consistent, and stable.
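The summary names the three conditions but not how LESS measures them. A minimal sketch of such a commit rule follows, assuming each masked position keeps a history of its top-1 token and that token's probability across denoising steps. The thresholds `tau`, `delta`, and `window` and the helper names are hypothetical, not LESS's actual rules.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    token: int   # argmax token at this denoising step
    prob: float  # probability of that token


def should_commit(history, tau=0.9, window=3, delta=0.05):
    """Commit only if the last `window` predictions are confident (prob >= tau),
    consistent (same argmax token), and stable (probability drift <= delta)."""
    if len(history) < window:
        return False
    recent = history[-window:]
    confident = all(p.prob >= tau for p in recent)
    consistent = len({p.token for p in recent}) == 1
    stable = max(p.prob for p in recent) - min(p.prob for p in recent) <= delta
    return confident and consistent and stable


def commit_step(histories, committed, tau=0.9, window=3, delta=0.05):
    """Positions not yet committed that pass the rule at the current step."""
    return [
        pos
        for pos, hist in histories.items()
        if pos not in committed and should_commit(hist, tau, window, delta)
    ]
```

Because several positions can pass in the same step, easy spans finish early and the sampler stops spending steps on them, which is where an adaptive sampler saves compute over a fixed budget.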

arxiv arXiv cs.AI · 9d ago

ActiveSAM: Fast and Accurate Open-Vocabulary Segmentation

ActiveSAM is a training-free, zero-shot framework that enhances SAM 3 for open-vocabulary semantic segmentation by identifying an image-conditioned active class set. It improves the speed-accuracy tradeoff, outperforming SegEarth-OV3 by 1.4 mIoU on average, running up to 5.5x faster on large-vocabulary datasets, and remaining robust under image corruption.
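The summary does not say how the active class set is chosen. As an illustration only, assuming CLIP-style image and text embeddings, one simple filter keeps the top-k class names most similar to the whole image (subject to a similarity floor), so that the expensive per-class segmentation runs over k classes instead of the full vocabulary. The names `active_class_set`, `k`, and `min_sim` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def active_class_set(image_emb, class_embs, k=3, min_sim=0.2):
    """Keep at most k class names whose text embedding is close to the image embedding."""
    scored = sorted(
        ((cosine(image_emb, emb), name) for name, emb in class_embs.items()),
        reverse=True,
    )
    return [name for sim, name in scored[:k] if sim >= min_sim]
```

With a vocabulary of V classes, downstream cost then scales with k rather than V, which is the kind of saving that matters most on large-vocabulary datasets.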

arxiv arXiv cs.AI · 9d ago

Phase in Neural Representations: An Internal Oppenheim-Lim Test

In image classifiers such as PRISM2D, GFNet, and ViT-B/16, the phase rather than the magnitude of hidden-layer representations drives predictions. In ResNet-50, a latent sign code appears in late blocks, indicating that phase/sign identity exists across architectures but is expressed differently depending on activation and readout mechanisms.
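The test is named after the classic Oppenheim-Lim experiment: rebuild a signal from one input's Fourier magnitude and another input's Fourier phase, and the result resembles the phase source. The sketch below runs that swap on plain 1D signals standing in for a flattened hidden representation (an assumption; the paper applies it inside networks), using a naive O(n^2) DFT to stay dependency-free.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spec):
    """Naive inverse discrete Fourier transform."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n for t in range(n)]


def swap_magnitude_phase(mag_source, phase_source):
    """Signal whose DFT magnitude comes from mag_source and whose DFT phase comes from phase_source."""
    spec_m, spec_p = dft(mag_source), dft(phase_source)
    hybrid = [abs(m) * cmath.exp(1j * cmath.phase(p)) for m, p in zip(spec_m, spec_p)]
    # for real inputs the hybrid spectrum is conjugate-symmetric, so the imaginary part is rounding noise
    return [v.real for v in idft(hybrid)]


def pearson(x, y):
    """Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
```

For two independent random signals, the hybrid typically correlates strongly with the phase source and only weakly with the magnitude source, which is the pattern the internal test looks for in hidden layers.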

arxiv arXiv cs.LG · 9d ago

Hybrid Convolutional VAE for Crypto Volatility Surfaces

A convolutional variational autoencoder trained on 6,034 Binance Options surfaces for BTC and ETH achieves 0.94-1.56 vol-point RMSE under 10-50% masking. At 50% masking, the hybrid predictor reduces error from 7.00 to 0.83 vol points, outperforming a parametric re-fit on structured hole patterns and detecting abnormal market events without supervision.
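Figures like "RMSE under 10-50% masking" are usually computed by hiding a fraction of surface cells, reconstructing them, and scoring only the hidden cells. A minimal sketch of that evaluation follows, assuming the surface is a grid of implied volatilities in vol points (rows and columns, for example maturities and strikes); the helper names and the uniform random mask are hypothetical and do not reproduce the paper's structured hole patterns or its hybrid predictor.

```python
import math
import random


def random_mask(n_rows, n_cols, fraction, seed=0):
    """Set of (row, col) cells hidden from the model; fraction is in [0, 1]."""
    cells = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    rng = random.Random(seed)
    return set(rng.sample(cells, round(fraction * len(cells))))


def masked_rmse(true_surface, predicted_surface, mask):
    """RMSE over hidden cells only, in the surface's own units (e.g. vol points)."""
    sq = [(true_surface[r][c] - predicted_surface[r][c]) ** 2 for r, c in mask]
    return math.sqrt(sum(sq) / len(sq))
```

Scoring only the masked cells keeps the metric from rewarding a model for copying back the values it was given.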

arxiv arXiv cs.LG · 9d ago