Image generation — korshunov.ai

Hybrid ANN-SNN Pipeline with Local Plasticity

A hybrid ANN-SNN pipeline uses pretrained EfficientNet encoders and converts their activations to spike trains via rate-coding. The system trains a CoLaNET spiking classifier with local plasticity rules, achieving 99.09% accuracy on ImageNet's 64-class benchmark, matching conventional deep networks.
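As a rough illustration of the rate-coding step mentioned above, the sketch below turns non-negative activations into binary spike trains whose firing probability per time step is proportional to the activation. The function name `rate_code`, the Bernoulli sampling scheme, and the peak normalization are assumptions made for illustration; the summary does not say how the paper implements the conversion.

```python
import random


def rate_code(activations, n_steps=100, seed=0):
    """Bernoulli rate coding (illustrative assumption, not the paper's code).

    Each activation is clipped at zero, scaled by the largest activation,
    and used as the per-step probability of emitting a spike.
    """
    rng = random.Random(seed)
    clipped = [max(a, 0.0) for a in activations]
    peak = max(clipped) if clipped else 0.0
    trains = []
    for a in clipped:
        # Firing probability in [0, 1]; all-zero input yields silent trains.
        p = a / peak if peak > 0 else 0.0
        trains.append([1 if rng.random() < p else 0 for _ in range(n_steps)])
    return trains


if __name__ == "__main__":
    spikes = rate_code([0.0, 0.5, 1.0], n_steps=20)
    for train in spikes:
        print(sum(train), train)
```

Stronger activations produce denser spike trains, so a downstream spiking classifier can read the encoder output as firing rates.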

arxiv arXiv cs.LG · 6d ago

PU-UNet: Stable Multiplicative Interactions for Medical Image Segmentation

PU-UNet introduces stable product-unit residual blocks into U-Net for medical image segmentation, enabling explicit multiplicative feature interactions without numerical instability. It achieves high Dice scores on ISIC 2018, Kvasir-SEG, and BUSI, outperforms a Residual U-Net baseline in Dice and IoU, and eliminates false positives on normal BUSI cases.
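For readers unfamiliar with the two metrics named above, here is a minimal sketch of Dice and IoU on flat binary masks. The helper name `dice_and_iou` and the convention of returning a perfect score when both masks are empty are assumptions for illustration, not details from the paper.

```python
def dice_and_iou(pred, target):
    """Compute Dice and IoU for two equal-length binary masks.

    Dice = 2|A ∩ B| / (|A| + |B|), IoU = |A ∩ B| / |A ∪ B|.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    pred_sum = sum(1 for p in pred if p)
    target_sum = sum(1 for t in target if t)
    union = pred_sum + target_sum - inter
    if union == 0:
        # Both masks empty: treated here as a perfect match (an assumption).
        return 1.0, 1.0
    dice = 2 * inter / (pred_sum + target_sum)
    iou = inter / union
    return dice, iou


if __name__ == "__main__":
    print(dice_and_iou([1, 1, 0, 0], [1, 0, 1, 0]))
```

The empty-mask case matters for datasets with normal (lesion-free) images, where any predicted foreground pixel is a false positive.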

arxiv arXiv cs.LG · 6d ago

MakeupMirror Improves Facial Attribute Preservation in Diffusion Models

MakeupMirror, a diffusion-based makeup transfer model, achieves a 60% improvement in facial recognition similarity and a 50% reduction in skin tone difference compared to Stable-Makeup. It preserves facial features and skin tone, with 94% expert acceptance across identity criteria, and runs at 0.7s latency using a Levenberg-Marquardt Langevin sampler.

arxiv arXiv cs.LG · 6d ago

EFIQA: Label-Free Fundus Image Quality Assessment with Explainability

EFIQA proposes a label-free framework for fundus image quality assessment that uses anatomical priors to generate spatial quality maps. It first trains an unsupervised anomaly detector via masked anatomical inpainting to identify missing vasculature, then distills this knowledge into a shallow adapter for quality mapping. Evaluation on external datasets shows EFIQA outperforms supervised methods in both performance and explainability across diverse quality criteria.

arxiv arXiv cs.CL · 6d ago

Black-Box Probe Detects Identity Memorization in Text-to-Image Models

A new black-box probe distinguishes whether text-to-image models memorize identities or fabricate them, without needing reference photos or training data. The NAMESAKES dataset includes over one thousand public figures' names and faces, along with less famous perturbed names, to benchmark this capability across state-of-the-art models.

media r/LocalLLaMA · 7d ago

Local LLM Agent Now Generates Images and Video Offline

A user shared that their local LLM agent was equipped with MCP tools to generate images and videos directly. The system operates fully offline and is free to use, with details and source code available in the comments.

arxiv arXiv cs.CL · 7d ago

DreamReasoner-8B: Block-Size Curriculum Learning for Diffusion Reasoning

DreamReasoner-8B is an open-source block diffusion model that demonstrates strong long-chain-of-thought reasoning. A systematic study shows that small training block sizes preserve reasoning effectiveness, while large sizes degrade performance. Block-size curriculum learning gradually transitions training from fine to coarse blocks, enabling robust and generalizable reasoning across inference settings, with results competitive with Qwen3-8B on mathematical and code benchmarks.
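One way to picture a fine-to-coarse block-size curriculum is a staged schedule that maps training progress to a block size. The sketch below is only an assumption about what such a schedule could look like: the function name `block_size_at`, the stage sizes (4, 8, 16, 32), and the equal-length stages are hypothetical and do not come from the paper.

```python
def block_size_at(step, total_steps, sizes=(4, 8, 16, 32)):
    """Return the training block size for a given step.

    Training is split into equal-length stages, one per entry in `sizes`,
    ordered from fine (small blocks) to coarse (large blocks).
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Fraction of training completed, clamped to [0, 1].
    frac = min(max(step / total_steps, 0.0), 1.0)
    idx = min(int(frac * len(sizes)), len(sizes) - 1)
    return sizes[idx]


if __name__ == "__main__":
    for step in (0, 25, 50, 75, 100):
        print(step, block_size_at(step, 100))
```

With 100 total steps, this schedule uses block size 4 for the first quarter of training and 32 for the last quarter.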

arxiv arXiv cs.LG · 7d ago

Flow-Matching Test-Time Adaptation for OCT Image Denoising

A flow-matching-based method aligns test-time OCT images to synthetic reference trajectories, matching histogram distributions to reduce noise-induced pixel mismatches. By removing time conditioning, the model adapts to real-world noise variations, achieving state-of-the-art biomarker segmentation across stages of age-related macular degeneration.

arxiv arXiv cs.LG · 7d ago

Quantum GAN Augmentation Shows No Benefit in Brain MRI

A controlled benchmark finds no significant performance gain from quantum generative models in brain MRI augmentation. Synthetic samples produced by quantum and classical GANs are statistically indistinguishable, with both showing mode collapse and off-distribution samples, especially at low data fractions. The study concludes that quantum augmentation does not outperform classical methods and acts more as regularization than data expansion.

arxiv arXiv cs.LG · 7d ago

Sumi: Open Uniform Diffusion Language Model from Scratch

Sumi is a 7B-parameter uniform diffusion language model pretrained from scratch on 1.5T tokens. It competes with autoregressive models on knowledge, reasoning, and coding tasks but underperforms on commonsense benchmarks, likely due to its education-heavy data mixture. The model weights, checkpoints, and full training recipe are publicly released.

arxiv arXiv cs.AI · 7d ago

ProductConsistency: Enhancing Product Identity in Image Editing

The ProductConsistency dataset introduces 87k SFT samples and 869 RL samples to improve product identity preservation in image editing. It includes a benchmark for standardized evaluation and uses a cyclic consistency reward to enforce semantic product identity through caption similarity. Fine-tuning Qwen-Image-Edit-2511 and Flux.1-Kontext-dev shows a 5x reduction in character error rate and improved text rendering and visual quality.

media r/LocalLLaMA · 7d ago

TRELLIS.2 now runs natively on MLX

TRELLIS.2 has been ported to run natively on MLX for Apple Silicon. The model supports 512x512 and 1024x1024 image inputs, with generation times of approximately 70 seconds for 512x512 and 300 to 700 seconds for 1024x1024 on an M4 Max with 128GB unified memory.

arxiv arXiv cs.LG · 8d ago

Recursive Masked Diffusion Models Introduce New Scaling Axis

Recursive Masked Diffusion Models (R-MDMs) introduce recursive depth as a third scaling axis by reapplying a denoising transformer within each diffusion step. This recursion enables iterative output refinement without increasing parameter count, achieving performance comparable to non-recursive models with up to L times more parameters, where L is the number of iterations. R-MDMs also reduce inference compute by partially replacing denoising steps with recursive refinement.

arxiv arXiv cs.LG · 8d ago

NoiseTilt: Noise-Tilted Reverse Kernels for Diffusion Reward Alignment

NoiseTilt introduces NTRK, a reward-guided diffusion sampler that injects reward gradients via the noise term without altering the reverse kernel. By using a whitening operator, NTRK safely biases noise toward high reward, preserving sample quality while maintaining strong guidance. On aesthetic generation, NTRK achieves superior reward performance with 25 NFEs, reducing compute by 20× compared to state-of-the-art baselines.

arxiv arXiv cs.LG · 8d ago

Volterra Generative Models Introduce Fractional Noise for Score-Based Generation

Volterra generative models propose a continuous-time score-based framework using fractional kernels to inject path-dependent noise, avoiding memoryless noising in traditional diffusion models. The approach introduces finite-dimensional Markovian lifts and proves squared error bounds, demonstrating improved generation on MNIST and potential for natural images, with a bridge sampler enhancing stability for larger models.

arxiv arXiv cs.LG · 8d ago

Kolmogorov Regression for Robust Diffusion Policies

A backward Kolmogorov equation lifts diffusion policies to a Cameron-Martin space, replacing stochastic score matching with a deterministic PDE. This approach achieves convergence bounds tied to kernel effective rank, improves trajectory regularity, and enables a deterministic failure detector without rewards. Validation shows 17% higher reward on PushT and 28.4% lower RMSE on a manufacturing line, with 96% reduction in deadlock events via Hamilton-Jacobi certification.

arxiv arXiv cs.LG · 8d ago

AdaVoMP: Adaptive Volumetric Mechanical Property Fields

AdaVoMP predicts accurate spatially-varying Young's modulus, Poisson's ratio, and density for 3D objects across resolutions. It uses a sparse, adaptive voxel structure and a sparse transformer encoder-decoder to achieve 16^3 times higher resolution than prior methods, with improved accuracy and lower test-time compute.

arxiv arXiv cs.LG · 8d ago

AoiZora: Topology-Aware Auto-Parallel Optimization for Video Diffusion Inference

AoiZora is a compiler-mediated topology planner that improves low-latency video diffusion inference on TPU sub-slices. By aligning logical sharding with physical placement through the compilation flow, it reduces one-step denoising latency by up to 1.42x on TPU v5e sub-slices compared to existing methods.

arxiv arXiv cs.LG · 8d ago

SelFix: Root-Selecting Fixed-Point Inversion for Rectified Flows via Trajectory Straightness

SelFix improves fixed-point inversion by selecting solutions that produce straighter inverse trajectories, enhancing real-image reconstruction and source-preserving editing. Experiments on FLUX.1-dev and PIE-Bench show it outperforms prior baselines in both reconstruction quality and editing fidelity.