Reasoning models
arXiv cs.AI · 6d ago

FlowEdit: Lifelong Pronunciation Adaptation in Flow-Matching TTS

FlowEdit lets a frozen flow-matching TTS model accumulate pronunciation corrections over time by applying latent edits to its text embeddings. Corrections are stored in a Modern Hopfield Network and retrieved via soft attention with similarity gating. This reduces phoneme error rate by 92.7% on 312 multilingual proper nouns while preserving general-speech quality. Each correction takes about 15 seconds on a single GPU.
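
The retrieval step described above can be illustrated with a small sketch. This is not FlowEdit's implementation: the function name `retrieve_edit`, the inverse temperature `beta`, the cosine-similarity gate, and its `threshold` are all assumptions chosen for illustration. It shows one plausible reading of the summary, in which stored keys are embeddings of corrected words, stored values are latent edits, and an edit is applied only when the query is close enough to something in memory.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_edit(query, keys, edits, beta=8.0, threshold=0.8):
    """Return a latent edit for `query`, or a zero vector if no stored key
    is similar enough (hypothetical gating rule, assumed for this sketch)."""
    zero = [0.0] * len(query)
    if not keys:
        return zero
    sims = [cosine(query, k) for k in keys]
    # Similarity gate: leave unrelated text untouched.
    if max(sims) < threshold:
        return zero
    # Modern-Hopfield-style soft attention: softmax over scaled similarities.
    m = max(sims)
    weights = [math.exp(beta * (s - m)) for s in sims]
    total = sum(weights)
    weights = [w / total for w in weights]
    # The retrieved edit is the attention-weighted sum of stored edits.
    dim = len(edits[0])
    return [sum(w * e[d] for w, e in zip(weights, edits)) for d in range(dim)]

# Toy memory with two stored corrections (values are made up).
keys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
edits = [[0.5, 0.0, 0.0], [0.0, -0.5, 0.0]]

near = retrieve_edit([0.9, 0.1, 0.0], keys, edits)  # close to the first key
far = retrieve_edit([0.0, 0.0, 1.0], keys, edits)   # unrelated: gate closes
```

In this sketch the gate is what would protect general speech: a query that resembles no stored correction receives a zero edit, so the frozen model's embedding passes through unchanged.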

arXiv cs.AI · 6d ago

How Transparent is DiffusionGemma?

DiffusionGemma has poor variable transparency because of its high opaque serial depth, but an interpretable token bottleneck can mitigate this, reducing serial depth to 1.1× that of Gemma 4. Algorithmic transparency is harder to achieve in diffusion models because token predictions change dynamically; early evidence points to non-chronological reasoning, token smearing, and intermediate-context reasoning. Overall, DiffusionGemma is found to be about as monitorable as Gemma 4.

arXiv cs.LG · 6d ago

FedMGS: Federated Modality-aware Graph Synthesis for Imbalanced MultiModal Learning

FedMGS addresses client- and node-level modality imbalance in federated graph learning by synthesizing latent semantic representations. It combines an availability-aware graph encoder, a prototype-guided semantic synthesizer, and a reliability-calibrated fusion mechanism to recover missing modalities while preserving semantic alignment. In experiments, FedMGS achieves performance gains of up to 17.41% over baselines across four tasks.