arXiv cs.CL · 10h ago

OLIVE: View-Augmented Latent Prediction with Waveform Reconstruction for Speech SSL

The authors propose OLIVE, a self-supervised speech representation learning framework that jointly optimizes analysis and synthesis objectives through view-augmented masked latent prediction and waveform reconstruction. This unified approach constrains early encoder features to retain signal-level information while shaping later contextual representations toward invariance for robust downstream performance.

arXiv cs.CL · 11h ago

Permutation-Invariant Embedding Model Fine-Tuning for Structured Metadata Retrieval

The authors show that field order significantly affects retrieval quality in structured metadata systems, because standard fine-tuning leads encoders to rely on absolute position rather than field labels. To address this, they propose Permutation-Invariant Fine-Tuning (PI-FT), which serializes each record under randomly sampled field orders with dropout so that meaning binds to field labels rather than positions.
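
The serialization step described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `serialize_record`, the `label: value` format, the ` | ` separator, and the reading of "dropout" as randomly omitting whole fields are all assumptions made for the example.

```python
import random

def serialize_record(record, rng, drop_prob=0.1):
    """Serialize a record as 'label: value' pairs in a random field order.

    Assumption: 'dropout' is interpreted here as omitting each field
    independently with probability drop_prob (hypothetical parameter).
    """
    # Keep each field with probability 1 - drop_prob.
    items = [(k, v) for k, v in record.items() if rng.random() >= drop_prob]
    # Never emit an empty string: fall back to one randomly chosen field.
    if not items:
        items = [rng.choice(list(record.items()))]
    # Shuffle so that position carries no information about the field.
    rng.shuffle(items)
    return " | ".join(f"{label}: {value}" for label, value in items)

# Hypothetical example record; each call yields a different training view.
record = {"title": "Dune", "author": "Frank Herbert", "year": 1965}
rng = random.Random(0)
views = [serialize_record(record, rng) for _ in range(3)]
```

Because every value stays attached to its label in each view, an encoder trained on such views cannot rely on a field appearing at a fixed position.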

arXiv cs.CL · 11h ago

SIMAX: A Scalable and Interpretable Framework for Multi-Fidelity and Annotated Clinician-Patient Dialogue Simulation

The researchers present SIMAX, a framework that generates controlled clinical dialogue data with reference behavioral annotations, addressing the scarcity of scalable evaluation data for AI-driven communication coding systems. The system creates simulated clinician-patient interactions from predefined scenarios, personas, and voice conditions, and uses codebooks to control both overall communication quality and countable behaviors.

arXiv cs.CL · 11h ago

FlashMorph: Budget-Constrained Hybrid Layer Selection for Efficient Transformers

FlashMorph converts Transformer models into hybrid architectures that balance full-attention accuracy with linear-attention efficiency by framing layer selection as a budget-constrained subset-selection problem. The approach constructs a morphable model with parallel attention branches and jointly optimizes layerwise gates on synthetic data to determine the best configuration under the budget.
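
To make the budget constraint concrete, here is a minimal sketch of one way gate values could be turned into a layer selection. The summary does not say how FlashMorph derives the final configuration from its gates, so the top-k rule below is a stand-in, and the names `select_full_attention_layers`, `gate_scores`, and `budget` are hypothetical.

```python
def select_full_attention_layers(gate_scores, budget):
    """Pick which layers keep full attention under a fixed budget.

    Assumption: a higher gate score means the layer benefits more from
    full attention, and the budget is a count of full-attention layers.
    All remaining layers would use linear attention.
    """
    if not 0 <= budget <= len(gate_scores):
        raise ValueError("budget must be between 0 and the number of layers")
    # Rank layer indices by gate score, highest first.
    ranked = sorted(range(len(gate_scores)),
                    key=lambda i: gate_scores[i], reverse=True)
    # Keep the top `budget` layers, reported in layer order.
    return sorted(ranked[:budget])

# Hypothetical gate values for a 6-layer model, budget of 2 full-attention layers.
gates = [0.9, 0.2, 0.7, 0.1, 0.6, 0.3]
layers = select_full_attention_layers(gates, budget=2)  # [0, 2]
```

Ranking independent gates is only optimal when layers contribute independently; optimizing the gates jointly, as the summary describes, is presumably meant to capture interactions that this simple rule ignores.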