A new framework based on cross-lingual question answering reveals the mental models that users form of speech translation systems. Users' mental models grow stronger with practice, especially when they know the source language, and users rely on surface-level error cues to build them. Providing speech transcriptions improves the development of these mental models. Together, the findings show the potential of cross-lingual question answering for human-AI collaboration research.