Poster: Exploring the Limits of Audio-Based Detection of Turkish Phone Call Scams

This research investigates the use of large language models to detect scam phone calls in Turkish, a low-resource language where annotated data is scarce. The study introduces the first public multi-modal dataset containing 100 aligned audio-transcript pairs of scam and benign conversations.

The study evaluates seven LLMs from three families: Gemini 2.5 (Flash, Flash-Lite, Pro), GPT-4o, and Qwen (Max, Plus, Turbo).
The models are tested under three input conditions: raw audio, automatic speech-to-text (ASR) transcripts, and ASR transcripts corrected by a native speaker.
Transcript-based inputs consistently outperform direct audio processing.
Human-corrected and uncorrected transcripts yield comparable performance.
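The evaluation design (seven models, three input conditions) can be pictured as a grid in which every model is scored on every condition. The sketch below assumes a full model × condition grid and uses accuracy as an illustrative metric. The `Sample` fields, the `classify` callback, and the `evaluate_grid` helper are all hypothetical, since prompts, API calls, and metrics are not described here. Model names are display labels, not API identifiers.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Tuple

# Display labels for the seven evaluated models (not API model IDs).
MODELS = [
    "Gemini 2.5 Flash", "Gemini 2.5 Flash-Lite", "Gemini 2.5 Pro",
    "GPT-4o",
    "Qwen Max", "Qwen Plus", "Qwen Turbo",
]
# The three input conditions: raw audio, ASR output, native-speaker-corrected ASR.
CONDITIONS = ["raw_audio", "asr_transcript", "corrected_transcript"]


@dataclass
class Sample:
    """One aligned audio-transcript pair (hypothetical field names)."""
    audio_path: str
    asr_transcript: str
    corrected_transcript: str
    is_scam: bool


# Hypothetical classifier: (model, condition, sample) -> predicted "is scam".
Classifier = Callable[[str, str, Sample], bool]


def evaluate_grid(samples: List[Sample],
                  classify: Classifier) -> Dict[Tuple[str, str], float]:
    """Return accuracy (an assumed metric) for every model x condition pair."""
    if not samples:
        raise ValueError("need at least one sample")
    results: Dict[Tuple[str, str], float] = {}
    for model, condition in product(MODELS, CONDITIONS):
        # Count predictions that match the gold scam/benign label.
        correct = sum(classify(model, condition, s) == s.is_scam for s in samples)
        results[(model, condition)] = correct / len(samples)
    return results
```

Scoring the full grid keeps each model's three conditions directly comparable, which is what the transcript-versus-audio and corrected-versus-uncorrected comparisons require. A real `classify` would route `raw_audio` to a model's audio input and the other two conditions to its text input.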

The work highlights the urgent need for culturally and linguistically inclusive AI safety research and more robust multi-modal systems for fraud prevention.