This article introduces a synthetic multimodal framework that replicates First Notice of Loss (FNOL) conditions for insurance fraud detection, addressing the limitations of existing text-only approaches. The system generates agent-customer dialogue transcripts and two-speaker audio so that linguistic, behavioral, and speaker-based indicators can be combined.

  • Generates synthetic agent-customer dialogue transcripts and two-speaker audio to replicate FNOL scenarios.
  • Performs Automatic Speech Recognition (ASR) and diarisation on the generated audio data.
  • Combines named entity recognition (NER), regex-based feature extraction, LLM-based retrieval-augmented generation (RAG), and speaker embeddings in a rule-based risk score.
  • Flags narrative reuse, structural inconsistencies, and cross-case voice repetition while balancing sensitivity and false positives.
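
To make the idea of a rule-based risk score concrete, the following is a minimal sketch rather than the article's actual implementation. It assumes the upstream stages (ASR, diarisation, retrieval, speaker embedding) have already produced their outputs. The hedging pattern, the weights, the 0.85 voice-similarity threshold, and all names (`ClaimSignals`, `risk_score`, `HEDGE_RE`) are hypothetical choices made for illustration only.

```python
import math
import re
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical regex feature: hedging phrases in the customer's turns.
HEDGE_RE = re.compile(r"\b(i think|maybe|not sure|can't remember)\b", re.IGNORECASE)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class ClaimSignals:
    customer_text: str               # customer turns from the diarised transcript
    narrative_reuse: bool            # assumed output of a retrieval step over prior claims
    structural_inconsistency: bool   # assumed output of NER/regex consistency checks
    voice_embedding: Sequence[float] # assumed speaker embedding for the customer


def risk_score(
    signals: ClaimSignals,
    prior_voice_embeddings: Sequence[Sequence[float]],
    voice_threshold: float = 0.85,   # assumed threshold, not taken from the article
) -> Tuple[int, List[str]]:
    """Sum hypothetical rule weights and return (score, triggered rule names)."""
    score = 0
    reasons: List[str] = []

    # Rule 1: two or more hedging phrases in the customer's speech.
    if len(HEDGE_RE.findall(signals.customer_text)) >= 2:
        score += 1
        reasons.append("hedging")

    # Rule 2: the narrative closely matches an earlier claim.
    if signals.narrative_reuse:
        score += 2
        reasons.append("narrative_reuse")

    # Rule 3: details in the account contradict each other.
    if signals.structural_inconsistency:
        score += 1
        reasons.append("structural_inconsistency")

    # Rule 4: the same voice appears in a different case.
    if any(
        cosine_similarity(signals.voice_embedding, prior) >= voice_threshold
        for prior in prior_voice_embeddings
    ):
        score += 2
        reasons.append("voice_repetition")

    return score, reasons
```

Returning the list of triggered rules alongside the score keeps each flag explainable. Raising the voice threshold or the hedging count is one way such a scheme could trade sensitivity against false positives.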

The framework offers a reproducible baseline for fraud detection that extends beyond text-only methods, with dataset validation demonstrating stability and transfer potential.