Distill-on-idle pipeline for on-device memory assistant using 4B models
The article details an engineering approach to building a local AI assistant that converts raw screen captures and meeting transcripts into queryable data, using only models that run efficiently on laptops. The system combines Apple's Vision framework for OCR, idle-time distillation with a 4B Gemma model, and hybrid retrieval to avoid performance bottlenecks.
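The idea behind "distill on idle" is to defer expensive model work until the user is away. Capture stays cheap and continuous, and the costly summarization pass runs only while the machine is idle. The sketch below is a minimal illustration under stated assumptions. `idle_seconds` and `distill` are hypothetical callables: the first stands in for a platform idle-time query, the second for a call into a local 4B model. The threshold, batch size, and the `Capture`/`DistillQueue` names are illustrative and are not taken from the article.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Capture:
    source: str  # illustrative tag, e.g. "screen_ocr" or "transcript"
    text: str    # OCR output or transcript text


@dataclass
class DistillQueue:
    # Hypothetical hook: seconds since the last user input.
    idle_seconds: Callable[[], float]
    # Hypothetical hook: wraps a local small-model call that condenses raw text.
    distill: Callable[[str], str]
    idle_threshold: float = 120.0  # assumed value, not from the article
    batch_size: int = 4            # assumed value, not from the article
    pending: list = field(default_factory=list)
    notes: list = field(default_factory=list)

    def add(self, capture: Capture) -> None:
        # Capture-time work is just appending text; no model runs here.
        self.pending.append(capture)

    def run_if_idle(self) -> int:
        """Distill queued captures batch by batch while the user stays idle.

        Idleness is re-checked before every batch, so work stops soon after
        the user returns. Returns the number of captures processed.
        """
        processed = 0
        while self.pending and self.idle_seconds() >= self.idle_threshold:
            batch = self.pending[: self.batch_size]
            del self.pending[: self.batch_size]
            joined = "\n".join(f"[{c.source}] {c.text}" for c in batch)
            self.notes.append(self.distill(joined))
            processed += len(batch)
        return processed
```

Batching keeps each model call's context small, which matters for a 4B model. Re-checking idleness between batches bounds how long a distillation pass can compete with the user for CPU, GPU, or memory.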
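The summary does not say how the hybrid retrieval is scored. One common way to combine a lexical index with vector search is to rank documents separately with BM25 and cosine similarity, then merge the two rankings with reciprocal rank fusion (RRF). The following is a self-contained sketch of that swapped-in technique, not the article's implementation. The embedding vectors are assumed to come from some local embedding model (hypothetical here). The parameters `k1=1.5`, `b=0.75`, and `k=60` are conventional defaults rather than values from the source.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of `query` against each document (whitespace tokens)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        s = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            f = tf[term]
            s += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(toks) / avgdl))
        scores.append(s)
    return scores


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rrf(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Reciprocal rank fusion: each list contributes 1 / (k + rank) per doc."""
    fused: Counter = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by lower doc id for determinism.
    return [d for d, _ in sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))]


def hybrid_search(query: str, query_vec: list[float], docs: list[str],
                  doc_vecs: list[list[float]], top_k: int = 3) -> list[int]:
    lex = bm25_scores(query, docs)
    vec = [cosine(query_vec, v) for v in doc_vecs]
    # Lexical list keeps only docs that share at least one query term.
    lex_rank = [i for i in sorted(range(len(docs)), key=lambda i: -lex[i]) if lex[i] > 0]
    vec_rank = sorted(range(len(docs)), key=lambda i: -vec[i])
    return rrf([lex_rank, vec_rank])[:top_k]


docs = ["quarterly budget review meeting", "slack message about lunch",
        "budget spreadsheet open in numbers"]
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]  # toy 2-d stand-ins for real embeddings
print(hybrid_search("budget", [1.0, 0.0], docs, doc_vecs))  # → [0, 2, 1]
```

RRF only uses rank positions, so it avoids calibrating BM25 scores against cosine similarities, which live on unrelated scales. Both signals are cheap enough to compute on a laptop. This fits the goal of keeping the large model out of the query path.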