Local LLM on MacBook M5 Pro - Totally New to This!

A non-programmer shares their experience setting up a local Large Language Model infrastructure on a MacBook M5 Max with 128GB of unified memory. The user details their software stack, model selections, and objectives for learning AI while establishing a stable, remotely accessible system.

Hardware: MacBook M5 Max (18-core CPU, 40-core GPU, 128GB unified memory, 4TB storage) running OS Tahoe.
Inference Stack: Docker Desktop with Docker Model Runner for full Metal GPU access and Open WebUI via Docker Compose.
Models: Gemma 4 (~12B) for daily use and Qwen3 30B-A3B-Q4_k_m for deep research.
RAG Implementation: SentenceTransformers embeddings with multiple topic-based knowledge collections containing AI-written markdown files and manufacturer PDFs.
Additional Tools: DrawThings for image/video generation, MacWhisper Pro for transcription, and Kokoro TTS for local voice output.

The author aims to transition from using cloud services like Claude Pro to utilizing their local setup more frequently while continuing to learn about AI security and agentic systems.