Execution-state capsules enable graph-bound checkpointing and restoration of complete execution state, including KV, recurrent, and convolution states, for low-latency, small-batch on-device AI serving. On RTX 5090 and Jetson AGX Thor, capsule restore achieves byte-exact and token-identical correctness, with sub-millisecond GPU operations and TTFT speedups up to 27x at 16k tokens, demonstrating significant latency reduction in interactive AI workflows.