Concordia: JIT-Compiled Persistent-Kernel Checkpointing for Fault-Tolerant LLM Inference

This paper introduces Concordia, a runtime that provides fault tolerance for long-running LLM agents by preserving valuable state on the GPU without restarting the serving stack. The system uses a device-resident persistent kernel that interposes on GPU module loading to support PTX- and SASS-level instrumentation.

Concordia JIT-compiles specialized delta-checkpoint handlers, such as KV-block scanners and recovery appliers, and hot-swaps them into the persistent kernel's operator table. The runtime consumes compute and checkpoint tasks from a lock-free ring buffer, triggering dirty-page detection and staging deltas automatically. Committed records are appended to a CPU-visible log in CXL memory or host DRAM, which allows recovery without placing the host CPU on the critical path.
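The task ring and the hot-swappable operator table can be illustrated with a small host-side model. This is a minimal sketch in single-threaded Python, not Concordia's device code: the names `Task`, `TaskRing`, `PersistentLoop`, and `swap_op` are hypothetical, and a real device-side ring would need atomic index updates and memory fences, which Python cannot express.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Task:
    op: str       # key into the operator table, e.g. "scan" (hypothetical)
    payload: int  # stand-in argument; a real task would carry device pointers


class TaskRing:
    """Bounded single-producer/single-consumer ring with power-of-two capacity."""

    def __init__(self, capacity: int = 8) -> None:
        assert capacity > 0 and capacity & (capacity - 1) == 0
        self._slots: List[Optional[Task]] = [None] * capacity
        self._mask = capacity - 1
        self._head = 0  # next slot to read; only the consumer advances it
        self._tail = 0  # next slot to write; only the producer advances it

    def push(self, task: Task) -> bool:
        if self._tail - self._head == len(self._slots):
            return False  # ring is full; the producer must retry later
        self._slots[self._tail & self._mask] = task
        self._tail += 1  # publish the slot only after it has been written
        return True

    def pop(self) -> Optional[Task]:
        if self._head == self._tail:
            return None  # ring is empty
        task = self._slots[self._head & self._mask]
        self._head += 1
        return task


class PersistentLoop:
    """Model of the persistent kernel's dispatch loop over an operator table."""

    def __init__(self, ring: TaskRing) -> None:
        self.ring = ring
        self.ops: Dict[str, Callable[[int], int]] = {}
        self.results: List[int] = []

    def swap_op(self, name: str, handler: Callable[[int], int]) -> None:
        # Hot-swap: replace one table entry; later tasks see the new handler.
        self.ops[name] = handler

    def drain(self) -> int:
        handled = 0
        while (task := self.ring.pop()) is not None:
            self.results.append(self.ops[task.op](task.payload))
            handled += 1
        return handled


ring = TaskRing(capacity=4)
loop = PersistentLoop(ring)
loop.swap_op("scan", lambda n: n)      # generic placeholder handler
ring.push(Task("scan", 3))
loop.drain()
loop.swap_op("scan", lambda n: n * 2)  # swapped-in "specialized" handler
ring.push(Task("scan", 3))
loop.drain()
```

In this model the loop never stops to pick up a new handler: the table entry is replaced between tasks, which mirrors the idea of hot-swapping handlers without restarting the persistent kernel.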

By observing binary kernels at device synchronization points, Concordia makes LLM inference fault-tolerant and recovers state without requiring application-specific checkpoint logic in every component.
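The delta-checkpoint and recovery flow described above can be sketched generically. The following host-side Python model rests on stated assumptions: dirty blocks are detected by comparing content hashes (the paper does not say how its scanners detect dirty pages), the log is an in-memory list rather than CXL memory or host DRAM, and `BLOCK`, `DeltaCheckpointer`, and `recover` are hypothetical names.

```python
import hashlib
from typing import Dict, List, Tuple

BLOCK = 4  # hypothetical block size in bytes; real KV blocks are far larger

Record = Tuple[int, int, bytes]  # (epoch, block index, block contents)


def digest(block: bytes) -> bytes:
    return hashlib.blake2b(block, digest_size=8).digest()


class DeltaCheckpointer:
    """Stages changed blocks, then commits them to an append-only log."""

    def __init__(self) -> None:
        self.seen: Dict[int, bytes] = {}  # block index -> last logged digest
        self.log: List[Record] = []       # append-only delta log
        self.committed = 0                # records covered by the last commit
        self.epoch = 0

    def checkpoint(self, state: bytearray) -> int:
        self.epoch += 1
        staged: List[Record] = []
        for offset in range(0, len(state), BLOCK):
            index = offset // BLOCK
            block = bytes(state[offset:offset + BLOCK])
            h = digest(block)
            if self.seen.get(index) != h:  # new or modified block
                staged.append((self.epoch, index, block))
                self.seen[index] = h
        self.log.extend(staged)
        self.committed = len(self.log)     # commit point for this epoch
        return len(staged)


def recover(log: List[Record], committed: int, size: int) -> bytearray:
    """Replay committed records in order; ignore any uncommitted tail."""
    state = bytearray(size)
    for _epoch, index, block in log[:committed]:
        start = index * BLOCK
        state[start:start + len(block)] = block
    return state


state = bytearray(b"aaaabbbbcccc")
ckpt = DeltaCheckpointer()
first = ckpt.checkpoint(state)   # every block is new on the first pass
state[4:8] = b"BBBB"             # dirty exactly one block
second = ckpt.checkpoint(state)  # only the changed block is logged
restored = recover(ckpt.log, ckpt.committed, len(state))
```

Replaying only the records below the commit point means a record appended during a fault, before its epoch committed, is not applied on recovery.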