Concordia: JIT-Compiled Persistent-Kernel Checkpointing for Fault-Tolerant LLM Inference
This paper introduces Concordia, a runtime that provides fault tolerance for long-running LLM agents by preserving valuable GPU-resident state, so that recovery does not require restarting the serving stack. Concordia uses a device-resident persistent kernel that interposes on GPU module loading, enabling instrumentation at both the PTX and SASS levels.
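As a rough illustration of what interposing on module loading involves, the following Python sketch simulates a loader hook that inspects each module image and routes it to a PTX or SASS instrumentation pass before handing it to the real loader. It assumes, as with the CUDA driver API's `cuModuleLoadData`, that an image may be PTX text or a cubin (an ELF file containing SASS), among other formats. The names `classify_image`, `make_interposer`, `ptx_pass`, `sass_pass`, and `fake_driver_load` are hypothetical, and the driver call is replaced by a stub. This is a sketch of the general technique, not Concordia's implementation.

```python
from typing import Callable, List

# Cubin (SASS) images are ELF files and start with this magic number.
ELF_MAGIC = b"\x7fELF"

Loader = Callable[[bytes], str]


def classify_image(image: bytes) -> str:
    """Classify a module image as 'sass', 'ptx', or 'other'."""
    if image.startswith(ELF_MAGIC):
        return "sass"
    # PTX modules must declare .version and .target directives.
    if b".version" in image and b".target" in image:
        return "ptx"
    return "other"


def make_interposer(real_load: Loader,
                    ptx_pass: Callable[[bytes], bytes],
                    sass_pass: Callable[[bytes], bytes]) -> Loader:
    """Wrap real_load so each image is instrumented before the driver sees it."""
    def load(image: bytes) -> str:
        kind = classify_image(image)
        if kind == "ptx":
            image = ptx_pass(image)
        elif kind == "sass":
            image = sass_pass(image)
        # Unrecognized formats (e.g. fatbins here) pass through unchanged.
        return real_load(image)
    return load


# --- Usage with a stubbed driver (hypothetical) ---
loaded: List[bytes] = []
instrumented: List[str] = []


def fake_driver_load(image: bytes) -> str:
    # Stand-in for the real driver loader: records the image, returns a handle name.
    loaded.append(image)
    return f"module{len(loaded)}"


def ptx_pass(image: bytes) -> bytes:
    # Hypothetical PTX pass: a real pass would rewrite instructions; this only tags the text.
    instrumented.append("ptx")
    return image + b"\n// instrumented"


def sass_pass(image: bytes) -> bytes:
    # Hypothetical SASS pass: a real pass would patch the cubin; this passes it through.
    instrumented.append("sass")
    return image


load = make_interposer(fake_driver_load, ptx_pass, sass_pass)
handles = [
    load(b".version 8.0\n.target sm_90\n"),  # PTX text
    load(ELF_MAGIC + bytes(12)),             # minimal ELF-like cubin header
    load(b"opaque"),                         # unrecognized format
]
print(handles, instrumented)  # ['module1', 'module2', 'module3'] ['ptx', 'sass']
```

Routing on the image format is what lets a single hook cover both instrumentation levels: PTX can be rewritten as text before JIT compilation, while SASS arrives already compiled and must be patched in binary form. The sketch makes no claim about how Concordia itself installs its hook or how the persistent kernel consumes the instrumented modules.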