llama.cpp PR #20793: Reintroducing less synchronizations during split compute
Pull request #20793 reintroduces reduced synchronization during split compute in llama.cpp, primarily to improve CUDA performance. It replaces synchronous copies with asynchronous ones and relaxes the synchronization required between input copies on backends that support it.
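The general idea can be illustrated with a host-only C++ sketch. This is an analogy, not code from the PR: `Buffer`, `copy_inputs_sync`, and `copy_inputs_async` are hypothetical names, and `std::async` merely stands in for a backend's asynchronous copy. The first function finishes each input copy before starting the next; the second starts all copies and waits once before compute would begin.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical stand-in for a backend buffer; not a ggml/llama.cpp type.
using Buffer = std::vector<float>;

// Baseline: every input copy completes before the next one starts,
// i.e. one implicit synchronization per copy.
std::vector<Buffer> copy_inputs_sync(const std::vector<Buffer>& src) {
    std::vector<Buffer> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        dst[i] = src[i];  // blocking copy
    }
    return dst;
}

// Relaxed: launch all input copies, then synchronize once at the end.
std::vector<Buffer> copy_inputs_async(const std::vector<Buffer>& src) {
    // Size dst up front so the outer vector never reallocates while tasks run.
    std::vector<Buffer> dst(src.size());
    std::vector<std::future<void>> pending;
    pending.reserve(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        // Each task writes a distinct dst[i], so the copies do not race.
        pending.push_back(std::async(std::launch::async, [&src, &dst, i] {
            dst[i] = src[i];
        }));
    }
    // Single synchronization point before the split would be computed.
    for (auto& f : pending) {
        f.get();
    }
    assert(dst.size() == src.size());
    return dst;
}
```

Both functions return the same data; they differ only in where the waiting happens. On a real GPU backend the analogous pattern would enqueue asynchronous copies on a stream and synchronize once, which is an assumption about the general technique rather than a description of the PR's implementation.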