Metal backend adds f16 and bf16 support for concat operator
The Metal backend in llama.cpp now supports f16 and bf16 tensors in the concat operator, alongside the existing f32 and i32 support. The change adds specialized kernel templates for the new types and updates the pipeline getters. It also extends the type-based kernel dispatch so that the kernel matching the tensor type is selected. The work was done with assistance from pi:llama.cpp/Qwen3.6-27B.
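The idea of one templated concat kernel plus type-based dispatch can be sketched on the CPU in C++. This is a minimal illustration under stated assumptions, not the Metal shader or llama.cpp's actual code. `DType`, `concat_dim0`, `concat_erased` and `get_concat_fn` are hypothetical names. The sketch only handles two 2D tensors joined along dimension 0, which is the innermost dimension, as in ggml. Because concat copies values without doing any arithmetic, the sketch stores f16 and bf16 as raw `uint16_t` bit patterns and lets both types share one instantiation. Whether the real backend does the same is not stated in the source.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical element-type tag standing in for the tensor's type.
enum class DType { F32, I32, F16, BF16 };

// Templated concat "kernel": out has shape [ne1][ne0_a + ne0_b], row-major,
// with dim 0 innermost. Each output element comes from a or from b, depending
// on which side of the split point ne0_a it falls.
template <typename T>
void concat_dim0(const T* a, const T* b, T* out,
                 int64_t ne0_a, int64_t ne0_b, int64_t ne1) {
    assert(ne0_a >= 0 && ne0_b >= 0 && ne1 >= 0);
    const int64_t ne0 = ne0_a + ne0_b;
    for (int64_t i1 = 0; i1 < ne1; ++i1) {
        for (int64_t i0 = 0; i0 < ne0; ++i0) {
            out[i1 * ne0 + i0] = (i0 < ne0_a)
                ? a[i1 * ne0_a + i0]
                : b[i1 * ne0_b + (i0 - ne0_a)];
        }
    }
}

// Type-erased signature so a single dispatch table can hold every instantiation.
using ConcatFn = void (*)(const void*, const void*, void*, int64_t, int64_t, int64_t);

template <typename T>
void concat_erased(const void* a, const void* b, void* out,
                   int64_t ne0_a, int64_t ne0_b, int64_t ne1) {
    concat_dim0(static_cast<const T*>(a), static_cast<const T*>(b),
                static_cast<T*>(out), ne0_a, ne0_b, ne1);
}

// Type-based dispatch: choose the kernel instantiation for the tensor type.
ConcatFn get_concat_fn(DType type) {
    switch (type) {
        case DType::F32:  return concat_erased<float>;
        case DType::I32:  return concat_erased<int32_t>;
        case DType::F16:  // both are 16-bit; concat only copies the bits
        case DType::BF16: return concat_erased<uint16_t>;
    }
    throw std::invalid_argument("unsupported type for concat");
}
```

Adding a type under this scheme only requires a new template instantiation and one more dispatch case. The per-element loop body stays the same.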