Ex-Omni enables 3D facial animation generation for omni-modal LLMs
Researchers have released Ex-Omni, a publicly available system that generates omni-modal responses from text or speech input. For each input, the model produces three kinds of output: response text, speech (as speech units or decoded audio), and 52-dimensional facial blendshape coefficients that drive 3D facial animation.
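The sketch below shows how a stream of 52-dimensional coefficients can animate a face. It assumes the standard linear blendshape model, where each coefficient scales a per-vertex offset from a neutral mesh (V = V0 + Σ wᵢ·Dᵢ). The source does not confirm this model for Ex-Omni. The names `BlendshapeFrame` and `apply_blendshapes` and the mesh layout are hypothetical illustrations, not part of Ex-Omni's released interface.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Ex-Omni's facial output is described as 52-dimensional blendshape coefficients.
NUM_BLENDSHAPES = 52

Vertex = Tuple[float, float, float]


@dataclass
class BlendshapeFrame:
    """One frame of facial animation (hypothetical container, not Ex-Omni's API)."""
    coefficients: List[float]

    def __post_init__(self) -> None:
        # Reject frames that do not carry exactly one weight per blendshape.
        if len(self.coefficients) != NUM_BLENDSHAPES:
            raise ValueError(
                f"expected {NUM_BLENDSHAPES} coefficients, got {len(self.coefficients)}"
            )


def apply_blendshapes(
    neutral: Sequence[Vertex],
    deltas: Sequence[Sequence[Vertex]],
    frame: BlendshapeFrame,
) -> List[Vertex]:
    """Assumed linear blendshape model: V = V0 + sum_i w_i * D_i."""
    if len(deltas) != NUM_BLENDSHAPES:
        raise ValueError("need one set of per-vertex offsets per blendshape")
    # Start from a mutable copy of the neutral (rest-pose) mesh.
    result = [list(v) for v in neutral]
    for weight, delta in zip(frame.coefficients, deltas):
        if weight == 0.0:
            continue  # an inactive blendshape contributes nothing
        if len(delta) != len(neutral):
            raise ValueError("each delta must have one offset per neutral vertex")
        # Add this blendshape's offsets, scaled by its coefficient.
        for i, (dx, dy, dz) in enumerate(delta):
            result[i][0] += weight * dx
            result[i][1] += weight * dy
            result[i][2] += weight * dz
    return [(x, y, z) for x, y, z in result]
```

Under this assumption, an animation clip is a sequence of `BlendshapeFrame` objects evaluated one after another. The source does not specify the frame rate or how frames align with the generated speech.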