SupraLabs has introduced Supra-A2A-Nano-Exp, a 30M-parameter multimodal Transformer that unifies text, image, and video into a single token stream. Every modality is represented as tokens in one shared sequence, so the model performs ordinary language modeling over a combined vocabulary of 50,520 tokens, with no separate vision encoders or cross-attention modules.
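To make the shared-sequence idea concrete, here is a minimal sketch of how several modalities can be mapped into one vocabulary by giving each a disjoint ID range. It is not the Supra-A2A-Nano-Exp tokenizer: the sub-vocabulary sizes, the marker tokens (`<boi>`, `<eoi>`, `<bov>`, `<eov>`), and every function name are assumptions chosen for illustration, and the announcement does not say how the 50,520 tokens are divided.

```python
# Illustrative only: sizes and marker tokens are hypothetical, not the
# model's actual vocabulary layout.
TEXT_VOCAB = 1000      # assumed number of text token IDs
VISUAL_CODES = 512     # assumed number of discrete visual codes
SPECIALS = ["<boi>", "<eoi>", "<bov>", "<eov>"]  # assumed boundary markers

# Each modality gets a disjoint ID range inside one combined vocabulary.
TEXT_OFFSET = 0
VISUAL_OFFSET = TEXT_OFFSET + TEXT_VOCAB
SPECIAL_OFFSET = VISUAL_OFFSET + VISUAL_CODES
VOCAB_SIZE = SPECIAL_OFFSET + len(SPECIALS)

SPECIAL_ID = {name: SPECIAL_OFFSET + i for i, name in enumerate(SPECIALS)}


def _shift(ids, offset, size):
    """Move modality-local IDs into their slice of the combined vocabulary."""
    for i in ids:
        if not 0 <= i < size:
            raise ValueError(f"id {i} outside local range [0, {size})")
    return [offset + i for i in ids]


def text_tokens(ids):
    """Text IDs occupy the first slice of the vocabulary."""
    return _shift(ids, TEXT_OFFSET, TEXT_VOCAB)


def image_tokens(codes):
    """Wrap one image's visual codes in begin/end markers."""
    body = _shift(codes, VISUAL_OFFSET, VISUAL_CODES)
    return [SPECIAL_ID["<boi>"]] + body + [SPECIAL_ID["<eoi>"]]


def video_tokens(frames):
    """Concatenate per-frame visual codes between video markers."""
    out = [SPECIAL_ID["<bov>"]]
    for frame in frames:
        out.extend(_shift(frame, VISUAL_OFFSET, VISUAL_CODES))
    out.append(SPECIAL_ID["<eov>"])
    return out


def build_stream(*segments):
    """Flatten modality segments into the single sequence the model reads."""
    return [tok for seg in segments for tok in seg]


stream = build_stream(text_tokens([5, 7]), image_tokens([0, 511]))
```

Because every ID in `stream` falls inside one range of size `VOCAB_SIZE`, a decoder-only Transformer with a single embedding table and a single softmax output can, in principle, be trained on it with next-token prediction, which is why this style of design needs no dedicated vision encoder or cross-attention path.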