The openPangu team has released openPangu-2.0-Flash, a Mixture-of-Experts (MoE) model trained on Ascend hardware. The model has 92 billion total parameters, of which 6 billion are activated per token, and supports a context length of 512k tokens.
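
The gap between total and activated parameters comes from expert routing. The following minimal Python sketch illustrates top-k routing: a router scores every expert, but only the k highest-scoring experts actually run for a given token, so only their parameters count as activated. The expert count, the value of k, and the helper names (`top_k_route`, `moe_forward`) are hypothetical and do not reflect openPangu-2.0-Flash's real configuration. `ACTIVE_FRACTION` only restates the announced 6B / 92B ratio (about 6.5%).

```python
import math

# Announced figures: 92B total parameters, 6B activated per token.
ACTIVE_FRACTION = 6e9 / 92e9  # roughly 0.065

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k):
    """Pick the k highest-probability experts and renormalize their weights.

    Returns (expert_index, weight) pairs sorted by weight, descending.
    """
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept = sum(probs[i] for i in chosen)
    return [(i, probs[i] / kept) for i in chosen]

def moe_forward(x, experts, router_logits, k):
    """Weighted sum of the routed experts' outputs.

    Unselected experts are never called, so their parameters stay inactive
    for this token.
    """
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))

if __name__ == "__main__":
    # Hypothetical toy setup: 4 scalar "experts", 2 active per token.
    experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
    print(top_k_route([0.1, 2.0, 1.0, -1.0], k=2))
    print(moe_forward(1.0, experts, [0.1, 2.0, 1.0, -1.0], k=2))
    print(f"activated fraction: {ACTIVE_FRACTION:.3f}")
```

Because compute per token scales with the experts that actually run, a sparse MoE can hold a large total parameter count while keeping per-token cost close to that of a much smaller dense model.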

  • Training used 34 trillion pretraining tokens, followed by unified supervised fine-tuning (SFT) for both slow-thinking and fast-thinking modes and by multi-specialist reinforcement learning (RL).
  • An efficient hybrid attention scheme combines MLA (Multi-head Latent Attention), DSA, and SWA (sliding-window attention), arranged in a 1:2 layer ratio, to lower compute and memory costs.
  • The model replaces the conventional residual path with a 4-stream mHC design to improve representation diversity and generalization.
  • Multi-token prediction (MTP) uses three heads that draft three additional tokens per step, speeding up inference through self-speculative decoding.
  • Training uses the Muon optimizer for faster convergence.
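
Mixing sliding-window layers into the stack saves compute and memory, and a small Python sketch makes the effect visible. It assumes, for illustration only, that the 1:2 ratio means one MLA/DSA ("global") layer for every two SWA layers. The announcement does not spell out the ordering, and the window size here is a toy value. Counting allowed query-key pairs shows that full causal attention grows quadratically with sequence length, while a window of size w caps each query at w keys (and likewise caps the KV cache an SWA layer must keep).

```python
# Hypothetical reading of the 1:2 ratio: one "global" (MLA/DSA) layer per two SWA layers.
PATTERN = ("global", "swa", "swa")

def layer_schedule(num_layers, pattern=PATTERN):
    """Repeat the attention-type pattern across the layer stack."""
    return [pattern[i % len(pattern)] for i in range(num_layers)]

def causal_mask(seq_len, window=None):
    """mask[q][k] is True when query position q may attend to key position k.

    window=None gives full causal attention; an integer keeps only the most
    recent `window` positions (including q itself), as in sliding-window attention.
    """
    return [
        [k <= q and (window is None or q - k < window) for k in range(seq_len)]
        for q in range(seq_len)
    ]

def attended_pairs(mask):
    """Number of query-key scores a layer must compute under this mask."""
    return sum(sum(row) for row in mask)

def stack_cost(num_layers, seq_len, window):
    """Total query-key pairs across the stack under the hypothetical pattern.

    Global layers are modeled here as dense causal attention for simplicity.
    """
    full = attended_pairs(causal_mask(seq_len))
    swa = attended_pairs(causal_mask(seq_len, window))
    return sum(full if kind == "global" else swa for kind in layer_schedule(num_layers))

if __name__ == "__main__":
    n, w = 8, 2  # toy sizes, far smaller than a 512k-token context
    print(layer_schedule(6))
    print(attended_pairs(causal_mask(n)), attended_pairs(causal_mask(n, w)))
    print(stack_cost(6, n, w), 6 * attended_pairs(causal_mask(n)))
```

At realistic context lengths the gap widens sharply, since the dense term grows with n² while each SWA layer grows only with n·w.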
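
A multi-stream residual path can be illustrated roughly in Python. The sketch keeps four parallel residual streams, feeds the layer a weighted mix of them, and writes the layer output back into each stream. It assumes that mHC refers to manifold-constrained hyper-connections, in which the stream-mixing matrix is kept doubly stochastic (here via Sinkhorn-Knopp normalization) so that mixing preserves the sum across streams. The function names, read/write weights, and iteration count are hypothetical, and this is not openPangu's implementation.

```python
import math

NUM_STREAMS = 4  # the announcement's "4-stream" design

def sinkhorn(logits, iters=50):
    """Map a square matrix of real logits to an approximately doubly stochastic
    matrix (non-negative, rows and columns summing to 1) via Sinkhorn-Knopp:
    exponentiate, then alternately normalize rows and columns."""
    m = [[math.exp(v) for v in row] for row in logits]
    n = len(m)
    for _ in range(iters):
        for r in range(n):
            s = sum(m[r])
            m[r] = [v / s for v in m[r]]
        for c in range(n):
            s = sum(m[r][c] for r in range(n))
            for r in range(n):
                m[r][c] /= s
    return m

def hyper_connection_block(streams, layer_fn, mix_logits, read_w, write_w):
    """One residual block over n parallel streams (equal-length float lists).

    read_w[i]  : weight of stream i in the layer input (hypothetical parameters)
    write_w[i] : how much of the layer output is added to stream i
    mix_logits : n x n logits for the stream-mixing matrix
    """
    n, dim = len(streams), len(streams[0])
    h = sinkhorn(mix_logits)
    # Layer input: weighted combination of all residual streams.
    x = [sum(read_w[i] * streams[i][d] for i in range(n)) for d in range(dim)]
    y = layer_fn(x)
    # Residual path: mix streams with the doubly stochastic matrix...
    mixed = [[sum(h[i][j] * streams[j][d] for j in range(n)) for d in range(dim)]
             for i in range(n)]
    # ...then write the layer output back into each stream.
    return [[mixed[i][d] + write_w[i] * y[d] for d in range(dim)] for i in range(n)]

if __name__ == "__main__":
    logits = [[0.5 * ((i * j) % 3) for j in range(NUM_STREAMS)] for i in range(NUM_STREAMS)]
    streams = [[1.0, -2.0], [0.5, 3.0], [2.0, 0.0], [-1.0, 1.0]]
    out = hyper_connection_block(streams, lambda v: [2.0 * e for e in v],
                                 logits, [0.25] * 4, [1.0, 0.5, 0.0, 0.25])
    print(out)
```

Because every column of the mixing matrix sums to 1, the stream-mixing step alone neither amplifies nor shrinks the summed residual signal. This is the kind of stability property a constrained multi-stream design aims for.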
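
Self-speculative decoding can be sketched as a draft-then-verify loop. The MTP heads propose three tokens, the main model checks them, and the longest agreeing prefix is kept along with one token from the main model. This minimal greedy version in Python assumes hypothetical callables `draft_fn` (standing in for the MTP heads) and `target_next` (the main model's greedy choice). A real implementation verifies all drafted positions in one batched forward pass instead of calling the model once per token.

```python
def speculative_step(prefix, draft_fn, target_next, num_draft=3):
    """One greedy draft-then-verify step of (self-)speculative decoding.

    draft_fn(tokens, k)  -> k proposed next tokens (stand-in for the MTP heads)
    target_next(tokens)  -> the main model's greedy next token after `tokens`
    Returns the tokens committed this step: between 1 and num_draft + 1.
    """
    draft = draft_fn(prefix, num_draft)
    context = list(prefix)
    committed = []
    # A real system scores every drafted position in one batched forward pass;
    # this loop calls target_next once per position only for clarity.
    for tok in draft:
        expected = target_next(context)
        if tok != expected:
            committed.append(expected)  # replace the first wrong draft and stop
            return committed
        committed.append(tok)
        context.append(tok)
    committed.append(target_next(context))  # all drafts accepted: one bonus token
    return committed

def generate(prefix, draft_fn, target_next, max_new, num_draft=3):
    """Repeat speculative steps until max_new tokens have been produced."""
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        out.extend(speculative_step(out, draft_fn, target_next, num_draft))
    return out[: len(prefix) + max_new]

if __name__ == "__main__":
    # Toy "model": the next token is always the previous token plus one.
    target = lambda toks: toks[-1] + 1
    good_draft = lambda toks, k: [toks[-1] + i + 1 for i in range(k)]
    print(speculative_step([1, 2, 3], good_draft, target))
    print(generate([0], good_draft, target, max_new=10))
```

With greedy verification, the output is identical to decoding with the main model alone. Good drafts only reduce how many sequential steps are needed, from one per token to as few as one per four tokens when three drafts are accepted.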
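
Muon updates 2-D weight matrices with an orthogonalized version of the momentum, computed by a Newton-Schulz iteration. The Python sketch below uses the classic cubic iteration X ← 1.5X - 0.5XXᵀX, which drives every nonzero singular value toward 1, whereas the reference Muon implementation uses a tuned quintic polynomial with only a few steps. The learning rate, momentum coefficient, and step count are placeholders rather than openPangu's training settings. The sketch also omits refinements such as Nesterov momentum and learning-rate scaling by matrix shape.

```python
import math

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def newton_schulz_orthogonalize(g, steps=30, eps=1e-12):
    """Approximate the orthogonal polar factor of g (same shape as g).

    Uses the cubic Newton-Schulz map X <- 1.5 X - 0.5 X X^T X, which pushes each
    nonzero singular value toward 1 provided it starts in (0, sqrt(3)); dividing
    by the Frobenius norm guarantees all singular values are at most 1.
    """
    norm = math.sqrt(sum(v * v for row in g for v in row)) + eps
    x = [[v / norm for v in row] for row in g]
    tall = len(x) > len(x[0])
    if tall:  # work in the wide orientation so X X^T is the smaller Gram matrix
        x = transpose(x)
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * a - 0.5 * b for a, b in zip(ra, rb)] for ra, rb in zip(x, xxtx)]
    return transpose(x) if tall else x

def muon_step(weight, grad, momentum, lr=0.02, beta=0.95, steps=30):
    """One simplified Muon update for a single 2-D weight matrix.

    Accumulates momentum, orthogonalizes it, and takes a step along the result.
    Returns (new_weight, new_momentum). Hyperparameters are placeholders.
    """
    momentum = [[beta * m + g for m, g in zip(rm, rg)] for rm, rg in zip(momentum, grad)]
    update = newton_schulz_orthogonalize(momentum, steps)
    weight = [[w - lr * u for w, u in zip(rw, ru)] for rw, ru in zip(weight, update)]
    return weight, momentum

if __name__ == "__main__":
    q = newton_schulz_orthogonalize([[1.0, 2.0], [3.0, 4.0]])
    print(matmul(q, transpose(q)))  # close to the 2x2 identity
    w, m = muon_step([[0.0, 0.0], [0.0, 0.0]], [[1.0, 2.0], [3.0, 4.0]],
                     [[0.0, 0.0], [0.0, 0.0]])
    print(w, m)
```

Orthogonalizing the update equalizes its singular values, so no single direction in the gradient dominates the step. This property is commonly credited for Muon's faster convergence.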

The release provides an open-source option for high-performance long-context reasoning with optimized inference speed.