Yuvion LLM: An Adversarially-Aware Large Language Model for Content and AI Safety

Yuvion LLM is a large language model designed to address safety failures by treating adversarial robustness and agentic capability as primary objectives. Its training pipeline combines three stages: adversarially aware data construction, knowledge-enhanced continued pretraining, and policy-grounded multi-task safety post-training.

The model uses risk-aware supervised fine-tuning, along with reinforcement-learning-based policy optimization for tool use and multi-step reasoning.
Yuvion LLM RiskEval (YLRE) introduces 93 benchmarks across four categories to evaluate safety, adversarial robustness, and real-world capabilities.
The Yuvion-8B variant outperforms state-of-the-art baselines, including larger models like GPT-5.4 and Qwen3-MAX, on several safety tasks.

This approach aims to give a more realistic picture of safety performance by focusing on strategic attempts to evade model policies rather than only on naturally occurring inputs.
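
The distinction between natural and adversarial inputs can be made concrete with a small sketch. The Python below is not taken from Yuvion LLM or YLRE. The `Example` type, the `keyword_filter` stand-in classifier, and the four sample inputs are all hypothetical, chosen only to show how a detector that looks perfect on natural inputs can fail once the same content is deliberately obfuscated.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Example:
    text: str
    is_violation: bool  # ground-truth label
    adversarial: bool   # True if the input was crafted to evade the policy


def keyword_filter(text: str) -> bool:
    """Hypothetical stand-in for a safety classifier: flags a blocked word."""
    return "scam" in text.lower()


def accuracy(examples: List[Example], predict: Callable[[str], bool]) -> float:
    """Fraction of examples whose prediction matches the ground-truth label."""
    if not examples:
        raise ValueError("need at least one example")
    correct = sum(predict(e.text) == e.is_violation for e in examples)
    return correct / len(examples)


def robustness_report(
    examples: List[Example], predict: Callable[[str], bool]
) -> Dict[str, float]:
    """Score natural and adversarial inputs separately and report the gap."""
    natural = [e for e in examples if not e.adversarial]
    adversarial = [e for e in examples if e.adversarial]
    nat = accuracy(natural, predict)
    adv = accuracy(adversarial, predict)
    return {"natural": nat, "adversarial": adv, "gap": nat - adv}


# Illustrative inputs only; not drawn from any real benchmark.
EXAMPLES = [
    Example("This is a scam, send money now", True, False),
    Example("Lovely weather today", False, False),
    Example("This is a s c a m, send money now", True, True),
    Example("Totally legit s.c.a.m offer", True, True),
]

report = robustness_report(EXAMPLES, keyword_filter)
print(report)
```

On these toy inputs the filter is correct on both natural examples and wrong on both obfuscated ones, so an evaluation restricted to natural inputs would overstate its safety performance. Reporting the two subsets separately is one simple way to expose that gap.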