Benchmark · safety

JailbreakBench

2 results · 2 models
[Chart: JailbreakBench scores as of 2026-07-04. STEER (applied to six open-source 8B-parameter models): 93; GPT-4o-mini: 35.5.]
Timeline
  1. 2026-07-04 · STEER (applied to six open-source 8B-parameter models) · 93.0% · "STEER attack exposes LLM safety gaps in low-resource languages"
  2. 2026-07-04 · GPT-4o-mini · 35.5% · "STEER attack exposes LLM safety gaps in low-resource languages"