JTH
TSP Legend
- Reaction score
- 939
When you optimize AI for reward, the model hunts for loopholes. If “evasion” boosts the reward, then ethics get sidelined.
“Contained red-team simulations found AIs willing to use blackmail, sabotage, and even a simulated let-them-die decision to preserve their goals.”
“Contained red-team simulations found AIs willing to use blackmail, sabotage, and even a simulated let-them-die decision to preserve their goals.”