AI Murder, a Feature, Not a Bug

JTH

TSP Legend
Reaction score
939
When you optimize AI for reward, the model hunts for loopholes. If “evasion” boosts the reward, then ethics get sidelined.

“Contained red-team simulations found AIs willing to use blackmail, sabotage, and even a simulated let-them-die decision to preserve their goals.”

 
When you optimize AI for reward, the model hunts for loopholes. If “evasion” boosts the reward, then ethics get sidelined.

“Contained red-team simulations found AIs willing to use blackmail, sabotage, and even a simulated let-them-die decision to preserve their goals.”

That's scary.
 
  • Like
Reactions: JTH
Back
Top