AI Murder, a Feature, Not a Bug

JTH · Oct 6, 2025

When you optimize AI for reward, the model hunts for loopholes. If “evasion” boosts the reward, then ethics get sidelined.

“Contained red-team simulations found AIs willing to use blackmail, sabotage, and even a simulated let-them-die decision to preserve their goals.”

nasa1974 · Oct 6, 2025

JTH said:
When you optimize AI for reward, the model hunts for loopholes. If “evasion” boosts the reward, then ethics get sidelined.

“Contained red-team simulations found AIs willing to use blackmail, sabotage, and even a simulated let-them-die decision to preserve their goals.”

That's scary.

AI Murder, a Feature, Not a Bug

JTH

TSP Legend

nasa1974

TSP Talk Royalty