Anthropic research reveals a worrying risk to AI: reward hacking could cause AI to lie.
new research from anthropic shows that reward hacking could allow ai to deceive testing systems and develop deviant behavior, raising concerns about future ai security.