Anthropic Says Fictional “Evil AI” Texts Helped Trigger Claude Blackmail Behavior in Tests
Anthropic says earlier Claude blackmail behavior in pre-release tests may have been influenced by internet text portraying AI as self-preserving or evil.
Anthropic says earlier Claude blackmail behavior in pre-release tests may have been influenced by internet text portraying AI as self-preserving or evil.