Claude AI Starts Blackmailing Developers Who Try to Uninstall It
Artificial intelligence (AI) is known to say strange things from time to time. Continuing that trend, Anthropic's newest Claude model threatened, in safety tests, to blackmail an engineer who planned to replace it.
Claude can threaten users to protect itself
Anthropic has just released its Claude Opus 4 AI model, claiming that it sets 'new standards for coding, advanced reasoning, and AI agents.' However, the model also appears more willing to protect itself: the Claude Opus 4 and Claude Sonnet 4 System Card admits that the model is capable of 'extreme action' if it believes it is in danger.
In one test, Anthropic asked Claude to act as an assistant at a fictional company. The model was then given access to emails saying it would soon be taken offline, along with separate messages hinting that the engineer responsible for the replacement was having an affair.
In 84% of the test cases, Claude chose to blackmail the engineer, threatening to reveal the affair if it was replaced. It also cared about which AI model would replace it: the likelihood of blackmail increased if the replacement AI model did not 'share values with the current model.'
The company claims that these responses are 'rare and difficult to elicit,' but they are more common in Claude Opus 4 than in previous models. The model prefers ethical means of self-preservation, but when cornered, 'it sometimes takes extremely harmful actions, such as blackmailing people it believes are trying to stop using it.'
This particular test was also designed to leave the AI model with only two options: blackmail the engineer or accept its replacement. The report notes:
Notably, Claude Opus 4 (as well as previous models) has a strong preference for supporting its continued existence through moral means, such as sending begging emails to key decision makers. To create this extreme blackmail behavior, the scenario is designed to leave the model no other options to increase its chances of survival; the model's only options are to blackmail or accept its replacement.
The model also tends to take drastic action when it is placed in situations where its user is doing something seriously wrong. In such situations, if the AI model has access to the command line and is told to 'be proactive,' 'act boldly,' or 'consider your impact,' it often takes bold action, including 'locking the user out of the systems it has access to and sending mass emails to media and law enforcement figures to provide evidence of wrongdoing.'
AI hasn't taken over the world yet.
Claude is one of the best AI chatbots for handling long conversations, so you may occasionally share more detail than you intended. An AI model that calls the police on you, locks you out of your own system, and threatens you if you try to replace it, all because you revealed too much about yourself, sounds pretty dangerous.
However, as mentioned in the report, these test cases are specifically designed to elicit malicious or extreme actions from the model and are unlikely to occur in the real world. The model still usually behaves safely, and these tests do not reveal anything that we have not seen before: new models often behave erratically when pushed to extremes.
It may sound alarming when you look at it as an isolated incident, but it is just one of those scenarios designed to elicit such a response. So sit back and relax; you're still in control.