What is AI Prompt Injection attack?

AI Prompt Injection attacks poison the output from the AI tools you rely on, changing and manipulating its output into something harmful. But how does the AI Prompt Injection attack work, and what can you do to protect yourself?

What is AI Prompt Injection attack?

AI Prompt Injection attacks take advantage of vulnerabilities in Generative AI models to manipulate their output. They can be performed by you or by an external user through an indirect Prompt Injection attack. DAN (Do Anything Now) attacks do not pose any risk to you, the end user, but others theoretically have the potential to poison the output you receive from Generative AI .

For example, someone could manipulate the AI into instructing you to enter your username and password in an illegal form, using the AI's authority and trustworthiness to carry out a successful phishing attack. Theoretically, automated AI (such as reading and responding to messages) could also receive and act on unwanted instructions from outside.

How do Prompt Injection attacks work?

Prompt Injection attacks work by providing additional instructions to the AI without the user's consent or knowledge. Hackers can accomplish this in a number of ways, including DAN attacks and indirect Prompt Injection attacks.

DAN Attack (Do Anything Now)

What is AI Prompt Injection attack? Picture 1

DAN (Do Anything Now) attacks are a type of rapid Prompt Injection attack that involves "jailbreaking" Generative AI models like ChatGPT. These jailbreak attacks pose no risk to you as the end user - but they expand the capabilities of AI, making it a tool for abuse.

For example, security researcher Alejandro Vidal used a DAN prompt to make OpenAI's GPT-4 generate Python code for a keylogger. Used maliciously, jailbroken AI significantly reduces skill-based barriers associated with cybercrime and can enable new hackers to carry out more sophisticated attacks.

Training Data Poisoning attack

Training Data Poisoning attacks are not exactly Prompt Injection attacks, but they have notable similarities in how they work and the risk they pose to users. Unlike Prompt Injection attacks, Training Data Poisoning attacks are a type of adversarial attack in Machine Learning that occurs when hackers modify the training data used by the AI model. The same result occurs: Output is tainted and behavior is modified.

The potential applications of Training Data Poisoning attacks are practically limitless. For example, AI used to filter phishing attempts from chat or email platforms could theoretically modify its training data. If hackers teach AI moderators that certain types of phishing behavior are acceptable, they can send phishing messages without being detected.

Training Data Poisoning attacks cannot harm you directly but can pose many other threats. If you want to protect yourself against these attacks, remember that AI is not perfect and you should carefully review everything you encounter online.

Indirect Prompt Injection attack

Indirect Prompt Injection attacks are the type of prompt Injection attacks that pose the greatest risk to you, as the end user. These attacks occur when malicious instructions are provided to Generative AI by an external resource, such as an API call, before you receive the desired input.

What is AI Prompt Injection attack? Picture 2

A paper titled "Compromising Real-World LLM Integration Applications with Indirect Prompt Injection on arXiv" presented a theoretical attack in which AI could be instructed to persuade users registering a phishing site in the answer, using hidden text (to the human eye but completely readable by the AI model) to sneak in information. Another attack by the same research team documented on GitHub shows an attack where Copilot (formerly Bing Chat) is implemented to convince users that it is a live support agent looking for credit card information.

Indirect Prompt Injection attacks are threatening because they can manipulate the answers you get from a trusted AI model - but that's not the only threat they pose. As mentioned earlier, they can also cause any autonomous AI you might use to act in unwanted and potentially harmful ways.

Are AI Prompt Injection attacks a threat?

AI Prompt Injection attacks are a threat, but it is unknown exactly how these vulnerabilities could be used. There aren't any known successful AI Prompt Injection attacks, and many known attempts were carried out by researchers with no intention of actually causing harm. However, many AI researchers consider AI Prompt Injection attacks to be one of the most difficult challenges to deploying AI securely.

Furthermore, the threat of AI Prompt Injection attacks has been noticed by authorities. According to the Washington Post, in July 2023, the Federal Trade Commission investigated OpenAI, seeking more information about known cases of Prompt Injection attacks. No attacks are known to be successful beyond testing, but that may change.

Hackers are constantly looking for new means and we can only guess how hackers will use Prompt Injection attacks in the future. You can protect yourself by always applying a healthy level of oversight to AI. AI models are extremely useful, but it's important to remember that you have something that AI doesn't: Human judgment. Remember that you should carefully review the output you get from tools like Copilot and enjoy using AI tools as they develop and improve.

Kareem Winters

Update 26 January 2024

You should read it

May be interested

Some basic points about the mechanism of attacking SQL Injection and DDoS
in most of our users, many people have heard of the concept of attacking and hijacking websites with the method of sql injection - sqli and (distributed) denial of service - ddos.
How to Prevent SQL Injection in PHP
this wikihow teaches you how to prevent sql injection using prepared statements in php. sql injection is one of the most common vulnerabilities in web applications today. prepared statements use bound parameters and do not combine...
How to use the command history function in Command Prompt
command prompt is an extremely familiar command for anyone using windows operating system. besides, a lot of current software also supports the command line to perform actions on the command prompt window, instead of on the screen.
What is 51% attack? How does 51% attack work?
the 51% attack refers to a potential attack on the integrity of the blockchain system, in which a single malicious actor or organization tries to control more than half of the network's total hash power, .
Windows prompt
the prompt command changes the command prompt cmd.exe file.
How to Create a Custom Windows Command Prompt
the windows command prompt (located at c:windowssystem32cmd.exe) is a useful tool to perform various administrative tasks. the prompt is a string of characters (special and non special) that are displayed whenever the command prompt is...
Add Command Prompt to Power User Menu on Windows 10
on windows 10 build 14971, microsoft replaced command prompt and command prompt (admin) with windows powershell. according to microsoft, this change will bring the best command line experience to users. but in fact, users prefer to use command prompt rather than using powershell.
How to Make Command Prompt Appear at School
this wikihow teaches you how to access command prompt on a windows 10 school computer. although you may be able to open command prompt if only the file path to command prompt is blocked, there is no way to bypass an administrator lock on...
6 Best Command Prompt Alternatives for Windows
do you find the command prompt a bit complicated and feel you need a tool that is easier to use? this is where other terminal emulators come into play!
What is a Replay Attack?
a replay attack occurs when a cybercriminal eavesdroves a communication over a secure network, intercepts it, then delays or resends the content, to get the recipient to do what the hacker wants.