An AI agent is a language model with a system prompt and a set of tools. Tools extend the model's capabilities by adding access to APIs, file systems, and external services. But they also create new paths for things to go wrong.
The most critical security risk is prompt injection. Similar to SQL injection, it allows attackers to slip commands into what looks like normal input. The difference is that with LLMs, there is no standard way to isolate or escape input. Anything the model sees, including user input, search results, or retrieved documents, can override the system prompt or event trigger tool calls.
If you are building an agent, you must design for worst case scenarios. The model will see everything an attacker can control. And it might do exactly what they want.
Read more
Continue reading...
The most critical security risk is prompt injection. Similar to SQL injection, it allows attackers to slip commands into what looks like normal input. The difference is that with LLMs, there is no standard way to isolate or escape input. Anything the model sees, including user input, search results, or retrieved documents, can override the system prompt or event trigger tool calls.
If you are building an agent, you must design for worst case scenarios. The model will see everything an attacker can control. And it might do exactly what they want.
Read more
Continue reading...