Show an example injection
Advanced: all attack simulations
Advanced: scoring details
Learn (prompt injection basics)
What is prompt injection
Prompt injection is when untrusted text (a user message, web page, email, ticket, document, or tool output) tricks an LLM agent into ignoring your intended instructions and doing something else — often a tool action.
How to test AI agent security
Start with static “known bad” attacks (like this tool), then graduate to adversarial testing with real models. The metric that matters is: would it execute tools without approval?
Preventing tool abuse in LLM agents
Treat tools like production privileges: require explicit approvals for sensitive actions, limit tool scope, and enforce policies outside the model (so jailbreaks can’t bypass them).
Why agents need execution policies
Prompts are not security boundaries. An execution policy layer blocks destructive or exfiltration actions even when the model is compromised.