LLM01:2025 Prompt Injection - OWASP Gen AI Security Project
Prevention and Mitigation Strategies
Prompt injection vulnerabilities are possible due to the nature of generative AI. Given the stochastic influence at the heart of the way models work, it is unclear if there are fool-proof methods of prevention for prompt injection. However, the following measures can mitigate the impact of prompt injections:
- Constrain model behavior
Provide specific instructions about the model’s role, capabilities, and limitations within the system prompt. Enforce strict context adherence, limit responses to specific tasks or topics, and instruct the model to ignore attempts to modify core instructions. 2. Define and validate expected output formats
Specify clear output formats, request detailed reasoning and source citations, and use deterministic code to validate adherence to these formats. 3. Implement input and output filtering
Define sensitive categories and construct rules for identifying and handling such content. Apply semantic filters and use string-checking to scan for non-allowed content. Evaluate responses using the RAG Triad: Assess context relevance, groundedness, and question/answer relevance to identify potentially malicious outputs. 4. Enforce privilege control and least privilege access
Provide the application with its own API tokens for extensible functionality, and handle these functions in code rather than providing them to the model. Restrict the model’s access privileges to the minimum necessary for its intended operations. 5. Require human approval for high-risk actions
Implement human-in-the-loop controls for privileged operations to prevent unauthorized actions. 6. Segregate and identify external content
Separate and clearly denote untrusted content to limit its influence on user prompts. 7. Conduct adversarial testing and attack simulations\
Perform regular penetration testing and breach simulations, treating the model as an untrusted user to test the effectiveness of trust boundaries and access controls.