Prompt injection defense is the set of techniques and architectural layers designed to protect AI systems, particularly those using large language models (LLMs), from adversarial inputs that attempt to overwrite, ignore, or subvert their core system instructions and safety guidelines. This form of attack, known as a prompt injection, exploits the model's instruction-following nature by embedding malicious commands within seemingly benign user queries, posing a significant threat to agentic systems that perform autonomous actions. The defense is a cornerstone of agentic threat modeling and operational security for production AI.
