AI Threat Modelling Framework
An analysis of AI architectural threats—Prompt Injection, Data Poisoning, and Information Disclosure—and how to build defensive trust boundaries.
Threat modelling for Artificial Intelligence (AI) is often treated as a mysterious new discipline. However, after conducting a recent practical security assessment of an AI-augmented system, the "mystery" dissolves into a familiar architectural challenge: identifying trust boundaries.
In traditional application security, we focus on input validation and session management. In AI systems, we must also account for the probabilistic nature of Large Language Models (LLMs) and the unique ways they interact with external data sources.
The AI Architectural Stack
To understand the threats, we first need to map the components. A typical enterprise AI agent consists of:
API Gateway: The entry point for user queries.
LLM Agent: The "brain" that executes logic.
Prompt: The specific instructions and context given to the model.
Retrieval (RAG): The system fetching real-time data to ground the model's response.
Database: The storage for embeddings, confidential records, or training datasets.
By analyzing how data flows between these points, we can identify three critical failure modes.
1. Securing the Instruction Pipeline: Prompt Injection
Prompt Injection occurs when a user provides malicious instructions that override the developer's intended behavior. This isn't just about making a chatbot say something silly; it’s about tricking an agent into performing unauthorized actions, like exfiltrating data or bypassing safety filters.
[ DEFENSE SETUP: PROMPT INJECTION ]
+-------+ +-------------------+ +-----------------+
| USER | ------> | PROMPT (SHIELDED) | ------> | LLM (GUARDED) |
+-------+ +---------+---------+ +-----------------+
| |
v v
[ FILTERED INPUT ] [ SANDBOX EXEC ]
Figure 1: Implementing "Defense in Depth" at the instruction layer.
Why it works: The vulnerability exists because LLMs often struggle to distinguish between "System Instructions" (developer-defined) and "User Data." To mitigate this, we must apply controls at both the Prompt level (where input is incorporated) and the LLM Agent level (which executes the final command).
[ INJECTION MITIGATION LOGIC ]
WHY IT WORKS:
- Prompt Shield: Sanitizes user data before merging with system intent.
- LLM Guard: Contextual analysis to detect behavioral overrides.
- Input Boundary: Strict separation of data and control planes.
RESULT: [ SECURE ]
Figure 2: The logic behind instruction-layer validation.
2. Defending the Knowledge Base: Data Poisoning
Data Poisoning involves an adversary introducing malicious or biased information into the system's training or retrieval path. If an organization uses Retrieval-Augmented Generation (RAG), a poisoned document in the database can cause the AI to give false, dangerous, or malicious advice to legitimate users.
[ DEFENSE SETUP: DATA POISONING ]
+-------+ +-----------------+ +-----------------+
| LLM | <------ | RAG (SHIELDED) | <------ | DB (VERIFIED) |
+-------+ +-----------------+ +-----------------+
^ ^
| |
[ CTX VALIDATION ] [ INTEGRITY CHK ]
Figure 3: Protecting the integrity of the RAG pipeline.
The Risk Path: In a RAG-based architecture, the Database stores the knowledge, and the Retrieval component fetches it. If the database is compromised, the LLM will trust that information as "truth." Effective defense requires strict integrity checks on the database itself.
[ POISONING MITIGATION LOGIC ]
WHY IT WORKS:
- Knowledge Integrity: Verifying source authenticity before indexing.
- Retrieval Filtering: Detecting anomalous or contradictory context.
- Outlier Detection: Scrubbing adversarial data nodes.
RESULT: [ SECURE ]
Figure 4: How data integrity impacts final model reasoning.
3. Closing the Leakage Paths: Sensitive Information Disclosure
Sensitive Information Disclosure happens when the model accidentally reveals PII, credentials, or proprietary business logic in its responses. Unlike a database leak, this disclosure is often "dynamic"—the model is trying to be helpful and uses too much of the context it was given.
[ DEFENSE SETUP: INFO DISCLOSURE ]
+-------+ +-----------------+ +-----------------+
| LLM | ------> | RAG (FILTERED) | ------> | DB (MASKED) |
+---+---+ +-----------------+ +-----------------+
|
v
[ PII SCRUBBER ] ------> [ OUTPUT ]
Figure 5: Orchestrating multi-layered defenses against data leakage.
The Architectural Solution: Disclosure isn't a one-layer problem. It requires a coordinated defense:
Database: Encrypting and masking sensitive records at rest.
Retrieval: Limiting what data the agent can even see based on the user's role.
LLM Agent: Implementing output filters (PII scrubbers) to catch leaks.
[ DISCLOSURE MITIGATION LOGIC ]
WHY IT WORKS:
- PII Scrubbing: Automated identification of sensitive entities.
- Role-Based Retrieval: Limiting context based on authorization.
- Differential Privacy: Adding noise to prevent data reconstruction.
RESULT: [ SECURE ]
Figure 6: Breaking the chain of disclosure through layer-specific mitigations.
Final Thoughts: The Principle of Least Privilege for AI
If there is one lesson I learned from this assessment, it is that AI security is not about "fixing the model." It is about securing the system around the model. We must treat the LLM Agent as an untrusted component and build systems that are resilient even when the model's behavior is unpredictable.
Thanks for reading. See you in the next lab.


