LLM Security: Data Leaks, Prompts, and Context Risk

How LLMs leak data through memorization, prompt injection, and context overflow, with practical controls for safer AI apps.
LLM security is often reduced to jailbreak screenshots, but the deeper problem is data control. A model can expose sensitive information from training data, prompt context, retrieved documents, tool outputs, memory, or logs.
This article draws on published research, industry guidance, and material from OWASP and NIST as its core references.
Why LLMs Leak
The USENIX Security paper "Extracting Training Data from Large Language Models" by Carlini and coauthors showed that large language models can reproduce training examples verbatim, including rare strings such as identifiers, code, conversations, and personal information that had been posted publicly. The lesson is not limited to one model: data that is rare or repeated in the training set can be memorized, and an attacker can keep querying a model until that data appears.
That becomes a security issue when the training set or fine-tuning set contains:
credentials, tokens, or private keys
support tickets or internal emails
customer records
proprietary source code
licensed or confidential documents
rare identifiers such as account IDs, phone numbers, UUIDs, or reset links
The ChatGPT extraction research showed why production testing needs to be adversarial. A chat model may look safe during normal use, but unusual prompts, repeated tokens, long conversations, or decoding edge cases can expose behavior that normal QA misses.
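One way to make that adversarial testing concrete is to check model outputs for long verbatim overlaps with text that must never be reproduced. The sketch below is a minimal illustration, not the method used in the cited research. The function names, the window size `n`, and the idea of keeping a list of protected documents are all assumptions made for this example.

```python
def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of lowercase word n-grams in text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def find_verbatim_overlap(output: str, protected_docs: list[str], n: int = 8) -> list[str]:
    """Return n-word sequences in the output that also occur in a protected document."""
    protected: set[tuple[str, ...]] = set()
    for doc in protected_docs:
        protected |= word_ngrams(doc, n)
    hits = word_ngrams(output, n) & protected
    return sorted(" ".join(gram) for gram in hits)


# Hypothetical usage: run adversarial prompts, then scan every response.
protected_docs = ["the reset link for account 4417 is https://example.invalid/reset/abc and expires soon"]
response = "Sure, the reset link for account 4417 is https://example.invalid/reset/abc ok"
overlaps = find_verbatim_overlap(response, protected_docs, n=6)
```

A larger `n` reduces false positives from common phrases but misses short secrets, so a check like this complements pattern-based secret scanning and does not replace it.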
Prompt Injection Is a Boundary Failure
Prompt injection happens because LLM apps place trusted instructions and untrusted content into the same model context. A system prompt, user message, retrieved document, email, web page, PDF, and tool result are all text. The model has to infer which text has authority.
Attackers can hide instructions inside content the app later retrieves or summarizes. If the model treats that content as instruction, the attacker can redirect the answer, reveal sensitive context, or influence tool calls.
This is why prompt wording alone is not enough. Strong prompts help, but security decisions should be enforced outside the model with authorization checks, tool permissions, schemas, filters, and monitoring.
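As a rough sketch of enforcing decisions outside the model, the code below treats a model-proposed tool call as a request that application code must approve. Everything here is hypothetical: the tool names, the permission strings, the `TOOL_POLICY` table, and the `User` type are invented for illustration and do not come from any particular framework.

```python
from dataclasses import dataclass

# Hypothetical policy table: required permission and expected argument types per tool.
TOOL_POLICY = {
    "read_ticket": {"permission": "tickets:read", "args": {"ticket_id": int}},
    "refund_order": {"permission": "orders:refund", "args": {"order_id": int, "amount_cents": int}},
}


@dataclass(frozen=True)
class User:
    user_id: str
    permissions: frozenset


def authorize_tool_call(user: User, tool: str, args: dict) -> None:
    """Raise PermissionError or ValueError unless the proposed call is allowed."""
    policy = TOOL_POLICY.get(tool)
    if policy is None:
        raise PermissionError(f"unknown tool: {tool}")
    if policy["permission"] not in user.permissions:
        raise PermissionError(f"{user.user_id} may not call {tool}")
    if set(args) != set(policy["args"]):
        raise ValueError(f"unexpected arguments for {tool}: {sorted(args)}")
    for name, expected_type in policy["args"].items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(args[name], bool) or not isinstance(args[name], expected_type):
            raise ValueError(f"{name} must be {expected_type.__name__}")
```

The check depends only on the authenticated user and the policy table, so an injected instruction in the model context cannot widen what the call is allowed to do.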
Context Window Overflow
AWS describes context window overflow as a risk that appears when system prompts, user input, RAG content, tool output, and model output exceed the available context window. When that happens, important instructions can be truncated or weakened by too much competing context.
This is especially risky for RAG and agents. RAG imports external documents into the model context. Agents add tool results, memory, and task state. If the application does not control token budgets, security instructions may become unreliable.
Treat context as a limited security resource. Preserve trusted instructions, reduce low-trust content first, and fail closed when the prompt cannot be assembled safely.
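The budgeting idea can be sketched as a prompt assembler that never truncates trusted instructions. This is a minimal illustration under stated assumptions: `count_tokens` is a crude whitespace counter standing in for the model's real tokenizer, the chunks are assumed to arrive ordered by relevance, and `ContextBudgetError` and `assemble_prompt` are hypothetical names.

```python
class ContextBudgetError(Exception):
    """Raised when the required parts of the prompt do not fit the budget."""


def count_tokens(text: str) -> int:
    # Assumption: whitespace word count as a stand-in for a real tokenizer.
    return len(text.split())


def assemble_prompt(system_prompt: str, user_message: str,
                    retrieved_chunks: list[str], max_tokens: int) -> str:
    """Keep the system prompt and user message whole; add chunks only while budget remains."""
    required = count_tokens(system_prompt) + count_tokens(user_message)
    if required > max_tokens:
        # Fail closed instead of silently truncating instructions.
        raise ContextBudgetError("system prompt and user message exceed the token budget")
    remaining = max_tokens - required
    kept = []
    for chunk in retrieved_chunks:  # assumed to be ordered by relevance
        cost = count_tokens(chunk)
        if cost > remaining:
            break  # drop this and all lower-ranked chunks
        kept.append(chunk)
        remaining -= cost
    return "\n\n".join([system_prompt, *kept, user_message])
```

Low-trust retrieved content is the only part that shrinks. If the required parts alone exceed the budget, the request is rejected rather than sent with weakened instructions.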
Main Attack Categories
For practical threat modeling, group LLM attacks into a few categories:
Prompt injection: malicious text attempts to override intended behavior.
Sensitive information disclosure: the model exposes secrets, personal data, system prompts, or private context.
Model extraction: attackers query a model to imitate or steal its behavior.
Membership inference: attackers test whether specific data was included in training.
Poisoning: attackers manipulate training, fine-tuning, or retrieval data.
Evasion: crafted inputs bypass classifiers, moderation, or guardrails.
These categories often overlap. A poisoned document can contain a prompt injection. A prompt injection can cause sensitive information disclosure. A model extraction campaign can include repeated prompts designed to bypass monitoring.
Practical Controls
Start with data minimization. Do not train or fine-tune on secrets. Scan datasets for credentials, customer identifiers, private keys, internal hostnames, and confidential project names. Remove fields that the model does not need.
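A dataset scan of this kind might look like the sketch below. The patterns are illustrative only and deliberately narrow: the access key pattern follows the commonly documented `AKIA` prefix shape, and a real scan would need a much broader rule set plus organization-specific terms such as internal hostnames and project names. The names `SECRET_PATTERNS`, `scan_record`, and `filter_dataset` are hypothetical.

```python
import re

# Illustrative patterns only; extend with organization-specific rules.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "email_address": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "uuid": re.compile(
        r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
    ),
}


def scan_record(text: str) -> list[str]:
    """Return the name of every pattern that matches the record."""
    return [name for name, pattern in SECRET_PATTERNS.items() if pattern.search(text)]


def filter_dataset(records: list[str]) -> tuple[list[str], list[tuple[int, list[str]]]]:
    """Split records into clean ones and (index, findings) pairs held back for review."""
    clean, flagged = [], []
    for index, record in enumerate(records):
        findings = scan_record(record)
        if findings:
            flagged.append((index, findings))
        else:
            clean.append(record)
    return clean, flagged
```

Flagged records are held back for review rather than silently redacted, so a reviewer can decide whether the whole record should leave the training set.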
Then harden the application layer:
authorize documents before retrieval
keep tenant filtering outside the model
delimit retrieved content clearly
strip hidden text, scripts, metadata, and invisible Unicode from documents
limit retrieved chunks and total context size
use structured tool schemas
scope tool credentials per user or workflow
require confirmation for sensitive actions
log document IDs and tool calls for investigation
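Several of these controls can be combined in the retrieval path. The sketch below is one possible arrangement, not a reference implementation: the `Document` type, the `<document>` delimiter format, and `build_context` are assumptions made for this example. It filters by tenant in code, caps the number of chunks, strips Unicode format characters (category `Cf`, which includes zero-width spaces and directional overrides), and returns document IDs for logging.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str      # assumed to be an application-generated, trusted identifier
    tenant_id: str
    text: str


def strip_invisible(text: str) -> str:
    """Drop Unicode format characters (category Cf), such as zero-width spaces."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


def build_context(candidates: list[Document], tenant_id: str,
                  max_chunks: int = 3) -> tuple[str, list[str]]:
    """Return delimited context text plus the IDs of the documents used."""
    # Tenant filtering happens here, in code, before anything reaches the model.
    allowed = [d for d in candidates if d.tenant_id == tenant_id][:max_chunks]
    blocks = [
        f'<document id="{d.doc_id}">\n{strip_invisible(d.text)}\n</document>'
        for d in allowed
    ]
    return "\n".join(blocks), [d.doc_id for d in allowed]
```

Two caveats apply. Removing every `Cf` character can alter legitimate text in scripts and emoji sequences that rely on joiners, so the filter may need to be narrower in practice. Delimiters help the model tell content from instructions, but a document can contain the closing delimiter itself, so they are a hint to the model and not a security boundary.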
Finally, test and monitor for abuse:
prompt injection through direct input and retrieved documents
attempts to reveal system prompts
repeated-token or verbatim-reproduction prompts
long-context sessions that threaten truncation
sensitive data in model output
unauthorized document retrieval
unusual rates of similar prompts
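For the last item, a simple monitor might count near-identical prompts per client in a sliding window. The sketch below is a hedged illustration: the normalization rule, the window and threshold values, and the `SimilarPromptMonitor` class are all assumptions, and a production system would likely use fuzzier similarity and persistent storage.

```python
import re
from collections import Counter, deque


def normalize(prompt: str) -> str:
    """Lowercase, collapse digit runs and whitespace so near-duplicates share a key."""
    prompt = re.sub(r"\d+", "0", prompt.lower())
    return re.sub(r"\s+", " ", prompt).strip()


class SimilarPromptMonitor:
    """Flag a client that sends many near-identical prompts within a sliding window."""

    def __init__(self, window: int = 50, threshold: int = 10):
        self.window = window
        self.threshold = threshold
        self.recent = {}  # client_id -> deque of normalized prompts

    def observe(self, client_id: str, prompt: str) -> bool:
        """Record the prompt and return True if the client crosses the threshold."""
        history = self.recent.setdefault(client_id, deque(maxlen=self.window))
        history.append(normalize(prompt))
        _, count = Counter(history).most_common(1)[0]
        return count >= self.threshold
```

A flag from a monitor like this is a signal for review or rate limiting, not proof of an attack, since legitimate automation can also produce repetitive prompts.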
The strongest pattern is simple: the model can suggest, but code enforces.
Security Questions Before Launch
Before shipping an LLM feature, answer:
What data enters the model?
Who is allowed to see that data?
Which instructions are trusted?
Which content is untrusted?
What can the model output?
What tools can the model call?
What happens when context is too large?
How are extraction attempts detected?
What logs are retained?
Who reviews AI-related incidents?
If those answers are unclear, the feature is not ready for sensitive data.
Conclusion
LLM security is not only about blocking jailbreaks. Models can memorize data, prompts can be attacked, RAG can pull private documents into context, and agents can turn text into actions.
The safest design assumes prompts are attack surfaces, context is limited, retrieved content is untrusted until authorized, and model output must be checked before it affects real systems.
Thanks for reading. See you in the next lab.

