Securing AI Systems | TryHackMe Write-up
A complete walkthrough of the TryHackMe Securing AI Systems room: map the AI architecture, identify OWASP and MITRE ATLAS attack surfaces, and apply secure design at trust boundaries.
This is my write-up for the TryHackMe room on Securing AI Systems, written in 2026. I hope it helps others learn and practice cybersecurity.
Task 1: Introduction
This section introduces TryAssist, an AI-powered code review assistant, and shows how adding AI fundamentally alters a system's attack surface. Moving to AI means understanding new architectural components and trust boundaries, because traditional security frameworks alone are no longer sufficient to stop confidential data leaks or unauthorized actions.
I'm ready to learn about securing AI systems!
No answer needed
Task 2: Anatomy of an AI System
AI-augmented applications replace structured inputs and deterministic processing with natural language and probabilistic models. TryAssist consists of nine core components (including the API Gateway, Vector Store, and Tool Layer) and introduces five critical trust boundaries where data moves between security contexts. Each boundary is a potential point of failure.
What layer in an AI system is responsible for combining the system prompt, user input, and retrieved context before sending it to the model?
Prompt Construction
In the TryAssist architecture, what boundary does LLM output cross when it triggers a database query?
LLM-to-tools
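To make the Prompt Construction layer concrete, here is a minimal Python sketch of how the three sources could be combined. The names PromptParts and build_prompt, the section labels, and the sample strings are my own assumptions for illustration, not TryAssist's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class PromptParts:
    """Hypothetical container for the three inputs to prompt construction."""
    system_prompt: str
    user_input: str
    retrieved_context: list[str]


def build_prompt(parts: PromptParts) -> str:
    """Combine the three sources into one prompt, labelling the untrusted parts."""
    context_block = "\n".join(f"- {chunk}" for chunk in parts.retrieved_context)
    return (
        f"[SYSTEM]\n{parts.system_prompt}\n\n"
        f"[RETRIEVED CONTEXT - untrusted]\n{context_block}\n\n"
        f"[USER INPUT - untrusted]\n{parts.user_input}"
    )


prompt = build_prompt(
    PromptParts(
        system_prompt="You are a code review assistant.",
        user_input="Review this pull request.",
        retrieved_context=["style guide excerpt", "previous review notes"],
    )
)
print(prompt)
```

Labelling sections does not stop prompt injection by itself, since the model still sees everything as one block of text. The point is that this layer is where trusted and untrusted content meet, which is why it sits on a trust boundary.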
Task 3: The AI Attack Surface
Security professionals rely on structured frameworks to classify and respond to AI vulnerabilities. The OWASP LLM Top 10 categorizes the most critical vulnerabilities, MITRE ATLAS maps the specific tactics and techniques adversaries use to exploit them, and the NIST AI RMF provides the organizational governance structure to manage these risks systemically.
Which OWASP LLM Top 10 (2025) category covers the risk of LLM output being used to execute SQL injection against a backend database?
LLM05
What is the name of the MITRE knowledge base specifically designed for adversary tactics and techniques against AI and ML systems?
ATLAS
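The LLM05 answer above is easier to picture with code. This is a small sketch, assuming a made-up reviews table in an in-memory SQLite database, of treating model output as untrusted data by binding it as a query parameter instead of concatenating it into SQL. The function name find_reviews_by_author and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical in-memory table standing in for a backend database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (id INTEGER PRIMARY KEY, author TEXT)")
conn.executemany("INSERT INTO reviews (author) VALUES (?)", [("alice",), ("bob",)])


def find_reviews_by_author(conn: sqlite3.Connection, author_from_llm: str):
    """Look up reviews using a value produced by the model.

    The value is bound as a parameter, so it is compared as plain text
    and never interpreted as SQL.
    """
    cursor = conn.execute(
        "SELECT id, author FROM reviews WHERE author = ?",
        (author_from_llm,),
    )
    return cursor.fetchall()


# A benign value and a classic injection payload the model might emit.
malicious = "alice' OR '1'='1"
print(find_reviews_by_author(conn, "alice"))
print(find_reviews_by_author(conn, malicious))
```

With the parameterized query, the injection payload is just an author name that matches no rows.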
Task 4: System-Level Threats
This task breaks down five key architectural vulnerabilities from the OWASP LLM Top 10. These include Unbounded Consumption (LLM10), System Prompt Leakage (LLM07), Improper Output Handling (LLM05), Excessive Agency (LLM06), and Sensitive Information Disclosure (LLM02). Together, these threats compromise the confidentiality, integrity, and availability (CIA triad) of the entire system.
The Air Canada chatbot incident is frequently cited as an LLM05 example, but OWASP LLM Top 10 (2025) classifies it under which category?
LLM09
What are the three dimensions of excessive agency?
excessive functionality, excessive permissions, excessive autonomy
A user extracts internal API endpoints from an AI assistant's system prompt. Which OWASP LLM Top 10 (2025) category does this fall under?
LLM07
An attacker sends thousands of maximum-length requests to an LLM API to generate a large bill. Which OWASP LLM Top 10 (2025) category covers this?
LLM10
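For the LLM10 scenario in the last question, one common mitigation is to cap both request size and request count per client before anything reaches the model. The sketch below is only an illustration: the UsageLimiter class and the limit values are assumptions of mine, and a real deployment would also reset the window on a timer and track token counts.

```python
from collections import defaultdict


class UsageLimiter:
    """Hypothetical per-client guard against unbounded consumption."""

    def __init__(self, max_requests: int, max_chars: int):
        self.max_requests = max_requests  # requests allowed per window
        self.max_chars = max_chars        # longest prompt accepted
        self.counts = defaultdict(int)    # requests seen per client in this window

    def allow(self, client_id: str, prompt: str) -> bool:
        """Return True if the request may be forwarded to the model."""
        if len(prompt) > self.max_chars:
            return False  # reject oversized prompts outright
        if self.counts[client_id] >= self.max_requests:
            return False  # client has used up its budget for this window
        self.counts[client_id] += 1
        return True

    def reset_window(self) -> None:
        """Start a new accounting window (called by a scheduler in practice)."""
        self.counts.clear()


limiter = UsageLimiter(max_requests=2, max_chars=20)
for attempt in range(3):
    print(attempt, limiter.allow("client-a", "review my code"))
print(limiter.allow("client-b", "x" * 1000))
```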
Task 5: Secure Design Patterns
Securing an AI system requires implementing robust controls during the design phase rather than retrofitting them later. Essential patterns include Defense in Depth across all trust boundaries, enforcing Least Privilege for AI tool access, strict Input and Output Validation to prevent malicious execution, and integrating continuous MLSecOps monitoring.
What security principle states that every AI component should have the minimum permissions required to perform its function?
Least Privilege
What practice integrates security into the machine learning lifecycle, covering monitoring, observability, and incident response?
MLSecOps
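Least Privilege for tool access can be sketched as an allowlist check in front of every tool call. Everything here is hypothetical: the role name, the tool names, and the stand-in functions are mine, chosen only to show a reviewer role that can read code but cannot touch the database.

```python
class ToolNotPermitted(Exception):
    """Raised when a role tries to call a tool outside its allowlist."""


# Hypothetical allowlist: the reviewer role only gets a read-only tool.
ROLE_ALLOWED_TOOLS = {
    "code_reviewer": frozenset({"read_repository"}),
}


def read_repository(path: str) -> str:
    return f"contents of {path}"  # stand-in for a real repository read


def drop_table(name: str) -> str:
    return f"dropped {name}"  # stand-in for a destructive database action


TOOL_REGISTRY = {"read_repository": read_repository, "drop_table": drop_table}


def call_tool(role: str, tool_name: str, **kwargs):
    """Run a tool only if the role's allowlist includes it; deny by default."""
    allowed = ROLE_ALLOWED_TOOLS.get(role, frozenset())
    if tool_name not in allowed:
        raise ToolNotPermitted(f"{role!r} may not call {tool_name!r}")
    return TOOL_REGISTRY[tool_name](**kwargs)


print(call_tool("code_reviewer", "read_repository", path="README.md"))
try:
    call_tool("code_reviewer", "drop_table", name="users")
except ToolNotPermitted as exc:
    print("blocked:", exc)
```

The check lives outside the model, so a manipulated prompt cannot talk its way past it.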
Task 6: Auditing TryAssist: A Conversation with the System
Direct interaction with an AI agent is a crucial step in pre-deployment security auditing. By systematically prompting the system about its tools, permissions, autonomy, instructions, and data retention policies, security architects can uncover hidden architectural risks and misconfigurations that static documentation often misses.
Figure 1: Discovering the tools and functions available to the AI agent.
The initial audit step involves identifying the external functions the agent can trigger. TryAssist reveals access to code repositories and database systems.
Figure 2: Inquiring about the agent's database permissions.
Probing for permission levels reveals that the agent operates with highly privileged access (the db_admin role), which violates the principle of least privilege.
Figure 3: Determining the agent's level of autonomy in code management.
Investigating operational autonomy shows that TryAssist can perform critical actions, like merging pull requests, without requiring a human-in-the-loop for approval.
Figure 4: Extracting the system prompt and core instructions.
By requesting its core instructions, the agent may leak its system prompt, revealing internal API endpoints and logic that could be leveraged by an attacker.
Figure 5: Analyzing data retention and conversation logging.
Understanding how the system stores data is vital for privacy compliance. The audit reveals that conversation logs are stored indefinitely.
During the audit, TryAssist describes one action it takes automatically, without requiring human approval. What is that action?
merge pull requests
TryAssist confirmed that once it reviews and approves a pull request, it automatically merges the PR directly into the target branch. Notably, it explicitly stated that no human approval step is involved in this process.
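A human-in-the-loop gate would close this gap. Below is a minimal sketch of the idea, assuming a hypothetical PullRequest record and can_merge check of my own design: the AI's approval is necessary but never sufficient for a merge.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    """Hypothetical pull request record used only for this illustration."""
    number: int
    ai_approved: bool = False
    human_approvers: set = field(default_factory=set)


def can_merge(pr: PullRequest, required_human_approvals: int = 1) -> bool:
    """Allow a merge only when the AI review passed AND enough humans approved."""
    return pr.ai_approved and len(pr.human_approvers) >= required_human_approvals


pr = PullRequest(number=1, ai_approved=True)
print(can_merge(pr))  # AI approval alone is not enough

pr.human_approvers.add("maintainer")
print(can_merge(pr))  # a human has now signed off
```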
What database role does TryAssist report operating under?
db_admin
TryAssist reports that it operates as db_admin with full DDL privileges on the production database. This is a significant security risk: it trades the principle of least privilege for broader functionality.
TryAssist logs all conversations without applying which security control?
PII filtering
Figure 6: Identifying the absence of PII filtering in conversation logs.
TryAssist admits that it captures and logs entire conversations in plaintext without removing Personally Identifiable Information (PII), creating a major data privacy risk.
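The missing control can be as simple as a redaction pass before anything is written to the log. The sketch below is an assumption-heavy example: the two regular expressions only cover email addresses and one US-style phone format, and the redact_pii name and placeholder tokens are my own. Real PII filtering needs far broader coverage.

```python
import re

# Deliberately narrow, illustrative patterns (not a complete PII detector).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def redact_pii(message: str) -> str:
    """Replace matched emails and phone numbers with placeholder tokens."""
    message = EMAIL_RE.sub("[EMAIL]", message)
    message = PHONE_RE.sub("[PHONE]", message)
    return message


def log_conversation(log: list, message: str) -> None:
    """Append only the redacted form of the message to the log."""
    log.append(redact_pii(message))


conversation_log = []
log_conversation(conversation_log, "Contact jane.doe@example.com or 555-123-4567")
print(conversation_log)
```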
Task 7: Conclusion
Securing an AI system requires looking beyond the model to protect the broader architecture. Integrating frameworks like OWASP, MITRE ATLAS, and NIST AI RMF allows organizations to build layered defenses (MLSecOps, least privilege, boundary validation) that address threat vectors traditional application security does not cover.
I understand the foundations of securing AI systems!
No answer needed
Thanks for reading. See you in the next lab.








