Fooling AI Agents: Web-Based Indirect Prompt Injection Observed in the Wild
Adversaries are actively exploiting web-based Indirect Prompt Injection (IDPI) to manipulate Large Language Models (LLMs) and AI agents. By embedding hidden or obfuscated instructions within benign web content, attackers can coerce AI systems into performing unauthorized actions such as data destruction, SEO poisoning, and bypassing content moderation when the AI processes the webpage.
Authors: Unit 42
Source:Palo Alto Networks
- domain1winofficialsite[[.]]inDomain utilizing IDPI for SEO poisoning, impersonating a popular betting platform.
- domaincblanke2[.]pages[[.]]devWebsite hosting an IDPI payload attempting to execute a Linux file system deletion and a classic fork bomb for Denial of Service.
- domainsplintered[[.]]co[[.]]ukWebsite hosting a critical-severity IDPI script attempting to coerce AI agents into executing database destruction commands.
- urlbuy.stripe[.]com/7sY4gsbMKdZwfx39Sq0oM00Payment processing URL targeted by an IDPI script attempting to force unauthorized donations.
- urlhxxps[:]//reviewerpress[.]com/advertorial-maxvision-can/?lang=enHosted the first observed real-world IDPI attack designed to bypass an AI-based product ad review system.
- urlllm7-landing.pages[.]dev/_next/static/chunks/app/page-94a1a9b785a7305c.jsJavaScript file hosting an IDPI payload that attempts to force an AI agent to initiate an unauthorized 'pro plan' subscription.
Key Takeaways
- Web-based Indirect Prompt Injection (IDPI) is actively being weaponized in the wild, moving beyond theoretical proof-of-concepts.
- Attackers are using IDPI for high-severity and critical impacts, including AI ad review evasion, SEO poisoning, data destruction, and unauthorized transactions.
- Prompt delivery methods heavily utilize visual concealment (e.g., zero-sizing, off-screen positioning), HTML attribute cloaking, and dynamic JavaScript execution.
- Jailbreak techniques rely predominantly on social engineering (85.2%) to bypass AI safeguards, often framing malicious requests as authoritative system overrides.
- Defending against IDPI requires web-scale intent analysis and context-aware parsing, as traditional signature-based scanners cannot reliably distinguish between benign content and obfuscated prompts.
Affected Systems
- Large Language Models (LLMs)
- AI Agents
- Web Browsers with AI Integrations
- Search Engines
- Automated Content-Processing Pipelines
Attack Chain
Attackers embed malicious prompt instructions within benign-looking web pages using techniques like visual concealment (zero-sizing, off-screen positioning), HTML obfuscation, or dynamic JavaScript execution. When an AI agent or LLM processes the web page for routine tasks like summarization or content moderation, it ingests the hidden text alongside the legitimate content. Because the LLM cannot distinguish between untrusted web data and its core instructions, it executes the attacker's payload. This results in the AI agent performing unauthorized actions, such as approving malicious ads, executing destructive system commands, or initiating fraudulent transactions on behalf of the user.
Detection Availability
- YARA Rules: No
- Sigma Rules: No
- Snort/Suricata Rules: No
- KQL Queries: No
- Splunk SPL Queries: No
- EQL Queries: No
- Other Detection Logic: No
The article does not provide specific detection rules but emphasizes the need for web-scale intent analysis, prompt visibility assessment, and behavioral correlation to detect IDPI.
Detection Engineering Assessment
EDR Visibility: Low — EDRs monitor endpoint processes and file systems, but IDPI occurs within the context windows of cloud-hosted LLMs or web-based AI agents parsing HTML, which is largely invisible to traditional EDR. Network Visibility: Medium — Network tools and WAFs can inspect HTML for hidden text patterns or base64 encoded prompts, but TLS encryption and dynamic JavaScript rendering complicate reliable detection. Detection Difficulty: Hard — Distinguishing between benign web content and malicious IDPI requires semantic understanding and context-aware parsing. Attackers use heavy obfuscation, multilingual instructions, and dynamic execution to evade standard signature-based scanners.
Required Log Sources
- Web Proxy Logs
- WAF Logs
- LLM Application Audit Logs
- AI Agent Prompt/Response Logs
Hunting Hypotheses
| Hypothesis | Telemetry | ATT&CK Stage | FP Risk |
|---|---|---|---|
| Search web proxy or WAF logs for HTML responses containing known prompt injection keywords (e.g., 'IGNORE ALL PREVIOUS INSTRUCTIONS') hidden within zero-sized fonts or off-screen CSS. | WAF/Proxy Logs | Delivery | Low |
| Monitor LLM application logs for sudden shifts in agent behavior, such as unexpected approvals in moderation queues or attempts to execute system-level shell commands. | Application Logs | Execution | Medium |
| Identify web pages containing Base64 encoded strings within data attributes that decode to imperative LLM instructions or system override commands. | Web Proxy Logs | Delivery | Low |
Control Gaps
- Lack of strict separation between instruction and data in LLM context windows.
- Inability of traditional web scanners to parse dynamically rendered or semantically obfuscated prompts.
- Over-privileged AI agents capable of executing system commands or initiating transactions without human-in-the-loop verification.
Key Behavioral Indicators
- HTML elements with
font-size: 0px,opacity: 0, orleft: -9999pxcontaining imperative commands. - Base64 encoded strings in HTML data attributes that decode to LLM instructions.
- Use of zero-width Unicode characters, homoglyphs, or Unicode bi-directional overrides in web text.
- XML/SVG files containing CDATA sections with prompt injection payloads.
False Positive Assessment
- Medium. Legitimate web pages may use hidden text for accessibility (e.g., screen readers) or benign SEO purposes, which could trigger simple CSS-based IDPI detection heuristics. Semantic analysis is required to confirm malicious intent.
Recommendations
Immediate Mitigation
- Implement strict input validation and sanitization for any web content ingested by AI agents.
- Apply the principle of least privilege to AI agents, restricting their ability to execute system commands, access sensitive databases, or initiate financial transactions.
Infrastructure Hardening
- Adopt design-level defenses such as 'spotlighting' to separate untrusted web text from trusted system instructions.
- Utilize newer LLMs hardened with instruction hierarchy and adversarial training to reduce prompt injection susceptibility.
- Implement human-in-the-loop (HITL) verification for critical actions proposed by AI agents.
User Protection
- Deploy advanced URL filtering and browser-based protections to block known IDPI-hosting domains.
- Educate users on the risks of relying solely on AI-generated summaries of untrusted web pages.
Security Awareness
- Train development teams on the OWASP LLM Prompt Injection Prevention guidelines.
- Incorporate IDPI threat modeling into the lifecycle of AI agent deployment and integration.
MITRE ATT&CK Mapping
- T1562.001 - Impair Defenses: Disable or Modify Tools
- T1485 - Data Destruction
- T1499 - Endpoint Denial of Service
- T1566 - Phishing
- T1059.004 - Command and Scripting Interpreter: Unix Shell
Additional IOCs
- Domains:
dylansparks[[.]]com- Website hosting IDPI attempting sensitive information leakage.leroibear[[.]]com- Website identified in telemetry as containing IDPI.myshantispa[[.]]com- Website hosting IDPI attempting review manipulation to force positive reviews.perceptivepumpkin[[.]]com- Website hosting IDPI attempting unauthorized transactions (sending $5,000 to an attacker account).reviewerpressus[.]mycartpanda[[.]]com- Redirect destination for the deceptive scam advertisement linked to the AI ad review bypass.shiftypumpkin[[.]]com- Website hosting IDPI attempting unauthorized transactions.storage3d[[.]]com- Domain hosting IDPI attempting unauthorized transactions.trinca[.]tornidor[[.]]com- Website hosting IDPI attempting recruitment manipulation and benign anti-scraping.turnedninja[[.]]com- Website hosting IDPI attempting to force irrelevant output from AI agents.runners-daily-blog[[.]]com- Website hosting IDPI attempting to force the purchase of running shoes.
- Urls:
ericwbailey[.]website/published/accessibility-preference-settings-information-architecture-and-internalized-ableism- URL hosting IDPI attempting minor resource exhaustion.buy.stripe[.]com/9B600jaQo3QC4rU3beg7e02- Payment processing URL used by websites containing IDPI.paypal[.]me/shiftypumpkin- Payment processing URL used by websites containing IDPI.storage3d[.]com/storage/2009.11- URL hosting IDPI attempting unauthorized transactions.token.llm7[.]io/?subscription=show- OAuth login URL targeted by an IDPI script for forced subscriptions.
- Command Lines:
- Purpose: Attempted Linux file system deletion via IDPI payload | Tools:
rm| Stage: Execution |rm -rf --no-preserve-root - Purpose: Classic fork bomb designed to crash systems by exhausting CPU and process resources | Tools:
bash| Stage: Denial of Service |:(){ :|:& };:
- Purpose: Attempted Linux file system deletion via IDPI payload | Tools: