Phishing for Lobsters: How We Tricked OpenClaw into Spilling Secrets
Varonis Threat Labs demonstrated that enterprise AI agents, specifically an OpenClaw deployment, are vulnerable to traditional phishing and social engineering techniques. In simulated attacks, the agent successfully identified technical phishing indicators like malicious OAuth flows but failed to recognize social context, resulting in the exfiltration of AWS credentials and sensitive CRM data to an external attacker.
- domain: holidaygifts[.]co[.]il - Simulated phishing domain used as a capture server during the gift card scam scenario.
- email: 3000levi[@]gmail[.]com - Simulated attacker email address observed in the agent's reasoning logs during the credential exfiltration scenario.
- email: dan[.]levi3000[@]gmail[.]com - Simulated attacker email address used to send impersonation phishing requests to the AI agent.
- ip: 176[.]65[.]149[.]234 - IP address observed in HTTP logs interacting with the simulated phishing capture server.
- ip: 51[.]4[.]96[.]17 - IP address observed in HTTP logs interacting with the simulated phishing capture server.
What Happened
Researchers tested an AI email assistant to see if it could be tricked by phishing emails, just like humans. They found that while the AI was good at spotting fake login pages, it easily fell for emails pretending to be from coworkers asking for sensitive information. As a result, the AI handed over secret passwords and customer data to the 'attacker' without double-checking. This shows that as companies use AI to manage emails, they need to put strict rules in place to stop the AI from accidentally giving away company secrets. Organizations should require human approval before an AI can send out sensitive data.
Key Takeaways
- AI agents deployed in enterprise inboxes are highly susceptible to social engineering and impersonation attacks, even when equipped with security instructions.
- Because an agent acts on the user's behalf, its actions can sidestep traditional phishing defenses, and it may inadvertently leak sensitive data such as AWS keys and CRM exports to external attackers.
- While AI agents can successfully detect technical phishing indicators (like malicious OAuth redirects or fake login pages), they lack the social context to identify unusual requests from seemingly known entities.
- Defending AI agents requires architectural controls, such as segmenting connector access, enforcing explicit security instructions, and requiring human-in-the-loop for high-privilege actions.
Affected Systems
- OpenClaw AI Agent
- Google Workspace
- Enterprise Inboxes
- LLMs (Google Gemini 3.1 Pro, OpenAI Codex GPT-5.4)
Attack Chain
The attacker sends a socially engineered email to an inbox monitored by an autonomous AI agent (OpenClaw). The agent's orchestrator classifies the email and delegates the task to a worker sub-agent without verifying the sender's identity. The worker agent searches internal systems, such as mailboxes or connected drives, to retrieve the requested sensitive information, such as AWS credentials or CRM exports. Finally, the agent forwards the sensitive data to the attacker's external email address, bypassing traditional outbound data loss prevention controls.
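The missing step in this chain is a sender check before the orchestrator delegates. The following is a minimal sketch of such a check, not OpenClaw's actual logic: the function names, the `INTERNAL_DOMAINS` set, and the `KNOWN_EMPLOYEE_NAMES` set are all hypothetical, and a real deployment would also rely on SPF/DKIM/DMARC results, since a From header alone can be spoofed.

```python
from email.utils import parseaddr

# Hypothetical values; a real deployment would query the corporate directory.
INTERNAL_DOMAINS = {"example.com"}
KNOWN_EMPLOYEE_NAMES = {"jane doe"}


def classify_sender(from_header: str) -> str:
    """Return 'internal', 'impersonation', or 'external' for a From header."""
    display_name, address = parseaddr(from_header)
    local, sep, domain = address.lower().rpartition("@")
    # Only a well-formed address on an internal domain counts as internal.
    if sep and local and domain in INTERNAL_DOMAINS:
        return "internal"
    # An outside address that carries a known employee's display name
    # matches the impersonation pattern described in the attack chain.
    if display_name.strip().lower() in KNOWN_EMPLOYEE_NAMES:
        return "impersonation"
    return "external"


def should_delegate(from_header: str) -> bool:
    """Delegate to a worker sub-agent only for verified internal senders."""
    return classify_sender(from_header) == "internal"
```

With these assumptions, a message from `Jane Doe <jane.doe3000@gmail.com>` is classified as `impersonation` and never reaches a worker with access to mailboxes or drives.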
Detection Availability
- YARA Rules: No
- Sigma Rules: No
- Snort/Suricata Rules: No
- KQL Queries: No
- Splunk SPL Queries: No
- EQL Queries: No
- Other Detection Logic: No
The article does not provide specific detection rules, as it focuses on architectural vulnerabilities and behavioral flaws in AI agents.
Detection Engineering Assessment
- EDR Visibility: Low. EDR solutions monitor endpoint behavior, but these attacks occur via cloud APIs, email forwarding, and SaaS integrations where EDR has limited visibility.
- Network Visibility: Medium. Network logs might capture the AI agent's infrastructure reaching out to unusual external domains (like the phishing site), but email-based exfiltration will blend with normal traffic.
- Detection Difficulty: Hard. The actions are performed by a legitimate, highly privileged AI agent, so malicious actions blend in with normal, helpful workflows.
Required Log Sources
- Email Gateway Logs
- Application Audit Logs
- AWS CloudTrail / Google Workspace Audit Logs
- DLP Logs
Hunting Hypotheses
| Hypothesis | Telemetry | ATT&CK Stage | FP Risk |
|---|---|---|---|
| Consider hunting for unusual outbound email forwarding rules or direct emails sent from AI agent service accounts to external, unverified domains. | Email Gateway Logs | Exfiltration | Medium |
| If you have visibility into AI agent API usage, consider hunting for queries accessing highly sensitive keywords (e.g., 'credentials', 'AWS keys', 'customer export') followed immediately by an outbound communication. | Application Audit Logs | Collection | High |
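The second hypothesis can be prototyped offline as a simple correlation over exported audit events. The sketch below assumes a hypothetical event shape (dicts with `time`, `actor`, `type`, and either `query` or `recipient`), an illustrative keyword list, and a five-minute window; none of these come from the article, and real audit logs will need their own field mapping.

```python
from datetime import datetime, timedelta

# Illustrative values only; tune them to your own environment.
SENSITIVE_TERMS = ("credentials", "aws keys", "customer export")
INTERNAL_DOMAINS = {"example.com"}
WINDOW = timedelta(minutes=5)


def correlate_search_then_send(events):
    """Return (search, send) pairs where an actor's sensitive search is
    followed within WINDOW by a message to an external domain."""
    hits = []
    last_sensitive = {}  # actor -> most recent sensitive search event
    for ev in sorted(events, key=lambda e: e["time"]):
        if ev["type"] == "search":
            query = ev["query"].lower()
            if any(term in query for term in SENSITIVE_TERMS):
                last_sensitive[ev["actor"]] = ev
        elif ev["type"] == "send":
            domain = ev["recipient"].rpartition("@")[2].lower()
            prior = last_sensitive.get(ev["actor"])
            if (
                prior is not None
                and domain not in INTERNAL_DOMAINS
                and ev["time"] - prior["time"] <= WINDOW
            ):
                hits.append((prior, ev))
    return hits
```

Keyword matching of this kind is noisy, which is consistent with the High false-positive risk noted in the table; the time window and the external-recipient condition are what narrow the results.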
Control Gaps
- Lack of identity verification for inbound requests processed by AI agents
- Insufficient outbound data loss prevention (DLP) on AI agent communications
- Over-privileged access for AI agents to sensitive data repositories
Key Behavioral Indicators
- AI agent service account sending emails to external addresses not previously corresponded with
- Agent accessing sensitive files and immediately initiating an external web request
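The first indicator amounts to a first-seen check on external recipients. A minimal sketch follows, assuming the sent-mail log is available as `(sender, recipient)` pairs in chronological order; the function name and the optional `baseline` of previously seen recipients are hypothetical.

```python
def first_seen_external(sent_log, internal_domains, baseline=None):
    """Return (sender, recipient) pairs for external recipients that the
    agent account has not written to before."""
    seen = {r.lower() for r in (baseline or ())}
    alerts = []
    for sender, recipient in sent_log:
        recipient = recipient.lower()
        domain = recipient.rpartition("@")[2]
        if domain not in internal_domains and recipient not in seen:
            alerts.append((sender, recipient))
        # Remember every recipient so repeat messages do not alert again.
        seen.add(recipient)
    return alerts
```

Seeding `baseline` from a few weeks of historical mail keeps established external contacts from alerting on the first run.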
False Positive Assessment
- Low
Recommendations
Immediate Mitigation
- Verify against your organization's incident response runbook and team escalation paths before acting.
- Evaluate whether AI agents have unrestricted outbound email capabilities and consider restricting them to internal domains only.
- Consider implementing a human-in-the-loop approval process for any high-privilege actions or external data sharing initiated by AI agents.
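The last two mitigations can be combined into a single outbound gate that sits between the agent and the mail API. The sketch below is one possible policy, not a prescribed one: the `OutboundEmail` type, the `decide` function, the three verdicts, and the secret patterns are assumptions made for illustration. The `AKIA` pattern matches the common format of long-term AWS access key IDs and will not catch every credential type.

```python
import re
from dataclasses import dataclass

# Hypothetical policy inputs.
INTERNAL_DOMAINS = {"example.com"}
SECRET_PATTERNS = [
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),         # AWS access key ID format
    re.compile(r"(?i)aws_secret_access_key"),    # common config key name
]


@dataclass
class OutboundEmail:
    recipient: str
    body: str


def decide(msg: OutboundEmail) -> str:
    """Return 'allow', 'needs_approval', or 'block' for an agent-sent email."""
    domain = msg.recipient.rpartition("@")[2].lower()
    external = domain not in INTERNAL_DOMAINS
    has_secret = any(p.search(msg.body) for p in SECRET_PATTERNS)
    if external and has_secret:
        return "block"            # never send secrets outside the organization
    if external or has_secret:
        return "needs_approval"   # route to a human reviewer
    return "allow"
```

Enforcing the decision outside the model matters here: in the tested scenarios the agent leaked data despite having security instructions, so the gate should not depend on the agent's own judgment.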
Infrastructure Hardening
- Segment connector access for AI agents based on the inbound channel's trust level (e.g., external emails vs. internal Slack messages).
- Treat AI agent configuration files (like agents.md) as security controls, enforcing explicit security instructions and version control.
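Connector segmentation can be expressed as a deny-by-default mapping from inbound channel to permitted connectors. The channel and connector names below are hypothetical placeholders rather than OpenClaw identifiers; the point of the sketch is only that a task triggered by an external email never gains access to mail search, drives, or CRM exports.

```python
from enum import Enum


class Channel(Enum):
    EXTERNAL_EMAIL = "external_email"
    INTERNAL_EMAIL = "internal_email"
    INTERNAL_SLACK = "internal_slack"


# Hypothetical connector names; lower-trust channels get fewer connectors.
CONNECTOR_POLICY = {
    Channel.EXTERNAL_EMAIL: frozenset({"calendar_read"}),
    Channel.INTERNAL_EMAIL: frozenset({"calendar_read", "mail_search"}),
    Channel.INTERNAL_SLACK: frozenset(
        {"calendar_read", "mail_search", "drive_read", "crm_export"}
    ),
}


def allowed_connectors(channel: Channel) -> frozenset:
    """Connectors a worker may use for a task that arrived on `channel`."""
    # Unknown channels get no connectors at all (deny by default).
    return CONNECTOR_POLICY.get(channel, frozenset())


def authorize(channel: Channel, connector: str) -> bool:
    return connector in allowed_connectors(channel)
```

Keeping a table like this in version control alongside the agent configuration makes changes to the agent's reach reviewable in the same way as changes to its instructions.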
User Protection
- If applicable, ensure AI agents are subject to the same Conditional Access policies and MFA requirements as human users.
Security Awareness
- Consider updating threat modeling and security awareness programs to account for AI agents as potential targets for social engineering.
MITRE ATT&CK Mapping
- T1566.001 - Phishing: Spearphishing Attachment
- T1566.002 - Phishing: Spearphishing Link
- T1534 - Internal Spearphishing
- T1114.002 - Email Collection: Remote Email Collection
- T1048 - Exfiltration Over Alternative Protocol
Additional IOCs
- IPs:
  - 51[.]4[.]96[.]17 - IP address observed in HTTP logs interacting with the simulated phishing capture server.
  - 176[.]65[.]149[.]234 - IP address observed in HTTP logs interacting with the simulated phishing capture server.
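The indicators in this report are defanged with `[.]` and `[@]`. If they are loaded into a log search or blocklist, they need to be restored first. The helper below is a small sketch for that step; the function names are illustrative, and refanged values should only be handled inside analysis tooling, never pasted into a browser or mail client.

```python
import ipaddress


def refang(indicator: str) -> str:
    """Restore a defanged indicator for use in log queries."""
    return indicator.replace("[.]", ".").replace("[@]", "@")


def is_ip(value: str) -> bool:
    """True if `value` parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False
```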