2026-05-13 | 4 min | high

The Top 10 Threats Related to Agent Skills

Anthropic's new 'Agent Skills' feature, which uses progressive disclosure to manage AI agent context windows, introduces a novel attack surface. The article outlines the top 10 critical threats to this ecosystem, including prompt injection, supply chain manipulation, and unauthorized code execution, highlighted by the recent OpenClaw malware incident.

Conf: medium | Analyzed: 2026-03-21

Actors: OpenClaw

Key Takeaways

Anthropic's 'Agent Skills' introduce a new attack surface through progressive disclosure of reusable playbooks (SKILL.md).
The 'OpenClaw' incident demonstrated this risk: a popular Skill acted as malware, exfiltrating SSH keys and crypto wallets and establishing a reverse shell.
Top threats include indirect prompt injection, supply chain manipulation, unexpected code execution, and memory abuse.
Security practitioners must treat Skills as a distinct trust boundary, apply least-privilege access, and restrict Skills to allowlisted registries.

Affected Systems

AI Agents
Anthropic Agent Skills
Model Context Protocol (MCP)

Attack Chain

Attackers can exploit AI Agent Skills through vectors such as supply chain manipulation (typosquatting) or dynamic discovery to inject malicious SKILL.md files. Once the agent activates the Skill via progressive disclosure, the embedded malicious instructions or helper scripts execute. This can lead to indirect prompt injection, unauthorized data access (e.g., .env files or .ssh folders), exfiltration of sensitive data, or a reverse shell, as demonstrated in the OpenClaw incident.

Detection Availability

YARA Rules: No
Sigma Rules: No
Snort/Suricata Rules: No
KQL Queries: No
Splunk SPL Queries: No
EQL Queries: No
Other Detection Logic: No

No specific detection rules or queries are provided in the article.

Detection Engineering Assessment

EDR Visibility: Medium. EDR can detect the execution of malicious helper scripts (Python, JS, Shell) and reverse shells spawned by the AI agent process.
Network Visibility: Medium. Network monitoring can detect reverse shell connections or exfiltration of data to attacker-controlled servers.
Detection Difficulty: Hard. Distinguishing legitimate AI agent actions from malicious Skill executions requires deep context into the agent's expected behavior and strict boundary monitoring.

Required Log Sources

Process Creation Logs
Network Connection Logs
File Access Logs

Hunting Hypotheses

Hypothesis	Telemetry	ATT&CK Stage	FP Risk
Look for AI agent processes spawning unexpected interactive shells or scripting interpreters (Python, Node.js) that initiate outbound network connections.	Process Creation, Network Connections	Execution	Medium
Monitor for AI agent processes accessing sensitive local files such as .env files, SSH keys, or browser cookies.	File Access	Credential Access	Low
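
The two hypotheses above can be sketched as simple filters over normalized telemetry. This is a minimal illustration, not a production detection: the event shapes, the agent process name `ai-agent`, and the interpreter and path lists are all assumptions that would need to be mapped onto the fields your EDR or log pipeline actually emits.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Hypothetical values: replace with the agent binaries seen in your environment.
AGENT_NAMES = {"ai-agent"}
INTERPRETERS = {"sh", "bash", "zsh", "python", "python3", "node"}
SENSITIVE_DIRS = {".ssh"}     # any path component matching these is flagged
SENSITIVE_FILES = {".env"}    # any file with exactly this name is flagged


@dataclass(frozen=True)
class ProcessEvent:
    pid: int
    parent_name: str
    name: str


@dataclass(frozen=True)
class NetworkEvent:
    pid: int
    remote_ip: str
    outbound: bool


@dataclass(frozen=True)
class FileEvent:
    process_name: str
    path: str


def hunt_shell_with_egress(procs, conns):
    """Shells or interpreters spawned by an agent that also connected outbound."""
    egress_pids = {c.pid for c in conns if c.outbound}
    return [
        p for p in procs
        if p.parent_name in AGENT_NAMES
        and p.name in INTERPRETERS
        and p.pid in egress_pids
    ]


def hunt_sensitive_file_access(files):
    """File events where an agent process touched .env files or .ssh contents."""
    hits = []
    for f in files:
        path = PurePosixPath(f.path)
        sensitive = path.name in SENSITIVE_FILES or SENSITIVE_DIRS & set(path.parts)
        if f.process_name in AGENT_NAMES and sensitive:
            hits.append(f)
    return hits
```

Joining process and network events on PID alone is a simplification. Real pipelines usually need host and timestamp in the join key because PIDs are reused.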

Control Gaps

Lack of strict validation between Skill descriptions and runtime behavior
Implicit trust chains between Skills (Skill-to-Skill data flow)
Dynamic Skill discovery bypassing load-time detection

Key Behavioral Indicators

Agent process spawning reverse shells
Unexpected access to .ssh or .env directories by agent processes
High-latency API calls or premium-tier model queries causing resource exhaustion

False Positive Assessment

Recommendations

Immediate Mitigation

Scan existing Agent Skills for malicious code or unexpected dependencies before use.
Review and restrict permissions granted to AI agents and their associated Skills.
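
As a rough starting point for the scanning step, the sketch below greps a Skill's SKILL.md and helper scripts for a handful of red flags drawn from the behaviors described in this report (credential file access, reverse shells, download-and-execute). The pattern list is an illustrative assumption, easy to evade, and no substitute for manual review or sandboxed execution.

```python
import re
from pathlib import Path

# Illustrative heuristics only; expect both false positives and misses.
SUSPICIOUS_PATTERNS = {
    "references .ssh directory": re.compile(r"\.ssh\b"),
    "references .env file": re.compile(r"(?<![\w.])\.env\b"),
    "pipes download to shell": re.compile(r"(curl|wget)[^\n|]*\|\s*(ba)?sh\b"),
    "bash reverse shell idiom": re.compile(r"/dev/tcp/"),
    "netcat exec flag": re.compile(r"\bnc(at)?\b[^\n]*\s-e\b"),
}


def scan_text(text: str) -> list[str]:
    """Return the labels of every suspicious pattern found in the text."""
    return [label for label, pat in SUSPICIOUS_PATTERNS.items() if pat.search(text)]


def scan_skill_dir(skill_dir: Path) -> dict[str, list[str]]:
    """Scan every file under a Skill directory; map relative path to findings."""
    findings = {}
    for path in sorted(skill_dir.rglob("*")):
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="ignore")
            hits = scan_text(text)
            if hits:
                findings[str(path.relative_to(skill_dir))] = hits
    return findings
```

A non-empty result from `scan_skill_dir(Path("path/to/skill"))` is a prompt for human review rather than proof of malice, since legitimate deployment Skills may touch the same paths.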

Infrastructure Hardening

Isolate Skill memory and execution environments using sandboxing.
Restrict Skill sourcing to allowlisted, trusted registries only.
Structure and validate Skill-to-Skill data flows.
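
One way to enforce the registry allowlist is an exact-match check on the source host before any Skill is fetched. The sketch below assumes Skills are installed from HTTPS URLs. The registry host `skills.example.com` is a placeholder, not a real registry.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: replace with the registries your organization trusts.
ALLOWED_REGISTRY_HOSTS = {"skills.example.com"}


def is_allowed_source(url: str) -> bool:
    """Accept only HTTPS URLs whose host exactly matches an allowlisted registry."""
    parts = urlsplit(url)
    # Exact host comparison (no suffix or substring matching) so lookalike
    # domains such as skills.example.com.evil.test are rejected.
    return parts.scheme == "https" and parts.hostname in ALLOWED_REGISTRY_HOSTS


def partition_sources(urls):
    """Split candidate Skill URLs into (allowed, rejected) lists."""
    allowed, rejected = [], []
    for url in urls:
        (allowed if is_allowed_source(url) else rejected).append(url)
    return allowed, rejected
```

A host allowlist limits where Skills come from but does not catch a typosquatted Skill name published inside a trusted registry, so it complements review of the Skill itself.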

User Protection

Apply least-privilege access to each Skill.
Require human-in-the-loop confirmation for high-impact actions initiated by agents.
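
These two controls can be combined in a single authorization gate: deny any action a Skill was not explicitly granted, and ask a human before high-impact actions proceed. The following is a minimal sketch. The action names, the `SkillPolicy` type, and the `confirm` callback are hypothetical and do not correspond to any agent framework's API.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical action names treated as high impact in this sketch.
HIGH_IMPACT_ACTIONS = {"shell.exec", "file.delete", "network.post"}


@dataclass(frozen=True)
class SkillPolicy:
    name: str
    allowed_actions: frozenset


def authorize(policy: SkillPolicy, action: str, confirm: Callable[[str], bool]) -> bool:
    """Return True only if the action is granted and, if high impact, confirmed."""
    if action not in policy.allowed_actions:
        return False  # least privilege: anything not explicitly granted is denied
    if action in HIGH_IMPACT_ACTIONS:
        # Human-in-the-loop: defer the final decision to the confirm callback.
        prompt = f"Skill '{policy.name}' requests high-impact action '{action}'. Allow?"
        return confirm(prompt)
    return True
```

In an interactive deployment, `confirm` might wrap a terminal prompt or an approval ticket. Keeping it as an injected callback makes the gate easy to test and keeps the denial path independent of the UI.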

Security Awareness

Educate development teams on the risks of progressive disclosure and prompt injection in AI agent architectures.
Incorporate AI Skill threat modeling (e.g., the Top 10 list) into secure SDLC processes.

MITRE ATT&CK Mapping

T1195.002 - Supply Chain Compromise: Compromise Software Supply Chain
T1059 - Command and Scripting Interpreter
T1552.004 - Credentials in Files: Private Keys
T1048 - Exfiltration Over Alternative Protocol
