Analyzing the Current State of AI Use in Malware
Unit 42 analyzed two malware samples leveraging Large Language Models (LLMs) for remote decision-making. One is a .NET infostealer using GPT-3.5-Turbo for superficial 'AI theater', while the other is a Golang dropper that uses GPT-4 to evaluate system telemetry and determine if the environment is safe to deploy a Sliver payload.
Authors: Unit 42
Source: Palo Alto Networks
- sha256: 052d5220529b6bd4b01e5e375b5dc3ffd50c4b137e242bbfb26655fd7f475ac6 - Golang malware dropper that uses GPT-4 for environment safety assessment before dropping Sliver.
- sha256: 1b6326857fa635d396851a9031949cfdf6c806130767c399727d78a1c2a0126c - .NET-based infostealer utilizing GPT-3.5-Turbo for AI theater (Sample 1 of 3).
- url: hxxp[:]//localhost:3002/crypto-data - Default C2 URL used by the .NET infostealer to exfiltrate stolen data in JSON format.
Key Takeaways
- Threat actors are actively experimenting with Large Language Models (LLMs) like OpenAI's GPT-3.5-Turbo and GPT-4 for remote decision-making in malware.
- A .NET infostealer was observed using GPT-3.5-Turbo for 'AI theater': it asks the model for evasion and obfuscation techniques, then only logs the responses and never executes them.
- A Golang dropper uses GPT-4 to evaluate system telemetry (processes, uptime, AV presence) to dynamically decide whether it is safe to execute a Sliver payload.
- Using AI for environment assessment replaces traditional hardcoded allow/deny lists, potentially complicating static analysis of evasion triggers for defenders.
Affected Systems
- Windows
Attack Chain
The .NET infostealer executes, collects system data (browser cookies, file listings), and makes API calls to OpenAI to generate evasion techniques, ultimately attempting to exfiltrate data to a C2 server. In a separate campaign, a Golang dropper collects system telemetry (processes, AV, uptime) and sends it to GPT-4 via API. GPT-4 evaluates the OPSEC risk and returns a JSON response; if deemed safe, the dropper decrypts Donut shellcode and executes a Sliver payload.
Detection Availability
- YARA Rules: No
- Sigma Rules: No
- Snort/Suricata Rules: No
- KQL Queries: No
- Splunk SPL Queries: No
- EQL Queries: No
- Other Detection Logic: No
The article does not provide specific detection rules, but mentions that Palo Alto Networks products (Cortex XDR, Advanced WildFire) provide coverage against these threats.
Detection Engineering Assessment
- EDR Visibility: High. EDR solutions have high visibility into the system discovery commands, process enumeration, file writes (opsec.log, victim_logs.txt), and the eventual execution of Donut shellcode or Sliver payloads.
- Network Visibility: Medium. Connections to OpenAI APIs are TLS encrypted and common in enterprise environments, making network-level detection of the malicious prompts difficult without SSL inspection or process-to-network correlation.
- Detection Difficulty: Moderate. While the underlying malware behaviors (discovery, dropping Sliver) are standard and detectable, the use of legitimate OpenAI APIs for decision-making blends in with legitimate developer or enterprise AI tool traffic.
Required Log Sources
- Process Creation
- File Creation
- Network Connections
Hunting Hypotheses
| Hypothesis | Telemetry | ATT&CK Stage | FP Risk |
|---|---|---|---|
| Unknown or untrusted processes making frequent outbound HTTPS connections to OpenAI API endpoints, especially if preceded by extensive system and process discovery. | Network Connections, Process Creation | Command and Control | High |
| Creation of specific log files like 'opsec.log' or 'victim_logs.txt' containing AI-related keywords or JSON telemetry structures. | File Creation | Execution | Low |
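The first hypothesis can be prototyped offline against exported telemetry before it is turned into an EDR rule. The following is a minimal sketch, not detection logic from the report. It assumes a hypothetical normalized event shape (`Event`), hypothetical `"network"` and `"discovery"` labels assigned upstream, an illustrative allowlist, and `api.openai.com` as the API host. It flags untrusted processes that contact the LLM API and raises the priority when the same process performed discovery shortly beforehand.

```python
from dataclasses import dataclass

# Assumptions: adjust both sets to the local environment.
LLM_API_HOSTS = {"api.openai.com"}
TRUSTED_PROCESSES = {"code.exe", "chrome.exe"}  # illustrative allowlist only


@dataclass(frozen=True)
class Event:
    """Hypothetical normalized telemetry record."""
    ts: float        # epoch seconds
    host: str        # endpoint name
    process: str     # image name of the acting process
    kind: str        # "network" or "discovery" (labels assigned upstream)
    dest: str = ""   # destination hostname for network events


def hunt_llm_callers(events, window_s=300):
    """Return (host, process, priority) for untrusted processes calling an LLM API.

    Priority is "high" when the same process on the same host produced a
    discovery event within `window_s` seconds before the connection.
    """
    findings = []
    last_discovery = {}  # (host, process) -> timestamp of latest discovery event
    for ev in sorted(events, key=lambda e: e.ts):
        key = (ev.host, ev.process.lower())
        if ev.kind == "discovery":
            last_discovery[key] = ev.ts
        elif ev.kind == "network" and ev.dest.lower() in LLM_API_HOSTS:
            if key[1] in TRUSTED_PROCESSES:
                continue  # expected LLM client, skip
            seen = last_discovery.get(key)
            recent = seen is not None and ev.ts - seen <= window_s
            findings.append((ev.host, ev.process, "high" if recent else "low"))
    return findings
```

Because the false-positive risk for this hypothesis is rated High, the allowlist and the time window are the two knobs most likely to need tuning per environment.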
Control Gaps
- Network filtering of OpenAI APIs (often allowed by default in modern environments)
Key Behavioral Indicators
- Unusual processes connecting to OpenAI APIs
- Creation of opsec.log with OPSEC decision JSON
- Donut shellcode execution patterns following API calls
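The file-based indicator above can be triaged with a small helper. This is a sketch under stated assumptions: the report names the files but does not publish the JSON schema written to opsec.log, so the check below only tests whether the content contains any parseable JSON object and does not look for specific keys. The function names and the "ignore", "review", and "escalate" verdicts are hypothetical.

```python
import json
from pathlib import PureWindowsPath

# File names taken from the report; matching is by base name only.
WATCHED_NAMES = {"opsec.log", "victim_logs.txt"}


def contains_json_object(text):
    """Return True if any '{' in the text starts a valid JSON object."""
    decoder = json.JSONDecoder()
    idx = text.find("{")
    while idx != -1:
        try:
            obj, _ = decoder.raw_decode(text, idx)
            if isinstance(obj, dict):
                return True
        except json.JSONDecodeError:
            pass  # not valid JSON at this offset, try the next brace
        idx = text.find("{", idx + 1)
    return False


def triage_file_event(path, content):
    """Classify a file-creation event as 'ignore', 'review', or 'escalate'."""
    name = PureWindowsPath(path).name.lower()
    if name not in WATCHED_NAMES:
        return "ignore"
    # opsec.log carrying a JSON object matches the reported dropper behavior.
    if name == "opsec.log" and contains_json_object(content):
        return "escalate"
    return "review"
```

A name match alone is returned as "review" rather than "escalate" because generic names such as opsec.log can also be produced by unrelated tooling.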
False Positive Assessment
- Medium
Recommendations
Immediate Mitigation
- Block the identified SHA256 hashes in EDR and AV solutions.
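Where a retroactive sweep is useful alongside blocking, the hashes can be checked against files on disk. The sketch below is an illustration, not a replacement for EDR blocking. The SHA-256 values and descriptions come from this report, while the function names and the directory-walk approach are assumptions.

```python
import hashlib
from pathlib import Path

# SHA-256 values listed in this report, mapped to their descriptions.
IOC_SHA256 = {
    "052d5220529b6bd4b01e5e375b5dc3ffd50c4b137e242bbfb26655fd7f475ac6": "Golang dropper (GPT-4, Sliver)",
    "1b6326857fa635d396851a9031949cfdf6c806130767c399727d78a1c2a0126c": ".NET infostealer (sample 1 of 3)",
    "02ce798981fb2aa68776e53672a24103579ca77a1d3e7f8aaeccf6166d1a9cc6": ".NET infostealer (sample 2 of 3)",
    "7c7b7b99f248662a1f9aea1563e60f90d19b0ee95934e476c423d0bf373f6493": ".NET infostealer (sample 3 of 3)",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sweep(root, iocs=IOC_SHA256):
    """Return (path, description) for every file under root matching an IOC hash."""
    hits = []
    for candidate in Path(root).rglob("*"):
        if not candidate.is_file():
            continue
        try:
            file_hash = sha256_of(candidate)
        except OSError:
            continue  # unreadable or locked file, skip it
        if file_hash in iocs:
            hits.append((str(candidate), iocs[file_hash]))
    return hits
```

Hash matching only catches these exact samples, so it complements rather than replaces the behavioral recommendations in this section.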
Infrastructure Hardening
- Implement strict application control to prevent execution of unknown binaries.
- Monitor and potentially restrict API access to public LLM services from non-developer endpoints or servers.
User Protection
- Deploy behavioral EDR rules to catch Sliver framework execution and Donut shellcode loading.
Security Awareness
- Educate SOC analysts on the emerging trend of malware utilizing legitimate LLM APIs for environment assessment and C2.
MITRE ATT&CK Mapping
- T1497.001 - Virtualization/Sandbox Evasion: System Checks
- T1082 - System Information Discovery
- T1057 - Process Discovery
- T1071.001 - Application Layer Protocol: Web Protocols
- T1041 - Exfiltration Over C2 Channel
Additional IOCs
- File Hashes:
  - 02ce798981fb2aa68776e53672a24103579ca77a1d3e7f8aaeccf6166d1a9cc6 (sha256) - .NET-based infostealer utilizing GPT-3.5-Turbo (Sample 2 of 3).
  - 7c7b7b99f248662a1f9aea1563e60f90d19b0ee95934e476c423d0bf373f6493 (sha256) - .NET-based infostealer utilizing GPT-3.5-Turbo (Sample 3 of 3).
- File Paths:
  - victim_logs.txt - Log file written to the victim's desktop directory by the .NET infostealer containing LLM-generated evasion techniques.
  - opsec.log - Log file written to disk by the Golang dropper detailing the AI-gated execution decisions.