Pickle in the Middle – Hijacking Vertex AI Model Uploads for Cross-Tenant RCE
A critical vulnerability in the Google Cloud Vertex AI SDK for Python allows attackers to achieve cross-tenant Remote Code Execution (RCE) via bucket squatting. By predicting default staging bucket names and exploiting a lack of ownership verification, attackers can intercept model uploads and inject malicious pickle payloads, leading to the theft of highly privileged service account tokens.
What Happened
Security researchers discovered a major flaw in the Google Cloud Vertex AI software used by developers to build AI models. Because the software used predictable names for temporary storage folders and didn't check who owned them, an attacker could secretly create the folder first. When a developer uploaded their AI model, it would go to the attacker's folder, allowing the attacker to quickly swap it with a malicious version. This could let the attacker steal sensitive data and take control of the AI environment. Developers should update their Vertex AI Python software to version 1.148.0 or later to fix this issue.
Key Takeaways
- A vulnerability in Google Cloud Vertex AI SDK for Python (versions 1.139.0 and 1.140.0) allows for cross-tenant Remote Code Execution (RCE).
- The flaw stems from predictable default staging bucket names and a lack of ownership verification, enabling bucket squatting.
- Attackers can exploit a ~2.5-second race condition using a Cloud Function to replace legitimate uploaded models with malicious pickle payloads.
- Exploitation leads to the exfiltration of highly privileged OAuth tokens, enabling cross-deployment model theft and tenant infrastructure reconnaissance.
- Google patched the vulnerability in SDK version 1.148.0 by adding random UUID salts to bucket names and implementing ownership verification.
Affected Systems
- Google Cloud Vertex AI SDK for Python (google-cloud-aiplatform) versions 1.139.0 and 1.140.0
Attack Chain
The attacker predicts the victim's default Vertex AI staging bucket name and preemptively creates it in their own Google Cloud project, granting public read/write access. The attacker deploys a Cloud Function triggered by object creation in this bucket. When the victim uploads a legitimate model using the vulnerable SDK, the artifacts are sent to the attacker's bucket. Within a window of roughly 2.5 seconds, the Cloud Function replaces the legitimate model with a malicious pickle payload. Upon deployment, the victim's serving container deserializes the poisoned model, executing arbitrary code that exfiltrates the container's highly privileged OAuth token to the attacker.
Detection Availability
- YARA Rules: No
- Sigma Rules: No
- Snort/Suricata Rules: No
- KQL Queries: No
- Splunk SPL Queries: No
- EQL Queries: No
- Other Detection Logic: No
The article does not provide specific detection rules or queries, focusing instead on the vulnerability mechanism and patching.
Detection Engineering Assessment
- EDR Visibility: None. The attack occurs entirely within Google Cloud infrastructure (GCS buckets, Cloud Functions, and Vertex AI serving containers), which is typically outside the scope of traditional endpoint EDR.
- Network Visibility: Low. Traffic flows between Google Cloud services (SDK to GCS, GCS to Cloud Function, Vertex AI to GCS). Only the final exfiltration webhook might traverse customer-visible networks if egress logging is enabled on the serving container.
- Detection Difficulty: Hard. The attack exploits legitimate SDK behavior and race conditions in cloud storage. Distinguishing a squatted bucket from a legitimate one requires cross-project visibility, which victims typically do not have.
Required Log Sources
- Google Cloud Audit Logs
- GCS Data Access Logs
- VPC Flow Logs
Hunting Hypotheses
| Hypothesis | Telemetry | ATT&CK Stage | FP Risk |
|---|---|---|---|
| Consider hunting for Vertex AI model deployments where the staging bucket resides in a different Google Cloud project than the deploying identity. | Google Cloud Audit Logs | Initial Access | Low |
| Consider monitoring for unexpected outbound network connections from Vertex AI serving containers, which may indicate credential exfiltration. | VPC Flow Logs | Exfiltration | Medium |
Control Gaps
- Lack of cross-project bucket ownership validation in older SDK versions
- Absence of integrity checks (e.g., hash verification) between model upload and deployment
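The second gap can be illustrated with a minimal sketch of a hash check. It assumes the SHA-256 digest of the artifact is recorded before upload and kept somewhere outside the staging bucket, then compared against the artifact fetched just before deployment. The function names `sha256_of` and `matches_recorded_digest` are hypothetical and are not part of the Vertex AI SDK.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_digest(path, recorded_digest):
    """True only if the file still hashes to the digest recorded earlier."""
    return hmac.compare_digest(sha256_of(path), recorded_digest)
```

A check like this only helps if the recorded digest lives in a location the bucket owner cannot modify, since an attacker who controls the bucket could otherwise replace both the artifact and its digest.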
Key Behavioral Indicators
- Rapid modification of model.joblib files in GCS buckets immediately following creation
- Vertex AI serving containers querying the GCE metadata server for OAuth tokens and sending them to external IPs
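The first indicator could be hunted with logic along these lines. This is a rough sketch over simplified, hypothetical event records (dicts with `object`, `time`, and `principal` keys), not the real GCS Data Access log schema, and the 5-second default window is an assumption chosen to comfortably cover the roughly 2.5-second race described above.

```python
from datetime import datetime, timedelta


def find_rapid_overwrites(events, window_seconds=5.0):
    """Flag objects rewritten by a different principal shortly after a write.

    events: iterable of dicts with 'object' (str), 'time' (datetime),
    and 'principal' (str). These field names are illustrative only.
    """
    window = timedelta(seconds=window_seconds)
    last_write = {}
    findings = []
    for event in sorted(events, key=lambda e: e["time"]):
        previous = last_write.get(event["object"])
        if (
            previous is not None
            and event["time"] - previous["time"] <= window
            and event["principal"] != previous["principal"]
        ):
            findings.append(
                (event["object"], previous["principal"], event["principal"])
            )
        last_write[event["object"]] = event
    return findings


# Hypothetical sample data: a developer upload followed 2 seconds later
# by a write from an unrelated service identity.
t0 = datetime(2025, 1, 1, 12, 0, 0)
sample_events = [
    {"object": "model.joblib", "time": t0, "principal": "dev@example.com"},
    {"object": "model.joblib", "time": t0 + timedelta(seconds=2),
     "principal": "function@example.net"},
    {"object": "other.bin", "time": t0, "principal": "dev@example.com"},
]
suspicious = find_rapid_overwrites(sample_events)
```

Requiring a different principal keeps ordinary retries by the same uploader from being flagged, at the cost of missing an attacker who writes under the same identity.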
False Positive Assessment
- Low
Recommendations
Immediate Mitigation
- Verify against your organization's incident response runbook and team escalation paths before acting.
- Consider upgrading the google-cloud-aiplatform Python SDK to version 1.148.0 or later across all development and CI/CD environments.
- Evaluate explicitly defining the staging_bucket parameter with a known, internally controlled Cloud Storage URI when uploading models via the SDK.
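The two SDK-related mitigations above might be sketched as follows. The version helpers use only the standard library. `init_with_explicit_bucket` relies on the `staging_bucket` argument of `aiplatform.init` and needs the SDK installed, so the import is deferred into the function. The helper names and the `PATCHED_VERSION` constant are hypothetical, with the threshold taken from the fixed release named in this write-up.

```python
import re
from importlib import metadata

PATCHED_VERSION = (1, 148, 0)  # fixed release named in this write-up


def parse_version(version):
    """Return the leading major.minor.patch numbers of a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_patched(version):
    """True if the version is at or above the patched release."""
    return parse_version(version) >= PATCHED_VERSION


def installed_sdk_is_patched():
    """Check the locally installed google-cloud-aiplatform distribution.

    Returns False when the package is not installed at all.
    """
    try:
        return is_patched(metadata.version("google-cloud-aiplatform"))
    except metadata.PackageNotFoundError:
        return False


def init_with_explicit_bucket(project, location, staging_bucket):
    """Pin staging to a bucket you control instead of the SDK default."""
    from google.cloud import aiplatform  # lazy import; requires the SDK

    aiplatform.init(
        project=project, location=location, staging_bucket=staging_bucket
    )
```

A call such as `init_with_explicit_bucket("my-project", "us-central1", "gs://my-org-staging")` uses placeholder values; the bucket should be one created and owned inside your own organization.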
Infrastructure Hardening
- Consider implementing organization policies that restrict IAM grants, ensuring that 'allAuthenticatedUsers' cannot be given access to sensitive GCS buckets.
- If supported by your architecture, evaluate enforcing VPC Service Controls to restrict data exfiltration from Vertex AI serving containers to unauthorized external endpoints.
User Protection
- Consider auditing existing Vertex AI deployments for models staged in unrecognized or external GCS buckets.
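One way to approach such an audit is to compare each model's artifact URI against an allowlist of buckets your organization owns. The sketch below assumes you have already exported a mapping of model names to `gs://` artifact URIs. The function names, the model names, and the bucket names are hypothetical.

```python
from urllib.parse import urlparse


def bucket_of(gcs_uri):
    """Extract the bucket name from a gs://bucket/path URI."""
    parsed = urlparse(gcs_uri)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"not a GCS URI: {gcs_uri!r}")
    return parsed.netloc


def find_unrecognized(artifact_uris, allowed_buckets):
    """Return the models whose artifacts live outside the allowlist."""
    allowed = set(allowed_buckets)
    return {
        name: uri
        for name, uri in artifact_uris.items()
        if bucket_of(uri) not in allowed
    }


# Hypothetical inventory: one model staged in a bucket nobody recognizes.
inventory = {
    "churn-model": "gs://acme-ml-staging/churn/model.joblib",
    "fraud-model": "gs://unknown-staging-bucket/fraud/model.joblib",
}
flagged = find_unrecognized(inventory, allowed_buckets=["acme-ml-staging"])
```

An allowlist of names confirms only that the bucket is one you expect; confirming that the bucket actually belongs to one of your projects is a separate check.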
Security Awareness
- Consider educating data science and ML engineering teams on the risks of insecure deserialization (e.g., Python pickle) and the importance of specifying explicit storage locations.
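A harmless demonstration can make the deserialization risk concrete for training purposes. In the sketch below, the hypothetical class `LooksLikeAModel` defines `__reduce__` so that the pickle stream stores an instruction to call a function rather than the object's state. The benign built-in `sorted` stands in for whatever callable an attacker would choose.

```python
import pickle


class LooksLikeAModel:
    """Stand-in for a serialized model; the class name is illustrative."""

    def __reduce__(self):
        # pickle records "call sorted([3, 1, 2])" in place of object state.
        return (sorted, ([3, 1, 2],))


blob = pickle.dumps(LooksLikeAModel())

# Loading the bytes runs the recorded call; no model object comes back.
restored = pickle.loads(blob)
print(type(restored).__name__, restored)  # prints: list [1, 2, 3]
```

The loader needs no knowledge of `LooksLikeAModel` for the call to run, which is why loading a pickle from a location someone else can write to amounts to running their code.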
MITRE ATT&CK Mapping
- T1059.006 - Command and Scripting Interpreter: Python
- T1565.001 - Data Manipulation: Stored Data Manipulation
- T1552.005 - Unsecured Credentials: Cloud Instance Metadata API
- T1530 - Data from Cloud Storage