Machine Learning Operations: Yesterday, Today, and Tomorrow
Akamai details its internal Machine Learning Operations (MLOps) platform, highlighting the transition from manual model management to a standardized, Kubeflow-based infrastructure. The platform enhances real-time security detections by streamlining model evaluation, tuning, and deployment, and is currently evolving to support LLMOps and AgentOps for generative AI applications.
Source: Akamai
Key Takeaways
- Akamai utilizes a standardized MLOps platform based on Kubeflow to manage security detection models and reduce false positives/negatives.
- The platform integrates technologies like MLflow, Apache Spark, Ray, and Kafka to streamline the ML lifecycle from experimentation to production.
- Workflows are structured into three core stages: Trigger, Analysis, and Control, enabling automated and manual model retraining.
- Akamai is evolving its MLOps infrastructure to support LLMOps and AgentOps for managing generative AI and autonomous agents.
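The three workflow stages can be illustrated with a small sketch. The summary does not describe what each stage does internally, so the following is an assumption: Trigger decides whether a retraining run starts (from a drift score or a manual request), Analysis compares a candidate model against the current one on false positive and false negative rates, and Control decides whether to promote. All names, thresholds, and metrics here are hypothetical and are not taken from Akamai's platform.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Metrics:
    """Hypothetical evaluation metrics for a detection model."""
    false_positive_rate: float
    false_negative_rate: float


def trigger(drift_score: float, manual_request: bool,
            drift_threshold: float = 0.2) -> Optional[str]:
    """Trigger stage: return why a run should start, or None to skip.

    The drift score and its threshold are illustrative assumptions.
    """
    if manual_request:
        return "manual"
    if drift_score > drift_threshold:
        return "automated"
    return None


def analysis(current: Metrics, candidate: Metrics) -> bool:
    """Analysis stage: the candidate passes if it is no worse on both
    error rates and strictly better on at least one."""
    no_worse = (
        candidate.false_positive_rate <= current.false_positive_rate
        and candidate.false_negative_rate <= current.false_negative_rate
    )
    better = (
        candidate.false_positive_rate < current.false_positive_rate
        or candidate.false_negative_rate < current.false_negative_rate
    )
    return no_worse and better


def control(reason: Optional[str], candidate_passed: bool) -> str:
    """Control stage: turn the earlier results into a deployment decision."""
    if reason is None:
        return "skip"
    if not candidate_passed:
        return "keep_current"
    return "promote"


def run_workflow(drift_score: float, manual_request: bool,
                 current: Metrics, candidate: Metrics) -> str:
    """Chain the three stages; Analysis only runs if Trigger fired."""
    reason = trigger(drift_score, manual_request)
    passed = analysis(current, candidate) if reason is not None else False
    return control(reason, passed)
```

In a real pipeline each stage would likely be a separate pipeline component with its own inputs and artifacts; the single-process version above only shows how the decisions chain together.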
Attack Chain
N/A. This article discusses internal MLOps architecture and infrastructure at Akamai, not a cyberattack or threat actor methodology.
Detection Availability
- YARA Rules: No
- Sigma Rules: No
- Snort/Suricata Rules: No
- KQL Queries: No
- Splunk SPL Queries: No
- EQL Queries: No
- Other Detection Logic: No
No detection rules are provided as this is an informational article about MLOps architecture.
Detection Engineering Assessment
- EDR Visibility: None. The article discusses backend ML infrastructure, not endpoint activity or threat execution.
- Network Visibility: None. The article discusses backend ML infrastructure, not network-level attack indicators.
- Detection Difficulty: N/A. There is no threat or attack methodology described to detect.
Hunting Hypotheses
| Hypothesis | Telemetry | ATT&CK Stage | FP Risk |
|---|---|---|---|
| Monitor for unauthorized access or anomalous configuration changes to MLOps infrastructure components such as Kubeflow, MLflow, or HashiCorp Vault to prevent model tampering. | Application logs, IAM logs, Kubernetes audit logs | Defense Evasion | High |
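As a starting point for the hypothesis above, the following sketch filters Kubernetes audit log events (one JSON object per line) for mutating requests in ML-related namespaces made by identities outside an allowlist. The field names (`verb`, `user.username`, `objectRef`) follow the Kubernetes audit event schema; the namespace names and the allowlisted service account are assumptions for illustration and would need to match the actual deployment.

```python
import json
from typing import Dict, Iterable, Iterator

# Assumed namespace names; adjust to where Kubeflow/MLflow actually run.
WATCHED_NAMESPACES = {"kubeflow", "mlflow"}
# Verbs that change cluster state.
MUTATING_VERBS = {"create", "update", "patch", "delete", "deletecollection"}
# Hypothetical identity expected to make changes; everything else is flagged.
ALLOWED_USERS = {"system:serviceaccount:kubeflow:pipeline-runner"}


def suspicious_events(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield a compact record for each unexpected mutating request."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines rather than abort the hunt
        ref = event.get("objectRef") or {}
        user = (event.get("user") or {}).get("username", "")
        if (
            event.get("verb") in MUTATING_VERBS
            and ref.get("namespace") in WATCHED_NAMESPACES
            and user not in ALLOWED_USERS
        ):
            yield {
                "user": user,
                "verb": event.get("verb", ""),
                "namespace": ref.get("namespace", ""),
                "resource": ref.get("resource", ""),
                "name": ref.get("name", ""),
            }
```

Because legitimate administrators and CI systems also make such changes, this filter is expected to be noisy until the allowlist is tuned, which is consistent with the High FP risk noted in the table.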
False Positive Assessment
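- Low: the article provides no detection logic, so there are no rules that could generate false positives. The hunting hypothesis above carries its own, higher FP risk.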
Recommendations
Immediate Mitigation
- N/A
Infrastructure Hardening
- Consider adopting standardized MLOps frameworks like Kubeflow to manage and secure machine learning lifecycles.
- Implement secure secret management (e.g., HashiCorp Vault) and strict IAM controls for all ML and AI infrastructure.
User Protection
- N/A
Security Awareness
- Educate security and data science teams on the importance of structured MLOps, LLMOps, and AgentOps for maintaining reliable AI-driven security detections.