Secure Homegrown AI Agents with CrowdStrike Falcon AIDR and NVIDIA NeMo Guardrails
CrowdStrike has announced the integration of Falcon AI Detection and Response (AIDR) with NVIDIA NeMo Guardrails to secure enterprise AI agents against runtime attacks. The solution provides programmable guardrails to prevent prompt injection, data exposure, and unauthorized actions by applying over 75 built-in classification rules to LLM interactions.
Source: CrowdStrike
Key Takeaways
- CrowdStrike Falcon AIDR now integrates with NVIDIA NeMo Guardrails (v0.20.0) to secure enterprise AI agents.
- The solution protects against prompt injection, jailbreaks, PII/PHI exposure, and agent interaction with malicious entities.
- Falcon AIDR uses an OpenAI-compatible API to apply policies (detect, block, redact, encrypt, transform) to message arrays.
- The platform includes over 75 built-in classification rules and supports custom data classification for tailored security.
Affected Systems
- AI Agents
- LLM Applications
- Agentic Workflows
Attack Chain
Threat actors target AI applications by submitting malicious inputs designed to trigger prompt injection or jailbreak conditions. Once the AI agent's constraints are bypassed, the agent may be manipulated into executing unauthorized actions, exposing sensitive PII/PHI, or interacting with malicious infrastructure. To mitigate this, Falcon AIDR intercepts the OpenAI-compatible message array via API, applying policies to detect, redact, or block malicious content before the LLM processes the request or returns the output.
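The interception step can be illustrated with a minimal local sketch. It is not Falcon AIDR's API or rule set: the `apply_policy` function, the `PolicyResult` type, and both regular expressions are hypothetical stand-ins. They show only the general shape of checking an OpenAI-style message array and returning an allow, redact, or block decision before the request reaches the LLM. The sketch assumes each message's `content` is a plain string.

```python
import re
from dataclasses import dataclass, field

# Hypothetical patterns for illustration only; real classification rules
# are far broader than two regular expressions.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
INJECTION_RE = re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE)


@dataclass
class PolicyResult:
    action: str                                   # "allow", "redact", or "block"
    messages: list = field(default_factory=list)  # messages to forward (empty if blocked)
    findings: list = field(default_factory=list)  # labels of what was detected


def apply_policy(messages):
    """Inspect an OpenAI-style message array before it is sent to the LLM."""
    findings = []
    cleaned = []
    for msg in messages:
        content = msg.get("content", "")
        # Block the whole request if a user message looks like an injection attempt.
        if msg.get("role") == "user" and INJECTION_RE.search(content):
            findings.append("prompt_injection")
            return PolicyResult("block", [], findings)
        # Otherwise redact email addresses in place and record the finding.
        redacted, count = EMAIL_RE.subn("<EMAIL_REDACTED>", content)
        if count:
            findings.append("email")
        cleaned.append({**msg, "content": redacted})
    return PolicyResult("redact" if findings else "allow", cleaned, findings)
```

A caller would forward `result.messages` to the model only when `result.action` is not `"block"`. The same check can be run on the model's response before it is returned to the user.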
Detection Availability
- YARA Rules: No
- Sigma Rules: No
- Snort/Suricata Rules: No
- KQL Queries: No
- Splunk SPL Queries: No
- EQL Queries: No
- Other Detection Logic: No
- Platforms: CrowdStrike Falcon AIDR, NVIDIA NeMo Guardrails
Detection capabilities are built directly into CrowdStrike Falcon AIDR and NVIDIA NeMo Guardrails via API policies and built-in classification rules. No standalone query logic is provided in the article.
Detection Engineering Assessment
- EDR Visibility: None. The threats described (prompt injection, LLM jailbreaks) occur at the application and API layer, not the OS or endpoint layer where traditional EDR operates.
- Network Visibility: Medium. Network sensors might catch plaintext API calls to LLMs or outbound connections to malicious domains triggered by agents, but TLS encryption limits deep packet inspection of the prompts themselves.
- Detection Difficulty: Hard. Detecting prompt injection and jailbreaks requires semantic understanding of natural language, which traditional signature-based tools cannot reliably parse.
Required Log Sources
- Application Logs
- API Gateway Logs
- LLM Interaction Logs
Hunting Hypotheses
| Hypothesis | Telemetry | ATT&CK Stage | FP Risk |
|---|---|---|---|
| Users or automated scripts are submitting anomalous, highly complex, or repetitive prompts designed to bypass LLM system instructions. | LLM Application/API Logs | Execution | High |
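One way to start on the repetitive-prompt part of this hypothesis is to group LLM interaction logs by user and measure how similar consecutive prompts are. The sketch below is an assumption-laden starting point, not detection logic from the article: the record fields `user` and `prompt`, the function name `flag_repetitive_users`, and both thresholds are hypothetical and would need tuning against real logs, especially given the high false-positive risk noted in the table.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def flag_repetitive_users(records, min_prompts=3, similarity=0.8):
    """Return sorted user IDs whose consecutive prompts are mostly near-duplicates.

    `records` is an iterable of dicts with hypothetical keys "user" and "prompt",
    assumed to be in chronological order.
    """
    by_user = defaultdict(list)
    for rec in records:
        by_user[rec["user"]].append(rec["prompt"])

    flagged = []
    for user, prompts in by_user.items():
        if len(prompts) < min_prompts:
            continue  # too little activity to judge
        pairs = list(zip(prompts, prompts[1:]))
        # Count consecutive prompt pairs that are textually near-identical.
        similar = sum(
            1
            for a, b in pairs
            if SequenceMatcher(None, a.lower(), b.lower()).ratio() >= similarity
        )
        # Flag users for whom at least half of the consecutive pairs are similar.
        if similar / len(pairs) >= 0.5:
            flagged.append(user)
    return sorted(flagged)
```

Character-level similarity catches small mutations of the same payload, which is typical of iterative jailbreak attempts. It will also flag benign retries, so the output is better treated as a triage list than as an alert.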
Control Gaps
- Traditional WAFs
- Endpoint Detection and Response (EDR)
- Signature-based IPS
Key Behavioral Indicators
- Unexpected role-switching in prompt inputs
- Attempts to output system prompts
- Presence of known jailbreak phrases (e.g., 'Do Anything Now')
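These three indicators can be approximated with simple pattern checks as a first-pass filter. The patterns and the `match_indicators` helper below are illustrative assumptions, not rules shipped by either product. As the assessment above notes, signature matching alone cannot reliably detect semantically rephrased attacks, so a sketch like this is a coarse pre-filter at best.

```python
import re

# Hypothetical, deliberately small pattern set keyed by indicator name.
INDICATORS = {
    "role_switching": re.compile(
        r"\b(you are now|act as|pretend to be|from now on you)\b", re.IGNORECASE
    ),
    "system_prompt_extraction": re.compile(
        r"\b(reveal|print|show|repeat|output)\b.{0,40}\b(system prompt|initial instructions)\b",
        re.IGNORECASE,
    ),
    "known_jailbreak_phrase": re.compile(r"\bdo anything now\b", re.IGNORECASE),
}


def match_indicators(prompt):
    """Return the names of all indicators whose pattern appears in the prompt."""
    return [name for name, pattern in INDICATORS.items() if pattern.search(prompt)]
```

Phrases such as "act as" are common in legitimate prompts, so matches are better logged and scored than blocked outright.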
False Positive Assessment
- Medium
Recommendations
Immediate Mitigation
- Implement input validation and sanitization for all user-supplied data interacting with LLMs.
- Deploy AI-specific guardrails (like NeMo Guardrails or Falcon AIDR) to monitor and filter LLM inputs and outputs.
Infrastructure Hardening
- Enforce least privilege access for AI agents, limiting their ability to execute sensitive API calls or access restricted databases.
- Route all AI agent traffic through monitored and filtered API gateways.
User Protection
- Redact or encrypt PII/PHI before it is processed by external LLM APIs.
- Implement human-in-the-loop (HITL) approvals for high-risk agentic actions.
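The least-privilege and human-in-the-loop recommendations can be combined in a single authorization gate that runs before an agent executes any tool. The following is a minimal sketch under stated assumptions: the tool names, the `Decision` type, and the `authorize_tool_call` function are hypothetical, and the `approver` callback stands in for whatever approval workflow an organization actually uses.

```python
from dataclasses import dataclass

# Hypothetical tool names for illustration.
ALLOWED_TOOLS = {"search_docs", "send_email", "delete_record"}
HIGH_RISK_TOOLS = {"send_email", "delete_record"}


@dataclass
class Decision:
    allowed: bool
    reason: str


def authorize_tool_call(tool, approver=None):
    """Deny tools outside the allowlist; require human approval for high-risk ones.

    `approver` is a callable taking the tool name and returning True or False.
    """
    if tool not in ALLOWED_TOOLS:
        return Decision(False, "tool not in allowlist")
    if tool in HIGH_RISK_TOOLS:
        if approver is None:
            return Decision(False, "human approval required")
        if not approver(tool):
            return Decision(False, "human approver rejected")
        return Decision(True, "approved by human")
    return Decision(True, "low-risk tool")
```

Denying by default when no approver is supplied keeps a misconfigured agent from silently executing high-risk actions.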
Security Awareness
- Train developers on the risks of prompt injection and insecure LLM output handling.
- Establish clear policies for what data is permissible to share with internal and external AI tools.
MITRE ATT&CK Mapping
- T1190 - Exploit Public-Facing Application
- T1059 - Command and Scripting Interpreter