2026-05-13 · 3 min · low

From Narrative to Knowledge Graph | LLM-Driven Information Extraction in Cyber Threat Intelligence

SentinelLabs explores the use of Large Language Models (LLMs) to automate the extraction of indicators of compromise (IOCs) and contextual data from Cyber Threat Intelligence (CTI) narratives. The research shows that LLMs can accurately parse unstructured reports into structured knowledge graphs and substantially reduce processing time. It also highlights the importance of custom data models, prompt optimization, and evidence-grading frameworks.

Conf: high · Analyzed: 2026-03-17 · reports

Authors:

Key Takeaways

LLMs can significantly accelerate CTI extraction, achieving an average 18.3x speed-up over manual processing by human analysts.
A three-phase workflow (Sanitization, LLM Extraction, Knowledge Graph Assembly) effectively transforms narrative CTI into structured, machine-readable data.
Custom data models and carefully designed LLM prompts with strict evidence-grading scales are crucial for accurate extraction and minimizing hallucinations.
To be effective, LLM ensembling should be applied selectively, based on task-specific error correlation and disagreement rates.
Allowing LLMs an explicit abstention option (e.g., 'None') when evidence is insufficient reduces false discovery rates and improves overall output reliability.
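The three-phase workflow above can be sketched as a small pipeline. This is a minimal sketch under stated assumptions, not SentinelLabs' implementation: the sanitization rules (tag stripping, refanging `hxxp` and `[.]`), the JSON record shape `{"subject", "relation", "object"}`, and the functions `sanitize`, `extract`, `assemble`, and `fake_llm` are all hypothetical. The model call is replaced by a stub so the example runs offline, and abstentions (`"None"`) are dropped before graph assembly rather than stored as facts.

```python
import html
import json
import re
from typing import Callable

# Hypothetical signature: any callable that takes report text and returns a
# JSON list of {"subject", "relation", "object"} records.
Extractor = Callable[[str], str]


def sanitize(report: str) -> str:
    """Phase 1: strip markup, refang common defanging, collapse whitespace."""
    text = html.unescape(re.sub(r"<[^>]+>", " ", report))
    text = re.sub(r"hxxp", "http", text, flags=re.IGNORECASE)
    text = text.replace("[.]", ".").replace("(.)", ".")
    return re.sub(r"\s+", " ", text).strip()


def extract(text: str, llm: Extractor) -> list[dict]:
    """Phase 2: call the model and drop fields where it abstained with 'None'."""
    records = json.loads(llm(text))
    return [r for r in records if r.get("object") not in (None, "None")]


def assemble(records: list[dict]) -> dict[str, set[tuple[str, str]]]:
    """Phase 3: merge records into an adjacency map, deduplicating edges."""
    graph: dict[str, set[tuple[str, str]]] = {}
    for r in records:
        graph.setdefault(r["subject"], set()).add((r["relation"], r["object"]))
    return graph


def fake_llm(text: str) -> str:
    # Hypothetical stand-in for a real model call so the sketch runs offline.
    return json.dumps([
        {"subject": "ActorX", "relation": "uses_c2", "object": "evil.example.com"},
        {"subject": "ActorX", "relation": "attributed_to", "object": "None"},
    ])


report = "<p>ActorX beacons to hxxps://evil[.]example[.]com</p>"
clean = sanitize(report)
kg = assemble(extract(clean, fake_llm))
print(clean)  # ActorX beacons to https://evil.example.com
print(kg)
```

Keeping the extractor as an injected callable makes each phase testable on its own, and swapping the stub for a real model client does not change phases 1 or 3.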
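One way to read "apply ensembling selectively" is to measure, per task, how often the candidate models disagree, and only pay for a vote when disagreement is high enough for voting to change anything. The sketch below assumes that rule; the threshold `min_disagreement`, the strict-majority vote with a `"None"` abstention on ties, and the model names are hypothetical choices, not the article's published method.

```python
from collections import Counter
from itertools import combinations


def disagreement_rate(a: list[str], b: list[str]) -> float:
    """Fraction of items on which two models' outputs differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_pairwise_disagreement(outputs: dict[str, list[str]]) -> float:
    """Average disagreement over every pair of models for one task."""
    pairs = list(combinations(outputs.values(), 2))
    return sum(disagreement_rate(a, b) for a, b in pairs) / len(pairs)


def majority_vote(labels: list[str]) -> str:
    """Return the strict-majority label, or 'None' (abstain) if there is none."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else "None"


def combine(outputs: dict[str, list[str]], primary: str,
            min_disagreement: float = 0.1) -> list[str]:
    """Hypothetical policy: ensemble only when models disagree often enough."""
    if mean_pairwise_disagreement(outputs) < min_disagreement:
        return outputs[primary]  # models nearly always agree: voting adds little
    return [majority_vote(list(item)) for item in zip(*outputs.values())]


# Hypothetical per-report ATT&CK technique labels from three models.
outputs = {
    "model_a": ["T1071", "T1566", "T1059", "T1105"],
    "model_b": ["T1071", "T1566", "T1204", "T1041"],
    "model_c": ["T1071", "T1190", "T1059", "T1027"],
}
print(mean_pairwise_disagreement(outputs))
print(combine(outputs, primary="model_a"))
```

In a real pipeline the disagreement rates would be measured on a labeled calibration set per extraction task, so that the decision to ensemble reflects how correlated each task's errors actually are.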

Attack Chain

N/A - This article discusses a methodology for extracting Cyber Threat Intelligence using Large Language Models and does not detail a specific cyberattack chain.

Detection Availability

YARA Rules: No
Sigma Rules: No
Snort/Suricata Rules: No
KQL Queries: No
Splunk SPL Queries: No
EQL Queries: No
Other Detection Logic: No

No specific detection rules are provided as this is a research article on CTI extraction methodology.

Detection Engineering Assessment

EDR Visibility: None. The article discusses CTI processing methodologies, not endpoint behaviors.
Network Visibility: None. The article focuses on text analysis of CTI reports, not network traffic.
Detection Difficulty: N/A. Not applicable to this research article.

Hunting Hypotheses


Hypothesis	Telemetry	ATT&CK Stage	FP Risk
Threat actors may use newly established Command and Control (T1071) infrastructure that automated CTI extraction can identify; retroactively hunting for these extracted indicators in network telemetry can reveal previously undetected intrusions.	DNS logs, proxy logs, network flow data	Command and Control	Medium
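The retroactive hunt in this hypothesis amounts to matching extracted domains against historical DNS queries, including subdomains of each indicator. The sketch below assumes a simplified `DnsRecord` shape (timestamp, client, query name); real DNS, proxy, or flow exports will have different fields, and the names used here are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DnsRecord:
    timestamp: str  # ISO 8601; hypothetical export format
    client: str     # querying host
    qname: str      # queried name


def normalize(name: str) -> str:
    """Lowercase and drop the trailing root dot so names compare cleanly."""
    return name.strip().rstrip(".").lower()


def matches_ioc(qname: str, ioc_domains: set[str]) -> Optional[str]:
    """Return the IOC domain that qname equals or falls under, else None."""
    labels = normalize(qname).split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])  # walk up: a.b.c -> b.c -> c
        if candidate in ioc_domains:
            return candidate
    return None


def retro_hunt(records: list[DnsRecord], ioc_domains: set[str]) -> list[tuple[DnsRecord, str]]:
    """Pair each historical DNS record with the extracted IOC it matches."""
    iocs = {normalize(d) for d in ioc_domains}
    hits = []
    for rec in records:
        ioc = matches_ioc(rec.qname, iocs)
        if ioc is not None:
            hits.append((rec, ioc))
    return hits


logs = [
    DnsRecord("2026-02-01T10:00:00Z", "10.0.0.5", "cdn.evil.example.com."),
    DnsRecord("2026-02-01T10:05:00Z", "10.0.0.7", "www.example.org"),
    DnsRecord("2026-02-02T08:30:00Z", "10.0.0.5", "EVIL.example.com"),
    DnsRecord("2026-02-02T09:00:00Z", "10.0.0.9", "notevil.example.com"),
]
for rec, ioc in retro_hunt(logs, {"evil.example.com"}):
    print(rec.timestamp, rec.client, rec.qname, "->", ioc)
```

Matching on whole labels rather than string suffixes avoids flagging look-alikes such as `notevil.example.com`, which helps keep the medium false-positive risk from growing.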

False Positive Assessment

Recommendations

Immediate Mitigation

N/A

Infrastructure Hardening

Integrate automated CTI extraction pipelines with Threat Intelligence Platforms (TIPs) to rapidly deploy blocking rules for newly identified adversary infrastructure.

User Protection

N/A

Security Awareness

Consider integrating LLM-driven information extraction tools into CTI workflows to accelerate the processing of threat intelligence reports.
Establish clear evidence-grading scales and custom data models when using LLMs for automated IOC extraction to minimize false positives and hallucinations.
Train CTI analysts on the limitations and non-deterministic nature of LLMs, emphasizing the need for human-in-the-loop verification for ambiguous threat intelligence.
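An evidence-grading scale combined with a custom data model can be sketched as a graded record type plus an acceptance threshold, with the false discovery rate (FDR = FP / (FP + TP)) as the quality check. The four-level `Evidence` scale, the `Extraction` fields, the default threshold, and the sample values are hypothetical; the article's actual scale is not reproduced here.

```python
from dataclasses import dataclass
from enum import IntEnum


class Evidence(IntEnum):
    # Hypothetical scale, ordered from weakest to strongest support.
    NONE = 0         # model abstained: no supporting text
    SPECULATIVE = 1  # implied or hedged in the report
    INFERRED = 2     # follows from stated facts
    EXPLICIT = 3     # stated verbatim in the report


@dataclass(frozen=True)
class Extraction:
    attribute: str
    value: str
    evidence: Evidence


def accept(items: list[Extraction], min_grade: Evidence = Evidence.INFERRED) -> list[Extraction]:
    """Keep only extractions whose evidence grade meets the threshold."""
    return [x for x in items if x.evidence >= min_grade]


def false_discovery_rate(accepted: list[Extraction], truth: set[tuple[str, str]]) -> float:
    """FDR = FP / (FP + TP) over accepted extractions; 0.0 if none accepted."""
    if not accepted:
        return 0.0
    fp = sum((x.attribute, x.value) not in truth for x in accepted)
    return fp / len(accepted)


items = [
    Extraction("c2_domain", "evil.example.com", Evidence.EXPLICIT),
    Extraction("malware_family", "LoaderY", Evidence.INFERRED),
    Extraction("attribution", "ActorZ", Evidence.SPECULATIVE),
    Extraction("victim_sector", "None", Evidence.NONE),
]
truth = {("c2_domain", "evil.example.com"), ("malware_family", "LoaderY")}

lenient = accept(items, Evidence.SPECULATIVE)  # everything except abstentions
strict = accept(items)                         # INFERRED or stronger
print(false_discovery_rate(lenient, truth), false_discovery_rate(strict, truth))
```

Because abstentions sit at the bottom of the scale, they never pass any threshold, so an honest "None" can only lower the FDR, whereas a fabricated value graded as speculative is filtered only when the threshold is strict enough.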
