The Cortex Data Lake (CDL) serves as the central logging backbone for Palo Alto Networks' Strata firewalls, Prisma Access, and Cloud NGFW. An AI integration connects at the API layer, querying structured log data—including traffic, threat, URL filtering, and WildFire submission logs—without the constraints of the native UI. This allows for bulk extraction of indicators, trend analysis across months of data, and the creation of custom detection models using the lake's historical records as training data. The primary surfaces are the /loggingservice/services/logquery/v1 and /loggingservice/services/dashboard/v1 endpoints, which provide programmatic access to the normalized log schema.
AI Integration for Palo Alto Cortex Data Lake API

Where AI Fits in the Cortex Data Lake Architecture
Integrating AI with the Cortex Data Lake API unlocks advanced threat hunting, bulk analysis, and custom reporting by treating the data lake as a high-fidelity, long-term source of truth.
A production implementation typically involves a scheduled job or event-driven service that executes XQL-like queries via the CDL API, streams the results to a processing queue, and feeds them into an AI pipeline. High-value use cases include:
- Retrospective Threat Hunting: Querying for specific file hashes, domains, or user-agent strings across 12+ months of logs to identify past compromises.
- Bulk IOC Enrichment: Extracting all external IPs or domains from a specified time range to enrich with threat intelligence and score for risk.
- Custom Behavioral Baselining: Using months of traffic log data to train a model on normal network patterns for specific subnets or user groups, then flagging deviations.
- Compliance Evidence Automation: Programmatically gathering logs to demonstrate control effectiveness for audits (e.g., "show all blocked high-risk URL attempts for the finance department in Q1").
Governance and rollout require careful planning. API calls are metered, so queries must be optimized for time ranges and field selection to manage cost. A rollout should start with read-only, non-production queries to validate data completeness and schema understanding. Since CDL contains sensitive network data, the AI service must operate under strict RBAC, and any extracted data should be processed within a secure enclave. Inference Systems architects these integrations with a focus on idempotent, auditable query jobs that log their actions back to CDL or a SIEM, ensuring the AI workflow itself is transparent and secure.
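Below is a minimal sketch of an idempotent, auditable query job. It assumes an in-memory ledger in place of a durable store and a caller-supplied `execute` function; both are hypothetical. The job key is a hash of the query text and time window. A retried or duplicated scheduled run therefore returns the stored result instead of querying again, and every attempt is recorded.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, List


@dataclass
class QueryJobLedger:
    """In-memory stand-in for a durable job/audit store (hypothetical)."""

    completed: Dict[str, list] = field(default_factory=dict)
    audit_log: List[dict] = field(default_factory=list)

    @staticmethod
    def job_key(query: str, start: datetime, end: datetime) -> str:
        # Same query text + same time window -> same key, so reruns are idempotent.
        payload = json.dumps(
            {"q": query, "start": start.isoformat(), "end": end.isoformat()},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode()).hexdigest()

    def run_once(self, query: str, start: datetime, end: datetime,
                 execute: Callable[[str, datetime, datetime], list], actor: str) -> list:
        key = self.job_key(query, start, end)
        if key in self.completed:
            # Already ran for this window: record the skip and return the stored result.
            self.audit_log.append({"key": key, "actor": actor, "status": "skipped"})
            return self.completed[key]
        rows = execute(query, start, end)
        self.completed[key] = rows
        self.audit_log.append({
            "key": key,
            "actor": actor,
            "status": "executed",
            "rows": len(rows),
            "at": datetime.now(timezone.utc).isoformat(),
        })
        return rows
```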
Key API Surfaces for AI Integration
Query API: The Core Hunting Surface
The Query API (/loggingservice/services/v2/loggingservice/query) is the primary interface for AI-driven threat hunting and historical analysis. It allows you to execute XQL queries against the vast log repository stored in Cortex Data Lake (CDL). This is where you build AI agents that can answer complex, multi-faceted questions about past security events.
Key AI Use Cases:
- Bulk IOC Extraction: Automatically query for indicators across all logs (e.g., `| filter action = "block"`) to populate threat intelligence platforms.
- Trend Analysis & Reporting: Use AI to generate hypotheses and craft XQL queries to identify attack trends, top talkers, or policy misconfigurations over weeks or months of data.
- Custom Detection Validation: Test the efficacy of new detection rules by querying historical data to see if they would have triggered on past incidents.
Implementation Note: AI workflows here are asynchronous. Your integration must handle job creation, polling for results, and parsing the potentially large JSONL response payloads for downstream analysis.
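The create/poll/parse cycle might look like the sketch below. `submit_job`, `get_status`, and `fetch_results` are hypothetical wrappers you would write around the actual CDL endpoints. The `DONE`/`FAILED` status strings are also assumptions. What the sketch shows is the control flow: bounded polling with capped exponential backoff, followed by line-by-line JSONL parsing so that each record is decoded independently.

```python
import json
import time
from typing import Callable, Iterator


def run_async_query(
    submit_job: Callable[[str], str],
    get_status: Callable[[str], str],
    fetch_results: Callable[[str], str],
    query: str,
    poll_interval: float = 2.0,
    max_wait: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Submit a query job, poll until it finishes, then yield parsed JSONL records.

    The three callables are hypothetical wrappers around the real CDL
    endpoints; the status strings checked below are assumptions.
    """
    job_id = submit_job(query)
    waited = 0.0
    while True:
        status = get_status(job_id)
        if status == "DONE":
            break
        if status == "FAILED":
            raise RuntimeError(f"query job {job_id} failed")
        if waited >= max_wait:
            raise TimeoutError(f"query job {job_id} still {status} after {max_wait}s")
        sleep(poll_interval)
        waited += poll_interval
        poll_interval = min(poll_interval * 2, 30.0)  # exponential backoff, capped at 30s
    # JSONL: one JSON object per line; blank lines are skipped.
    for line in fetch_results(job_id).splitlines():
        if line.strip():
            yield json.loads(line)
```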
High-Value AI Use Cases for Cortex Data Lake
Cortex Data Lake provides a unified repository for logs from Palo Alto Networks firewalls, Prisma Access, and Cortex XDR. The use cases below show how AI applications that query this API directly can enable advanced analysis, bulk data extraction, and custom reporting beyond the native UI's capabilities.
Bulk IOC & Threat Hunting Query Generation
Automate the creation and execution of complex XQL queries against the Data Lake API for threat hunting campaigns. An AI agent can translate natural language requests (e.g., "Find all internal hosts that communicated with known C2 domains in the last 90 days") into valid XQL, handle pagination for large result sets, and summarize findings. This moves hunting from manual, iterative query building to a guided, scalable process.
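A generator like the one below is one way to handle pagination over large result sets. It assumes a hypothetical `fetch_page(offset, limit)` wrapper, because the real paging mechanism depends on the endpoint. It also enforces a hard `max_rows` cap, so an over-broad AI-generated query cannot pull an unbounded result set.

```python
from typing import Callable, Iterator, List


def paginate(
    fetch_page: Callable[[int, int], List[dict]],
    page_size: int = 1000,
    max_rows: int = 50_000,
) -> Iterator[dict]:
    """Yield rows page by page until a short page or the max_rows cap.

    fetch_page(offset, limit) is a hypothetical wrapper around whatever
    paging mechanism the results endpoint actually exposes.
    """
    offset = 0
    while offset < max_rows:
        limit = min(page_size, max_rows - offset)
        page = fetch_page(offset, limit)
        yield from page
        if len(page) < limit:
            break  # a short page means there are no more results
        offset += limit
```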
Custom Security & Compliance Reporting
Generate scheduled, bespoke reports by querying the Data Lake API for data not readily available in standard dashboards. Use AI to define report parameters, execute the necessary XQL queries to aggregate data (e.g., application usage by department, geo-blocked traffic trends, SSL/TLS cipher analysis), and format the output into PDFs, slides, or CSV files for stakeholders. This automates manual data pulls for audit and operational reviews.
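The aggregation and formatting step can be sketched as a function that totals bytes per department and application, then renders CSV using only the standard library. The `user`, `app`, and `bytes` field names and the `dept_by_user` lookup are assumptions about how you would join log rows to your directory data.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable


def app_usage_report(rows: Iterable[dict], dept_by_user: Dict[str, str]) -> str:
    """Aggregate bytes per (department, application) and return CSV text.

    Field names and the user-to-department mapping are assumptions.
    """
    totals: Counter = Counter()
    for r in rows:
        dept = dept_by_user.get(r["user"], "unknown")
        totals[(dept, r["app"])] += int(r["bytes"])
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["department", "application", "total_bytes"])
    # Largest consumers first.
    for (dept, app), total in sorted(totals.items(), key=lambda kv: -kv[1]):
        writer.writerow([dept, app, total])
    return buf.getvalue()
```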
Long-Term Attack Pattern & TTP Analysis
Leverage the extended data retention in Cortex Data Lake to train custom ML models or perform retrospective analysis. An AI workflow can periodically extract months of log data to identify subtle, slow-burn attack patterns (like low-and-slow data exfiltration or periodic beaconing) that are invisible in short-term analysis. This provides a historical baseline for detecting advanced persistent threats (APTs).
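One common way to surface periodic beaconing is to measure how regular the gaps between connections are. This sketch groups sessions by source/destination pair. It flags pairs whose inter-arrival times have a low coefficient of variation (standard deviation divided by mean). The input shape and both thresholds are illustrative assumptions. Real beacons that add jitter would need a looser threshold.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Iterable, List, Tuple


def find_beacons(
    sessions: Iterable[Tuple[str, str, float]],
    min_events: int = 6,
    max_cv: float = 0.1,
) -> List[tuple]:
    """Flag (src, dst) pairs whose connection intervals are suspiciously regular.

    sessions: (src, dst, epoch_seconds) tuples. Thresholds are illustrative.
    """
    times = defaultdict(list)
    for src, dst, ts in sessions:
        times[(src, dst)].append(ts)
    flagged = []
    for pair, ts_list in times.items():
        if len(ts_list) < min_events:
            continue  # too few events to judge periodicity
        ts_list.sort()
        gaps = [b - a for a, b in zip(ts_list, ts_list[1:])]
        avg = mean(gaps)
        if avg <= 0:
            continue
        cv = pstdev(gaps) / avg  # coefficient of variation of inter-arrival times
        if cv <= max_cv:
            flagged.append((pair, round(avg, 1), round(cv, 3)))
    return flagged
```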
Data Enrichment for External SIEM/SOAR
Use the Data Lake API as a high-fidelity source to enrich incidents in a primary SIEM (like Splunk or Microsoft Sentinel) or SOAR platform. An AI agent can be triggered by an alert in the external system, query the Data Lake for relevant raw logs and session details to provide deeper context (e.g., full application identification, user mapping, threat content), and attach this enriched data to the incident record for faster triage.
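Below is a sketch of the enrichment lookup that an alert-triggered agent might issue. It assumes the alert carries an IP indicator and an ISO 8601 timestamp. The SQL-like syntax and the `traffic`, `src_ip`, `dst_ip`, and `_time` names mirror the illustrative query in the code example later in this article, and they are assumptions about the schema. The IP is validated with the standard `ipaddress` module first, so alert content cannot be spliced into the query text.

```python
import ipaddress
from datetime import datetime, timedelta


def build_enrichment_query(alert: dict, lookback_hours: int = 24,
                           lookahead_hours: int = 1, limit: int = 500) -> str:
    """Build a time-boxed lookup for one alert's IP indicator.

    Table and field names are assumptions; adapt them to the real CDL schema.
    """
    # Raises ValueError for anything that is not a valid IP address.
    ip = str(ipaddress.ip_address(alert["indicator"]))
    t = datetime.fromisoformat(alert["timestamp"])
    start = int((t - timedelta(hours=lookback_hours)).timestamp())
    end = int((t + timedelta(hours=lookahead_hours)).timestamp())
    return (
        "SELECT * FROM traffic "
        f"WHERE (src_ip = '{ip}' OR dst_ip = '{ip}') "
        f"AND _time BETWEEN {start} AND {end} LIMIT {limit}"
    )
```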
Anomaly Detection on Network Meta-Features
Go beyond signature-based alerts by analyzing aggregated log metadata. An AI model can consume daily summaries from the Data Lake API—such as unique destination counts per source, bytes transferred per application, or session duration variances—to establish behavioral baselines for networks, users, and applications. It then flags significant deviations that may indicate compromised accounts, insider threats, or policy violations.
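A per-source z-score illustrates the baselining idea: compare today's value of a metric (here, unique destinations) with that source's own history. The input dictionaries, the seven-day minimum, and the 3.0 threshold are illustrative assumptions, not tuned values.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def flag_deviations(history: Dict[str, List[float]], today: Dict[str, float],
                    z_threshold: float = 3.0, min_days: int = 7) -> List[Tuple[str, float]]:
    """Return (src, z) for sources whose value today deviates from their own baseline."""
    flagged = []
    for src, value in today.items():
        past = history.get(src, [])
        if len(past) < min_days:
            continue  # not enough baseline yet
        mu, sigma = mean(past), pstdev(past)
        if sigma == 0:
            continue  # a perfectly flat baseline needs a different rule than a z-score
        z = (value - mu) / sigma
        if abs(z) >= z_threshold:
            flagged.append((src, round(z, 2)))
    return flagged
```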
Automated Policy Optimization & Clean-up
Analyze firewall rule hit counts and application usage data from the Data Lake to recommend security policy improvements. An AI system can identify rarely-used rules (potential for cleanup), rules consistently blocking legitimate business traffic (needing adjustment), and shadow IT applications communicating on non-standard ports. This provides data-driven insights for network and security architects to harden the environment.
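Once the log rows are pulled, two of these checks reduce to simple aggregations. The sketch assumes rows with `rule`, `app`, and `dport` fields and a rule list exported from Panorama. The `STANDARD_PORTS` mapping is a small illustrative subset, not a complete App-ID port reference.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Illustrative subset of App-ID names and their usual ports (an assumption, not exhaustive).
STANDARD_PORTS = {"ssl": {443}, "web-browsing": {80}, "dns": {53}, "ssh": {22}}


def rule_cleanup_candidates(log_rows: List[dict], configured_rules: Iterable[str],
                            min_hits: int = 1) -> List[str]:
    """Configured rules whose hit count in the queried window is below min_hits."""
    hits = Counter(r["rule"] for r in log_rows)
    return sorted(rule for rule in configured_rules if hits[rule] < min_hits)


def nonstandard_port_apps(log_rows: List[dict]) -> List[Tuple[str, int]]:
    """(app, port) pairs where a known application was seen off its usual port."""
    seen = set()
    for r in log_rows:
        expected = STANDARD_PORTS.get(r["app"])
        if expected is not None and r["dport"] not in expected:
            seen.add((r["app"], r["dport"]))
    return sorted(seen)
```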
Example AI-Driven Workflows
This workflow demonstrates how to connect AI models and agents to the Palo Alto Cortex Data Lake API to automate threat hunting, enrich investigations, and generate custom intelligence outside the native UI. It is triggered by a specific operational need and leverages the API's bulk query capabilities.
Trigger: A new threat intelligence report is published containing 500+ IOCs (IPs, domains, hashes).
Context/Data Pulled:
- An agent parses the report, extracting IOCs and categorizing them (e.g., `type:ipv4`, `type:domain`).
- For each category, the agent constructs an optimized XQL query to search across relevant log types (e.g., `traffic`, `threat`, `url`) for the past 30 days.
- Queries are executed asynchronously against the Cortex Data Lake API using the `jobs` endpoints to handle large result sets.
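The parsing and categorization step can run deterministically before any model is involved. This sketch refangs the common `[.]` notation and uses the standard `ipaddress` module for IPv4. It uses simplified regular expressions for SHA-256 hashes and domains; these are assumptions, not exhaustive validators. Anything else, including IPv6, lands in `unknown` for review.

```python
import ipaddress
import re
from typing import Dict, Iterable, Set

SHA256_RE = re.compile(r"^[A-Fa-f0-9]{64}$")
# Simplified domain pattern: dot-separated labels ending in an alphabetic TLD.
DOMAIN_RE = re.compile(
    r"^(?=.{1,253}$)(?:[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?\.)+[A-Za-z]{2,63}$"
)


def categorize_iocs(raw_iocs: Iterable[str]) -> Dict[str, Set[str]]:
    """Bucket raw indicator strings by type so each type gets its own query."""
    buckets: Dict[str, Set[str]] = {"ipv4": set(), "domain": set(), "sha256": set(), "unknown": set()}
    for raw in raw_iocs:
        value = raw.strip().replace("[.]", ".").lower()  # refang and normalize case
        try:
            if ipaddress.ip_address(value).version == 4:
                buckets["ipv4"].add(value)
                continue
        except ValueError:
            pass  # not an IP address; try the other patterns
        if SHA256_RE.match(value):
            buckets["sha256"].add(value)
        elif DOMAIN_RE.match(value):
            buckets["domain"].add(value)
        else:
            buckets["unknown"].add(raw)  # keep the original text for analyst review
    return buckets
```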
Model or Agent Action:
- A summary agent receives the raw query results (potentially thousands of matches). It uses an LLM to:
  - Cluster matches by source/destination IP, user, or internal asset.
  - Summarize the volume, timeframe, and log types where hits occurred.
  - Draft a brief narrative assessing the potential impact (e.g., "15 internal hosts communicated with 3 of the reported C2 IPs over the last week").
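Pre-aggregating hits before the LLM call keeps the prompt small and the counts exact, because the model sees one line per internal host instead of raw log rows. In this sketch the `src`, `ioc`, `log_type`, and `ts` field names are assumptions. Timestamps are assumed to be ISO 8601 strings in one consistent format, so string comparison orders them correctly.

```python
from collections import defaultdict
from typing import Iterable, List


def cluster_matches(matches: Iterable[dict]) -> List[str]:
    """Group raw IOC hits by internal source host; one summary line per host."""
    clusters = defaultdict(
        lambda: {"iocs": set(), "hits": 0, "log_types": set(), "first": None, "last": None}
    )
    for m in matches:
        c = clusters[m["src"]]
        c["iocs"].add(m["ioc"])
        c["hits"] += 1
        c["log_types"].add(m["log_type"])
        c["first"] = m["ts"] if c["first"] is None else min(c["first"], m["ts"])
        c["last"] = m["ts"] if c["last"] is None else max(c["last"], m["ts"])
    lines = []
    # Busiest hosts first, so the most significant clusters lead the prompt.
    for src, c in sorted(clusters.items(), key=lambda kv: -kv[1]["hits"]):
        lines.append(
            f"{src}: {c['hits']} hit(s) on {len(c['iocs'])} IOC(s) "
            f"[{', '.join(sorted(c['log_types']))}] from {c['first']} to {c['last']}"
        )
    return lines
```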
System Update or Next Step:
- The summary and clustered data are posted as an enrichment note to a corresponding incident in Cortex XDR or ServiceNow.
- High-confidence matches automatically generate new local block rules in Panorama or Prisma Access via their respective APIs.
- A summary report is saved to a shared drive for the threat intel team.
Human Review Point: The proposed firewall rule changes are placed in a staging policy group, requiring analyst approval before promotion to production.
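The review gate can be expressed as two filters: only high-confidence proposals are staged, and only analyst-approved ones can be promoted. The `ProposedBlock` type and the 0.9 threshold are hypothetical. Pushing the promoted entries to Panorama or Prisma Access would go through those products' own APIs, which this sketch does not model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProposedBlock:
    """A candidate block rule produced by the summary agent (hypothetical type)."""

    indicator: str
    confidence: float
    approved_by: Optional[str] = None  # set by an analyst during review


def stage(proposals: List[ProposedBlock], min_confidence: float = 0.9) -> List[ProposedBlock]:
    # Only high-confidence matches enter the staging policy group; nothing is pushed yet.
    return [p for p in proposals if p.confidence >= min_confidence]


def promotable(staged: List[ProposedBlock]) -> List[str]:
    # Only analyst-approved entries may be promoted to the production policy.
    return [p.indicator for p in staged if p.approved_by]
```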
Implementation Architecture: Data Flow and Guardrails
A practical blueprint for connecting AI models to the Cortex Data Lake API to enable advanced threat hunting, bulk analysis, and custom reporting.
The core integration pattern involves a secure middleware application that sits between your AI models and the Cortex Data Lake API. This application handles authentication (via OAuth 2.0 or API keys), manages API rate limits, and orchestrates queries. A typical data flow starts with an AI agent or analyst interface formulating a natural language request (e.g., "Find all internal hosts that communicated with known C2 IPs in the last 30 days"). The middleware translates this into the appropriate XQL (XDR Query Language) syntax, executes the query against the Data Lake API, and streams the JSON results back for processing. For bulk operations like extracting IOCs across millions of records, the middleware manages pagination, result caching, and incremental data syncs to avoid hitting API limits.
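For managing rate limits, a client-side token bucket in the middleware is one simple option. The rate and capacity values are placeholders; set them from the limits that apply to your tenant. The injectable clock exists only to make the behavior testable.

```python
import time
from typing import Callable


class TokenBucket:
    """Client-side limiter that keeps the middleware under an API quota.

    rate_per_sec and capacity are placeholders, not documented CDL limits.
    """

    def __init__(self, rate_per_sec: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller should back off or queue the request
```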
Key architectural guardrails must be established for production. First, implement query cost and scope governance. Since the Data Lake contains vast telemetry, AI-generated queries must be scoped with time ranges, result limits, and filters on high-volume log types (like DNS or proxy) to prevent runaway queries that consume excessive resources. Second, all AI-generated XQL should be logged in an audit trail with the requesting user/agent, execution time, and data volume returned for compliance. Third, sensitive data handling is critical. Use the middleware as a policy enforcement point to redact or tokenize specific high-sensitivity fields (e.g., usernames, internal hostnames) from query results before they are passed to a third-party LLM, ensuring data never leaves your governance boundary in raw form.
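The scope and redaction guardrails could be sketched as follows. The 30-day and 10,000-row caps are illustrative policy values, and `src_user`/`src_host` are assumed field names. Fields are replaced with keyed HMAC tokens rather than deleted. The model can then still correlate events for the same user without seeing the raw value, and the key never leaves your boundary.

```python
import hashlib
import hmac
from typing import Iterable

MAX_WINDOW_DAYS = 30  # illustrative policy values, not CDL limits
MAX_LIMIT = 10_000


def check_scope(window_days: int, limit: int) -> None:
    """Reject AI-generated queries that are unbounded or too broad."""
    if window_days <= 0 or window_days > MAX_WINDOW_DAYS:
        raise ValueError(f"time window must be 1..{MAX_WINDOW_DAYS} days")
    if limit <= 0 or limit > MAX_LIMIT:
        raise ValueError(f"limit must be 1..{MAX_LIMIT}")


def tokenize_fields(row: dict, secret: bytes,
                    fields: Iterable[str] = ("src_user", "src_host")) -> dict:
    """Replace sensitive values with stable keyed tokens before an LLM sees them."""
    out = dict(row)
    for f in fields:
        if out.get(f):
            digest = hmac.new(secret, str(out[f]).encode(), hashlib.sha256).hexdigest()[:12]
            out[f] = f"{f}_{digest}"  # same input + same key -> same token across rows
    return out
```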
For rollout, start with read-only, analyst-in-the-loop workflows. Deploy the integration initially for assisted threat hunting, where an AI co-pilot suggests XQL queries based on a threat report, but an analyst reviews and approves execution. This builds trust and provides a feedback loop to tune the query generation. Phase two can introduce automated, scheduled jobs for bulk indicator extraction and trend reporting, where predefined, vetted XQL templates are run by AI agents to populate internal dashboards or SIEM correlation lists. The final phase enables closed-loop detection, where the AI analyzes Data Lake query results to propose new detection rules or tweak existing ones, creating a continuous improvement cycle for your security analytics.
Code Patterns and API Payload Examples
Querying for Indicators at Scale
Use the Cortex Data Lake API to retrieve raw logs for a time window and apply an AI model to extract and classify potential indicators of compromise (IOCs). This pattern moves beyond simple regex matching to identify suspicious domains, IPs, and file hashes based on contextual patterns and threat intelligence correlation.
A typical workflow involves:
- Executing an API query for specific log types (e.g., traffic, threat).
- Sending the JSON results to an AI service for entity extraction and risk scoring.
- Enriching the extracted IOCs with external threat feeds.
- Outputting a structured report or pushing high-confidence IOCs back to your SOAR platform for blocking.
```python
# Example: fetch threat logs and extract IOCs
import requests

# Query CDL for threat logs over the last 24 hours (86,400 seconds)
query = {
    "query": "SELECT * FROM threat WHERE _time > now() - 86400",
    "limit": 10000,
}
headers = {
    # Placeholder: load the real token from a secrets store rather than hard-coding it
    "Authorization": "Bearer YOUR_API_TOKEN",
    "Content-Type": "application/json",
}

response = requests.post(
    "https://api.cdl.paloaltonetworks.com/api/v2/logs/query",
    headers=headers,
    json=query,
    timeout=60,  # fail fast instead of hanging on a slow query
)
response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body as data
threat_logs = response.json().get("data", [])

# Send threat_logs to an AI service for IOC extraction and analysis
```
Realistic Time Savings and Operational Impact
This table illustrates the operational impact of integrating AI with the Palo Alto Cortex Data Lake API, focusing on time savings, workflow efficiency, and analyst enablement for tasks that are cumbersome or impossible in the native UI.
| Metric | Before AI | After AI | Notes |
|---|---|---|---|
| Bulk IOC extraction for threat intel feeds | Manual query building and CSV export | Automated, scheduled extraction via API | Enables daily enrichment of TIP/SIEM without analyst intervention |
| Historical hunt for a new TTP across 90 days of logs | Days of iterative SPL/XQL query refinement | Hours to generate and validate hypothesis-driven queries | AI suggests high-value time ranges and data fields based on TTP description |
| Custom executive report on attack campaign prevalence | Manual data aggregation and slide creation | Automated report generation with narrative summaries | Pulls from Data Lake, correlates with external intel, drafts narrative |
| Data quality and schema analysis for new log source | Manual sample review and field mapping | Automated schema inference and mapping recommendations | Accelerates onboarding and ensures detection coverage |
| Extracting user/entity behavior baselines over quarters | Resource-intensive queries impacting production | Optimized, phased queries with smart sampling | Reduces performance load on Data Lake, enables longitudinal analysis |
| Identifying log source gaps for critical detection coverage | Periodic manual audit and spreadsheet tracking | Continuous automated analysis and alerting | Proactively maintains security monitoring efficacy |
| Generating training datasets for custom ML models | Manual data labeling and feature engineering | Semi-automated dataset creation and labeling assistance | Reduces data scientist prep time from weeks to days |
Governance, Security, and Phased Rollout
A pragmatic approach to building, securing, and scaling AI applications on the Cortex Data Lake API.
Integrating AI with the Cortex Data Lake API introduces new considerations for data governance, API security, and operational control. Your architecture must enforce strict role-based access control (RBAC) for AI queries, ensuring agents or applications only access log types and time ranges permitted by their service account. Implement a gateway layer (e.g., using Kong or a custom service) to broker all API calls, enforcing rate limits, auditing all queries for compliance, and masking sensitive fields (like usernames or internal IPs) before data is sent to an LLM for analysis. This layer also manages authentication, rotating the long-lived API keys required by Cortex Data Lake and preventing direct exposure to your AI workloads.
A phased rollout is critical for managing risk and proving value. Start with a read-only, human-in-the-loop phase: build an internal tool that allows threat hunters to submit natural language questions (e.g., "show me all outbound connections to ASN 12345 in the last 7 days") which are translated to XQL, executed, and results summarized. This validates the query translation accuracy and business impact without automation. Phase two introduces scheduled, automated reporting agents that run daily or weekly to extract IOCs, summarize attack trends, or generate compliance evidence. The final phase moves to event-triggered agents, where webhooks from your SIEM (like Cortex XDR or a third-party platform) automatically trigger targeted Data Lake queries to enrich incidents with historical context.
Govern this integration like any critical data pipeline. Maintain a full audit trail of every query generated, the API call made, the data volume returned, and the consuming user or agent. Use this to monitor cost implications and detect anomalous query patterns. Establish a prompt management system to version and control the instructions that convert analyst intent into XQL, ensuring consistency and allowing for safe iteration. Finally, define a rollback protocol. If an agent generates inefficient queries that impact API performance or returns unexpected results, you must be able to instantly disable specific workflows without affecting your core security operations.
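The rollback protocol and audit trail can be combined in a per-workflow kill switch that also records which prompt version produced each query. This is an in-memory sketch; the registry, its field names, and the `execute` callable are hypothetical. In production, the flags and audit records would live in a shared store so that disabling a workflow takes effect everywhere at once.

```python
import time
from typing import Callable, Dict, List, Optional


class WorkflowRegistry:
    """Per-workflow kill switch plus an append-only audit list (in-memory sketch)."""

    def __init__(self) -> None:
        self.enabled: Dict[str, bool] = {}
        self.audit: List[dict] = []

    def set_enabled(self, name: str, on: bool) -> None:
        self.enabled[name] = on

    def run(self, name: str, actor: str, prompt_version: str, query: str,
            execute: Callable[[str], list]) -> Optional[list]:
        entry = {"workflow": name, "actor": actor,
                 "prompt_version": prompt_version, "query": query}
        if not self.enabled.get(name, False):
            # Disabled or unknown workflows are refused outright, and the refusal is logged.
            self.audit.append({**entry, "status": "blocked"})
            return None
        started = time.monotonic()
        rows = execute(query)
        self.audit.append({**entry, "status": "ok", "rows": len(rows),
                           "seconds": round(time.monotonic() - started, 3)})
        return rows
```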
Frequently Asked Questions
Practical questions for teams building AI applications that query Palo Alto Networks Cortex Data Lake for threat hunting, bulk analysis, and custom reporting.
The Cortex Data Lake (CDL) API provides programmatic access to a massive, centralized repository of network, threat, and traffic logs. AI integration focuses on three high-value areas:
- Bulk Threat Hunting & Pattern Discovery: Query months of log data to identify subtle, multi-stage attack patterns that evade real-time detection. Use AI to generate hunting hypotheses based on emerging TTPs and translate them into efficient API queries.
- Indicator Extraction & Enrichment at Scale: Automatically extract IOCs (IPs, domains, file hashes) from large query result sets. Use AI to enrich these indicators with internal context (e.g., "this IP communicated with our finance server") and external threat intelligence summaries.
- Custom Reporting & Executive Summaries: Generate natural-language summaries of security posture, top attack vectors, or campaign activity over custom timeframes. AI can synthesize raw log counts and trends into narrative reports for leadership or compliance audits.

About the author
Prasad Kumkar
CEO & MD, Inference Systems
Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over the past five-plus years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.
His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.