Inferensys


AI Integration for Palo Alto Cortex Data Lake API

Build AI applications that query the Cortex Data Lake API for advanced threat hunting, bulk indicator extraction, and custom reporting outside the native UI.
BUILDING AI APPLICATIONS ON THE LOGGING BACKBONE

Where AI Fits in the Cortex Data Lake Architecture

Integrating AI with the Cortex Data Lake API unlocks advanced threat hunting, bulk analysis, and custom reporting by treating the data lake as a high-fidelity, long-term source of truth.

The Cortex Data Lake (CDL) serves as the central logging backbone for Palo Alto Networks' Strata firewalls, Prisma Access, and Cloud NGFW. An AI integration connects at the API layer, querying structured log data—including traffic, threat, URL filtering, and WildFire submission logs—without the constraints of the native UI. This allows for bulk extraction of indicators, trend analysis across months of data, and the creation of custom detection models using the lake's historical records as training data. The primary surfaces are the /loggingservice/services/logquery/v1 and /loggingservice/services/dashboard/v1 endpoints, which provide programmatic access to the normalized log schema.

A production implementation typically involves a scheduled job or event-driven service that executes XQL-like queries via the CDL API, streams the results to a processing queue, and feeds them into an AI pipeline. High-value use cases include:

  • Retrospective Threat Hunting: Querying for specific file hashes, domains, or user-agent strings across 12+ months of logs to identify past compromises.
  • Bulk IOC Enrichment: Extracting all external IPs or domains from a specified time range to enrich with threat intelligence and score for risk.
  • Custom Behavioral Baselining: Using months of traffic log data to train a model on normal network patterns for specific subnets or user groups, then flagging deviations.
  • Compliance Evidence Automation: Programmatically gathering logs to demonstrate control effectiveness for audits (e.g., "show all blocked high-risk URL attempts for the finance department in Q1").

Governance and rollout require careful planning. API calls are metered, so queries must be optimized for time ranges and field selection to manage cost. A rollout should start with read-only, non-production queries to validate data completeness and schema understanding. Since CDL contains sensitive network data, the AI service must operate under strict RBAC, and any extracted data should be processed within a secure enclave. Inference Systems architects these integrations with a focus on idempotent, auditable query jobs that log their actions back to CDL or a SIEM, ensuring the AI workflow itself is transparent and secure.
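One way to keep metered queries bounded is to split a long lookback into fixed windows and request only the fields the pipeline needs. The sketch below is a minimal illustration under that assumption: `split_time_range` and the names in `FIELDS` are hypothetical and are not part of the CDL API or schema.

```python
from datetime import datetime, timedelta, timezone


def split_time_range(start, end, max_window):
    """Split [start, end) into consecutive windows no longer than max_window."""
    if max_window <= timedelta(0):
        raise ValueError("max_window must be positive")
    windows = []
    cursor = start
    while cursor < end:
        window_end = min(cursor + max_window, end)
        windows.append((cursor, window_end))
        cursor = window_end
    return windows


# Hypothetical field names: request only what the downstream model needs.
FIELDS = ["time_generated", "source_ip", "dest_ip", "action"]

start = datetime(2024, 1, 1, tzinfo=timezone.utc)
end = datetime(2024, 1, 31, tzinfo=timezone.utc)
windows = split_time_range(start, end, timedelta(days=7))

for window_start, window_end in windows:
    # Each window would become one scoped query job.
    print(window_start.date(), "->", window_end.date(), "fields:", ",".join(FIELDS))
```

Running one job per window also makes the job idempotent: a failed window can be retried without re-querying the whole range.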

PALO ALTO CORTEX DATA LAKE

Key API Surfaces for AI Integration

Query API: The Core Hunting Surface

The Query API (/loggingservice/services/v2/loggingservice/query) is the primary interface for AI-driven threat hunting and historical analysis. It allows you to execute XQL queries against the vast log repository stored in Cortex Data Lake (CDL). This is where you build AI agents that can answer complex, multi-faceted questions about past security events.

Key AI Use Cases:

  • Bulk IOC Extraction: Automatically query for indicators across all logs (e.g., | filter action = "block") to populate threat intelligence platforms.
  • Trend Analysis & Reporting: Use AI to generate hypotheses and craft XQL queries to identify attack trends, top talkers, or policy misconfigurations over weeks or months of data.
  • Custom Detection Validation: Test the efficacy of new detection rules by querying historical data to see if they would have triggered on past incidents.

Implementation Note: Queries against this surface run asynchronously. Your integration must create a query job, poll until it completes, and parse the potentially large JSONL response payload before passing results downstream.
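The create, poll, and parse cycle can be sketched without committing to specific endpoints. In the example below, `submit`, `get_status`, and `fetch_results` are caller-supplied functions standing in for the real HTTP calls, and the status strings `"RUNNING"`, `"DONE"`, and `"FAILED"` are assumptions rather than documented CDL values.

```python
import json
import time


def parse_jsonl(payload):
    """Parse a JSON Lines payload into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in payload.splitlines() if line.strip()]


def run_query_job(submit, get_status, fetch_results, poll_interval=2.0, max_polls=30):
    """Create a job, poll until it finishes, then return the parsed records."""
    job_id = submit()
    for _ in range(max_polls):
        status = get_status(job_id)
        if status == "DONE":
            return parse_jsonl(fetch_results(job_id))
        if status == "FAILED":
            raise RuntimeError(f"query job {job_id} failed")
        time.sleep(poll_interval)
    raise TimeoutError(f"query job {job_id} did not finish after {max_polls} polls")


# Stand-in callables so the sketch runs without network access.
_statuses = iter(["RUNNING", "RUNNING", "DONE"])
records = run_query_job(
    submit=lambda: "job-1",
    get_status=lambda job_id: next(_statuses),
    fetch_results=lambda job_id: '{"action": "block"}\n\n{"action": "allow"}\n',
    poll_interval=0,
)
print(records)
```

Bounding the number of polls keeps a stuck job from blocking the pipeline indefinitely.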

LONG-TERM THREAT INTELLIGENCE & ANALYSIS

High-Value AI Use Cases for Cortex Data Lake

Cortex Data Lake provides a unified repository for logs from Palo Alto Networks firewalls, Prisma Access, and Cortex XDR. These pages detail how to build AI applications that query this API directly, enabling advanced analysis, bulk data extraction, and custom reporting beyond the native UI's capabilities.

01

Bulk IOC & Threat Hunting Query Generation

Automate the creation and execution of complex XQL queries against the Data Lake API for threat hunting campaigns. An AI agent can translate natural language requests (e.g., "Find all internal hosts that communicated with known C2 domains in the last 90 days") into valid XQL, handle pagination for large result sets, and summarize findings. This moves hunting from manual, iterative query building to a guided, scalable process.

Campaign setup time: 1 sprint
02

Custom Security & Compliance Reporting

Generate scheduled, bespoke reports by querying the Data Lake API for data not readily available in standard dashboards. Use AI to define report parameters, execute the necessary XQL queries to aggregate data (e.g., application usage by department, geo-blocked traffic trends, SSL/TLS cipher analysis), and format the output into PDFs, slides, or CSV files for stakeholders. This automates manual data pulls for audit and operational reviews.

Report generation: Hours -> Minutes
03

Long-Term Attack Pattern & TTP Analysis

Leverage the extended data retention in Cortex Data Lake to train custom ML models or perform retrospective analysis. An AI workflow can periodically extract months of log data to identify subtle, slow-burn attack patterns (like low-and-slow data exfiltration or periodic beaconing) that are invisible in short-term analysis. This provides a historical baseline for detecting advanced persistent threats (APTs).
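Periodic beaconing is one pattern that becomes visible over long windows. A simple heuristic, shown here as a sketch rather than a production detector, is the coefficient of variation of the gaps between connections from one source to one destination: values near zero suggest a fixed interval. The function name, the `min_events` cutoff, and the sample timestamps are illustrative.

```python
from statistics import mean, pstdev


def beacon_score(timestamps, min_events=5):
    """Return the coefficient of variation of inter-arrival gaps.

    Lower values mean more regular timing. Returns None when there are
    too few events or all events share the same timestamp.
    """
    if len(timestamps) < min_events:
        return None
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    average_gap = mean(gaps)
    if average_gap == 0:
        return None
    return pstdev(gaps) / average_gap


# Epoch-style seconds for two source/destination pairs (made-up data).
periodic = [0, 300, 601, 899, 1200, 1501]   # roughly every five minutes
bursty = [0, 5, 400, 410, 2000, 2003]       # irregular, human-like

print("periodic:", beacon_score(periodic))
print("bursty:", beacon_score(bursty))
```

A real deployment would also need to account for jittered beacons and for legitimate periodic traffic such as update checks.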

04

Data Enrichment for External SIEM/SOAR

Use the Data Lake API as a high-fidelity source to enrich incidents in a primary SIEM (like Splunk or Microsoft Sentinel) or SOAR platform. An AI agent can be triggered by an alert in the external system, query the Data Lake for relevant raw logs and session details to provide deeper context (e.g., full application identification, user mapping, threat content), and attach this enriched data to the incident record for faster triage.

Context retrieval: Batch -> Real-time
05

Anomaly Detection on Network Meta-Features

Go beyond signature-based alerts by analyzing aggregated log metadata. An AI model can consume daily summaries from the Data Lake API—such as unique destination counts per source, bytes transferred per application, or session duration variances—to establish behavioral baselines for networks, users, and applications. It then flags significant deviations that may indicate compromised accounts, insider threats, or policy violations.
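As a minimal sketch of baselining on one such meta-feature, the example below compares today's unique-destination count per source against that source's own history using a z-score. The threshold of 3.0, the data layout, and the sample values are assumptions for illustration.

```python
from statistics import mean, stdev


def flag_deviations(history, today, threshold=3.0):
    """Return {source: z_score} for sources whose value today deviates strongly."""
    flagged = {}
    for source, value in today.items():
        baseline = history.get(source, [])
        if len(baseline) < 2:
            continue  # not enough history to estimate spread
        sigma = stdev(baseline)
        if sigma == 0:
            continue  # constant baseline, z-score undefined
        z_score = (value - mean(baseline)) / sigma
        if abs(z_score) >= threshold:
            flagged[source] = round(z_score, 2)
    return flagged


# Daily unique-destination counts per source (made-up data).
history = {
    "10.0.0.5": [12, 15, 11, 14, 13],
    "10.0.0.9": [40, 42, 38, 41, 39],
}
today = {"10.0.0.5": 96, "10.0.0.9": 41}

flagged = flag_deviations(history, today)
print(flagged)
```

Scoring each source against its own history avoids flagging hosts that are busy by design, such as proxies or scanners.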

06

Automated Policy Optimization & Clean-up

Analyze firewall rule hit counts and application usage data from the Data Lake to recommend security policy improvements. An AI system can identify rarely-used rules (potential for cleanup), rules consistently blocking legitimate business traffic (needing adjustment), and shadow IT applications communicating on non-standard ports. This provides data-driven insights for network and security architects to harden the environment.
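The rarely-used-rule check reduces to counting hits per rule name and comparing against the configured rule list. The sketch below assumes traffic records carry the matching rule in a field called `rule`, which is a hypothetical name, and that the list of configured rules comes from a separate policy export.

```python
from collections import Counter


def rarely_used_rules(configured_rules, traffic_records, max_hits=0, rule_field="rule"):
    """Return configured rules whose hit count in the records is <= max_hits."""
    hits = Counter(
        record[rule_field] for record in traffic_records if rule_field in record
    )
    return sorted(rule for rule in configured_rules if hits[rule] <= max_hits)


configured = ["allow-web", "allow-dns", "legacy-ftp", "temp-vendor-access"]
records = [
    {"rule": "allow-web"},
    {"rule": "allow-web"},
    {"rule": "allow-dns"},
    {"rule": "temp-vendor-access"},
    {"action": "deny"},  # record without a rule field is ignored
]

print(rarely_used_rules(configured, records))
print(rarely_used_rules(configured, records, max_hits=1))
```

The output is a candidate list for human review, not a change set: a rule with zero hits in the sampled window may still guard a rare but legitimate flow.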

Same day
Insight delivery
CORTEX DATA LAKE API INTEGRATIONS

Example AI-Driven Workflows

These workflows demonstrate how to connect AI models and agents to the Palo Alto Cortex Data Lake API to automate threat hunting, enrich investigations, and generate custom intelligence outside the native UI. Each flow is triggered by a specific operational need and leverages the API's bulk query capabilities.

Trigger: A new threat intelligence report is published containing 500+ IOCs (IPs, domains, hashes).

Context/Data Pulled:

  1. An agent parses the report, extracting IOCs and categorizing them (e.g., type:ipv4, type:domain).
  2. For each category, the agent constructs an optimized XQL query to search across relevant log types (e.g., traffic, threat, url) for the past 30 days.
  3. Queries are executed asynchronously against the Cortex Data Lake API using the jobs endpoints to handle large result sets.

Model or Agent Action:

  • A summary agent receives the raw query results (potentially thousands of matches). It uses an LLM to:
    • Cluster matches by source/destination IP, user, or internal asset.
    • Summarize the volume, timeframe, and log types where hits occurred.
    • Draft a brief narrative assessing the potential impact (e.g., "15 internal hosts communicated with 3 of the reported C2 IPs over the last week").

System Update or Next Step:

  • The summary and clustered data are posted as an enrichment note to a corresponding incident in Cortex XDR or ServiceNow.
  • High-confidence matches automatically generate new local block rules in Panorama or Prisma Access via their respective APIs.
  • A summary report is saved to a shared drive for the threat intel team.

Human Review Point: The proposed firewall rule changes are placed in a staging policy group, requiring analyst approval before promotion to production.
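The first step of this workflow, sorting parsed indicators by type before building per-type queries, can be sketched with the standard library alone. The category labels and regular expressions below are illustrative choices, not a complete IOC grammar.

```python
import ipaddress
import re

HASH_LENGTHS = {32: "md5", 40: "sha1", 64: "sha256"}
HEX_RE = re.compile(r"^[0-9a-fA-F]+$")
DOMAIN_RE = re.compile(
    r"^(?=.{1,253}$)([a-z0-9-]{1,63}\.)+[a-z]{2,63}$", re.IGNORECASE
)


def classify_ioc(value):
    """Return 'ipv4', 'ipv6', 'md5', 'sha1', 'sha256', 'domain', or 'unknown'."""
    value = value.strip()
    try:
        return f"ipv{ipaddress.ip_address(value).version}"
    except ValueError:
        pass
    if HEX_RE.match(value) and len(value) in HASH_LENGTHS:
        return HASH_LENGTHS[len(value)]
    if DOMAIN_RE.match(value):
        return "domain"
    return "unknown"


def categorize_iocs(values):
    """Group de-duplicated, lower-cased indicators by detected type."""
    buckets = {}
    for value in values:
        buckets.setdefault(classify_ioc(value), set()).add(value.strip().lower())
    return {kind: sorted(items) for kind, items in buckets.items()}


report_iocs = [
    "198.51.100.7",
    "Evil.Example.com",
    "evil.example.com",
    "d41d8cd98f00b204e9800998ecf8427e",
    "not an ioc",
]
print(categorize_iocs(report_iocs))
```

Anything that lands in the `unknown` bucket is a good candidate for analyst review before it reaches a query.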

BUILDING AI APPLICATIONS ON THE DATA LAKE

Implementation Architecture: Data Flow and Guardrails

A practical blueprint for connecting AI models to the Cortex Data Lake API to enable advanced threat hunting, bulk analysis, and custom reporting.

The core integration pattern involves a secure middleware application that sits between your AI models and the Cortex Data Lake API. This application handles authentication (via OAuth 2.0 or API keys), manages API rate limits, and orchestrates queries. A typical data flow starts with an AI agent or analyst interface formulating a natural language request (e.g., "Find all internal hosts that communicated with known C2 IPs in the last 30 days"). The middleware translates this into the appropriate XQL (XDR Query Language) syntax, executes the query against the Data Lake API, and streams the JSON results back for processing. For bulk operations like extracting IOCs across millions of records, the middleware manages pagination, result caching, and incremental data syncs to avoid hitting API limits.
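Pagination in the middleware can be expressed as a generator that keeps requesting pages until the source reports no further cursor. This is a sketch under assumptions: the page shape (`records`, `next_cursor`) and the `fetch_page` callable are hypothetical stand-ins for whatever the real API returns.

```python
def iter_records(fetch_page, page_size=1000, max_records=None):
    """Yield records page by page until exhausted or max_records is reached."""
    cursor = None
    yielded = 0
    while True:
        page = fetch_page(cursor, page_size)
        for record in page["records"]:
            if max_records is not None and yielded >= max_records:
                return
            yield record
            yielded += 1
        cursor = page.get("next_cursor")
        if cursor is None:
            return


# In-memory stand-in for the remote API so the sketch runs offline.
DATA = list(range(2500))


def fake_fetch_page(cursor, page_size):
    start = cursor or 0
    end = start + page_size
    next_cursor = end if end < len(DATA) else None
    return {"records": DATA[start:end], "next_cursor": next_cursor}


all_records = list(iter_records(fake_fetch_page))
capped = list(iter_records(fake_fetch_page, max_records=1200))
print(len(all_records), len(capped))
```

The `max_records` cap gives the middleware a hard ceiling on how much data any single request can pull, independent of how the query itself was written.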

Key architectural guardrails must be established for production. First, implement query cost and scope governance. Since the Data Lake contains vast telemetry, AI-generated queries must be scoped with time ranges, result limits, and filters on high-volume log types (like DNS or proxy) to prevent runaway queries that consume excessive resources. Second, all AI-generated XQL should be logged in an audit trail with the requesting user/agent, execution time, and data volume returned for compliance. Third, sensitive data handling is critical. Use the middleware as a policy enforcement point to redact or tokenize specific high-sensitivity fields (e.g., usernames, internal hostnames) from query results before they are passed to a third-party LLM, ensuring data never leaves your governance boundary in raw form.
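The third guardrail, tokenizing sensitive fields before results reach an external LLM, can be sketched with a keyed hash so the same value always maps to the same token without being reversible by the model. The field names in `SENSITIVE_FIELDS` and the `tok_` prefix are assumptions for illustration.

```python
import hashlib
import hmac

# Hypothetical field names; adjust to the actual log schema in use.
SENSITIVE_FIELDS = frozenset({"source_user", "source_hostname"})


def tokenize(value, key):
    """Return a stable, keyed pseudonym for a sensitive value."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"tok_{digest[:16]}"


def redact_record(record, key, sensitive_fields=SENSITIVE_FIELDS):
    """Copy a record, replacing sensitive field values with tokens."""
    redacted = {}
    for field, value in record.items():
        if field in sensitive_fields and value is not None:
            redacted[field] = tokenize(str(value), key)
        else:
            redacted[field] = value
    return redacted


key = b"example-key-load-from-a-secret-store"
record = {"source_user": "corp\\jdoe", "source_hostname": "fin-lt-042", "action": "block"}
safe = redact_record(record, key)
print(safe)
```

Because the token is deterministic for a given key, the model can still reason about "the same user appearing in 40 sessions" while the middleware retains the only mapping back to the real identity.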

For rollout, start with read-only, analyst-in-the-loop workflows. Deploy the integration initially for assisted threat hunting, where an AI co-pilot suggests XQL queries based on a threat report, but an analyst reviews and approves execution. This builds trust and provides a feedback loop to tune the query generation. Phase two can introduce automated, scheduled jobs for bulk indicator extraction and trend reporting, where predefined, vetted XQL templates are run by AI agents to populate internal dashboards or SIEM correlation lists. The final phase enables closed-loop detection, where the AI analyzes Data Lake query results to propose new detection rules or tweak existing ones, creating a continuous improvement cycle for your security analytics.

AI-ENHANCED THREAT HUNTING AND REPORTING

Code Patterns and API Payload Examples

Querying for Indicators at Scale

Use the Cortex Data Lake API to retrieve raw logs for a time window and apply an AI model to extract and classify potential indicators of compromise (IOCs). This pattern moves beyond simple regex matching to identify suspicious domains, IPs, and file hashes based on contextual patterns and threat intelligence correlation.

A typical workflow involves:

  1. Executing an API query for specific log types (e.g., traffic, threat).
  2. Sending the JSON results to an AI service for entity extraction and risk scoring.
  3. Enriching the extracted IOCs with external threat feeds.
  4. Outputting a structured report or pushing high-confidence IOCs back to your SOAR platform for blocking.
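```python
# Example: fetch threat logs for downstream IOC extraction
import os

import requests

# Query CDL for threat logs over the last 24 hours (86,400 seconds)
query = {
    "query": "SELECT * FROM threat WHERE _time > now() - 86400",
    "limit": 10000,
}

headers = {
    # Read the token from the environment rather than hard-coding it.
    # CDL_API_TOKEN is a placeholder variable name.
    "Authorization": f"Bearer {os.environ['CDL_API_TOKEN']}",
    "Content-Type": "application/json",
}

# Illustrative URL: confirm the host and path against your tenant's API reference
response = requests.post(
    "https://api.cdl.paloaltonetworks.com/api/v2/logs/query",
    headers=headers,
    json=query,
    timeout=30,
)
response.raise_for_status()  # stop on HTTP errors instead of parsing an error body

threat_logs = response.json().get("data", [])
# Send 'threat_logs' to an AI service for IOC extraction & analysis
```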
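Before handing records to an AI service, a cheap pre-filter can reduce volume by keeping only public destination addresses and counting how often each appears. The sketch below assumes each record is a dict with a `dest_ip` field, which is a hypothetical name; the sample records are made up.

```python
import ipaddress
from collections import Counter


def external_destinations(records, field="dest_ip"):
    """Return (ip, count) pairs for globally routable destinations, most frequent first."""
    counts = Counter()
    for record in records:
        raw = record.get(field)
        if not raw:
            continue
        try:
            ip = ipaddress.ip_address(raw)
        except ValueError:
            continue  # skip malformed addresses
        if ip.is_global:
            counts[str(ip)] += 1
    return counts.most_common()


sample_logs = [
    {"dest_ip": "8.8.8.8"},
    {"dest_ip": "10.0.0.1"},      # private, ignored
    {"dest_ip": "8.8.8.8"},
    {"dest_ip": "1.1.1.1"},
    {"dest_ip": "not-an-ip"},     # malformed, ignored
    {"action": "alert"},          # no destination field
]
print(external_destinations(sample_logs))
```

The resulting list is small enough to enrich against external threat feeds without sending raw logs off the host.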
AI-ENHANCED THREAT HUNTING AND DATA OPERATIONS

Realistic Time Savings and Operational Impact

This table illustrates the operational impact of integrating AI with the Palo Alto Cortex Data Lake API, focusing on time savings, workflow efficiency, and analyst enablement for tasks that are cumbersome or impossible in the native UI.

| Metric | Before AI | After AI | Notes |
|---|---|---|---|
| Bulk IOC extraction for threat intel feeds | Manual query building and CSV export | Automated, scheduled extraction via API | Enables daily enrichment of TIP/SIEM without analyst intervention |
| Historical hunt for a new TTP across 90 days of logs | Days of iterative SPL/XQL query refinement | Hours to generate and validate hypothesis-driven queries | AI suggests high-value time ranges and data fields based on TTP description |
| Custom executive report on attack campaign prevalence | Manual data aggregation and slide creation | Automated report generation with narrative summaries | Pulls from Data Lake, correlates with external intel, drafts narrative |
| Data quality and schema analysis for new log source | Manual sample review and field mapping | Automated schema inference and mapping recommendations | Accelerates onboarding and ensures detection coverage |
| Extracting user/entity behavior baselines over quarters | Resource-intensive queries impacting production | Optimized, phased queries with smart sampling | Reduces performance load on Data Lake, enables longitudinal analysis |
| Identifying log source gaps for critical detection coverage | Periodic manual audit and spreadsheet tracking | Continuous automated analysis and alerting | Proactively maintains security monitoring efficacy |
| Generating training datasets for custom ML models | Manual data labeling and feature engineering | Semi-automated dataset creation and labeling assistance | Reduces data scientist prep time from weeks to days |

ARCHITECTING FOR PRODUCTION

Governance, Security, and Phased Rollout

A pragmatic approach to building, securing, and scaling AI applications on the Cortex Data Lake API.

Integrating AI with the Cortex Data Lake API introduces new considerations for data governance, API security, and operational control. Your architecture must enforce strict role-based access control (RBAC) for AI queries, ensuring agents or applications only access log types and time ranges permitted by their service account. Implement a gateway layer (e.g., using Kong or a custom service) to broker all API calls, enforcing rate limits, auditing all queries for compliance, and masking sensitive fields (like usernames or internal IPs) before data is sent to an LLM for analysis. This layer also manages authentication: it stores and rotates the credentials used to reach Cortex Data Lake (OAuth 2.0 tokens or API keys) so they are never exposed directly to your AI workloads.
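A gateway-side scope check can be sketched as a lookup of the calling service account's policy followed by two comparisons: is the log type allowed, and is the requested lookback within bounds. The account name, policy values, and `authorize` function below are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class QueryScope:
    log_types: frozenset
    max_lookback: timedelta


# Hypothetical policy table; in practice this would live in configuration.
POLICIES = {
    "svc-hunting-agent": QueryScope(frozenset({"traffic", "threat"}), timedelta(days=90)),
}


def authorize(account, log_type, lookback, policies=POLICIES):
    """Return (allowed, reason) for a requested query scope."""
    scope = policies.get(account)
    if scope is None:
        return False, "unknown service account"
    if log_type not in scope.log_types:
        return False, f"log type '{log_type}' not permitted"
    if lookback > scope.max_lookback:
        return False, "lookback exceeds policy"
    return True, "ok"


print(authorize("svc-hunting-agent", "traffic", timedelta(days=30)))
print(authorize("svc-hunting-agent", "url", timedelta(days=30)))
print(authorize("svc-hunting-agent", "threat", timedelta(days=365)))
```

Returning a reason alongside the decision gives the audit log something meaningful to record when a request is denied.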

A phased rollout is critical for managing risk and proving value. Start with a read-only, human-in-the-loop phase: build an internal tool that allows threat hunters to submit natural language questions (e.g., "show me all outbound connections to ASN 12345 in the last 7 days") which are translated to XQL, executed, and results summarized. This validates the query translation accuracy and business impact without automation. Phase two introduces scheduled, automated reporting agents that run daily or weekly to extract IOCs, summarize attack trends, or generate compliance evidence. The final phase moves to event-triggered agents, where webhooks from your SIEM (like Cortex XDR or a third-party platform) automatically trigger targeted Data Lake queries to enrich incidents with historical context.

Govern this integration like any critical data pipeline. Maintain a full audit trail of every query generated, the API call made, the data volume returned, and the consuming user or agent. Use this to monitor cost implications and detect anomalous query patterns. Establish a prompt management system to version and control the instructions that convert analyst intent into XQL, ensuring consistency and allowing for safe iteration. Finally, define a rollback protocol. If an agent generates inefficient queries that impact API performance or returns unexpected results, you must be able to instantly disable specific workflows without affecting your core security operations.
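The audit trail and rollback protocol can be sketched together: each executed query produces one structured log line, and a per-workflow kill switch is checked before anything runs. The record fields, workflow names, and in-memory `DISABLED_WORKFLOWS` set are illustrative; a real deployment would back the switch with shared configuration.

```python
import hashlib
import json
from datetime import datetime, timezone

# In-memory kill switch for illustration only.
DISABLED_WORKFLOWS = set()


def may_run(workflow):
    """Return True unless the workflow has been disabled."""
    return workflow not in DISABLED_WORKFLOWS


def audit_record(workflow, agent, query, rows_returned, now=None):
    """Build one JSON audit line for an executed query."""
    now = now or datetime.now(timezone.utc)
    return json.dumps(
        {
            "timestamp": now.isoformat(),
            "workflow": workflow,
            "agent": agent,
            "query_sha256": hashlib.sha256(query.encode("utf-8")).hexdigest(),
            "rows_returned": rows_returned,
        },
        sort_keys=True,
    )


if may_run("ioc-sweep"):
    print(audit_record("ioc-sweep", "agent-7", "SELECT 1", 42))

# Rollback: disable one misbehaving workflow without touching the others.
DISABLED_WORKFLOWS.add("ioc-sweep")
print(may_run("ioc-sweep"), may_run("weekly-report"))
```

Hashing the query text keeps the audit line compact while still letting reviewers match it to the versioned prompt and query template that produced it.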

CORTEX DATA LAKE API INTEGRATION

Frequently Asked Questions

Practical questions for teams building AI applications that query Palo Alto Networks Cortex Data Lake for threat hunting, bulk analysis, and custom reporting.

The Cortex Data Lake (CDL) API provides programmatic access to a massive, centralized repository of network, threat, and traffic logs. AI integration focuses on three high-value areas:

  • Bulk Threat Hunting & Pattern Discovery: Query months of log data to identify subtle, multi-stage attack patterns that evade real-time detection. Use AI to generate hunting hypotheses based on emerging TTPs and translate them into efficient API queries.
  • Indicator Extraction & Enrichment at Scale: Automatically extract IOCs (IPs, domains, file hashes) from large query result sets. Use AI to enrich these indicators with internal context (e.g., "this IP communicated with our finance server") and external threat intelligence summaries.
  • Custom Reporting & Executive Summaries: Generate natural-language summaries of security posture, top attack vectors, or campaign activity over custom timeframes. AI can synthesize raw log counts and trends into narrative reports for leadership or compliance audits.

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over more than five years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.