AI integration connects to OpenShift Virtualization's core APIs and controllers, primarily the KubeVirt custom resource definitions (CRDs) such as VirtualMachine and VirtualMachineInstance, plus the DataVolume CRD provided by the Containerized Data Importer (CDI), to observe and act on the VM lifecycle. Key surfaces for automation include the virtctl command-line client, CDI for disk import and cloning, and the placement and live migration machinery (node selectors and affinity, eviction strategies, and VirtualMachineInstanceMigration objects). By tapping into these APIs, AI agents can monitor VM health, analyze performance metrics from Prometheus, and execute lifecycle commands, creating a feedback loop for intelligent management.
AI Integration with OpenShift Virtualization

Where AI Fits in OpenShift Virtualization
Integrating AI with OpenShift Virtualization automates VM operations, optimizes resource consumption, and enhances security for hybrid container/VM workloads.
High-value use cases focus on operational efficiency and cost control:
- Intelligent Migration Triggers: Analyze node pressure (CPU, memory, network) to preemptively live-migrate VMs before performance degrades.
- Resource Right-Sizing: Review historical VirtualMachineInstance metrics to suggest optimal limits and requests, shrinking over-provisioned VMs or scaling up constrained ones.
- Compliance Scanning: Inspect VM DataVolume configurations and attached ConfigMaps against security baselines (e.g., CIS benchmarks) to flag non-compliant root disks or missing SELinux contexts.
- Snapshot and Backup Orchestration: Analyze application consistency groups and I/O patterns, scheduling VirtualMachineSnapshot operations during low-activity windows.
A production implementation runs the AI agent as a Kubernetes Operator within the OpenShift cluster, watching KubeVirt resources via informers. The agent uses a vector database (such as Weaviate) to store embeddings of performance time series and VM configuration states, enabling semantic search for similar incidents or optimization patterns. Governance is critical: every AI-driven action (for example, a migration or resize) should first be recorded as a change request, such as an annotation on the VirtualMachine, and executed only after approval, with RBAC restricting who can approve or an integrated ITSM webhook providing the audit trail. Rollout starts in monitoring-only mode, building trust by correlating AI recommendations with SRE actions before enabling automated remediation for low-risk workflows.
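One way to record a change request on the VirtualMachine is a JSON merge patch that adds annotations. The sketch below is illustrative only: the `ai.example.com` annotation prefix, the field names inside the request, and the `pending` approval state are hypothetical conventions, not part of the KubeVirt API.

```python
import json
from datetime import datetime, timezone
from typing import Optional

# Hypothetical annotation prefix; substitute a domain your organization owns.
ANNOTATION_PREFIX = "ai.example.com"


def build_change_request_patch(
    action: str,
    justification: str,
    requested_by: str,
    now: Optional[datetime] = None,
) -> dict:
    """Return a JSON merge patch that records a pending AI change request on a VirtualMachine."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    request = {
        "action": action,
        "justification": justification,
        "requestedBy": requested_by,
        "requestedAt": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    return {
        "metadata": {
            "annotations": {
                # Annotation values must be strings, so the request is stored as serialized JSON
                f"{ANNOTATION_PREFIX}/change-request": json.dumps(request, sort_keys=True),
                # An approver (human or ITSM webhook) later flips this to "approved" or "rejected"
                f"{ANNOTATION_PREFIX}/approval-state": "pending",
            }
        }
    }
```

The resulting body can be applied as a merge patch, for example with `oc patch virtualmachine <name> -n <namespace> --type=merge -p '<patch JSON>'`; the executing component would act only after the approval state changes.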
Key Integration Surfaces in OpenShift Virtualization
Intelligent VM Operations
AI agents can integrate with the VirtualMachine and VirtualMachineInstance Custom Resource Definitions (CRDs) to automate and optimize core lifecycle events. Key surfaces include:
- Live Migration Triggers: Analyze node metrics (CPU pressure, memory, network) from the OpenShift Monitoring stack to predict performance degradation and create VirtualMachineInstanceMigration resources before SLAs are affected.
- Right-Sizing Recommendations: Monitor historical VirtualMachineInstance resource usage (from Prometheus; the metrics.k8s.io API only exposes current usage of the virt-launcher pods) to generate actionable reports suggesting CPU/memory adjustments, reducing waste in hybrid VM/container environments.
- Provisioning Workflows: Use the KubeVirt API to generate VM definitions from natural language requests or existing templates, automating the creation of DataVolume and VirtualMachine specs for development or test environments.
This moves VM management from reactive to predictive, aligning resource consumption with actual workload needs.
High-Value AI Use Cases for VM Management
Integrate AI agents with OpenShift Virtualization to automate VM lifecycle decisions, optimize resource consumption, and enforce compliance across hybrid container/VM estates. These workflows target platform engineering, SRE, and infrastructure operations teams managing stateful workloads.
Intelligent VM Right-Sizing Recommendations
AI agents analyze historical CPU, memory, and I/O utilization from OpenShift Virtualization's metrics and Prometheus to recommend VM resource adjustments. Workflow: Agent reviews VirtualMachine specs and usage trends, generates a change request with justification, and can trigger an automated resize via the KubeVirt API after approval. Operational value: Prevents over-provisioning waste and performance bottlenecks without manual capacity analysis.
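A minimal sketch of the recommendation step, assuming the agent already has a series of memory-usage samples (in bytes) for one VM. The 95th-percentile target, 20% headroom, 256 MiB rounding step, and 10% no-churn tolerance are hypothetical policy values, and the patch targets the VM's `spec.template.spec.domain.resources.requests.memory` field.

```python
import math
from typing import Sequence

MIB = 1024 ** 2


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile, with pct in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(math.ceil(pct / 100 * len(ordered)), 1)
    return ordered[rank - 1]


def recommend_memory_bytes(used_bytes: Sequence[float], headroom: float = 0.2,
                           step_bytes: int = 256 * MIB) -> int:
    """p95 observed memory use plus headroom, rounded up to a whole number of steps."""
    target = percentile(used_bytes, 95) * (1 + headroom)
    return math.ceil(target / step_bytes) * step_bytes


def needs_change(current_bytes: int, recommended_bytes: int, tolerance: float = 0.1) -> bool:
    """Ignore small differences so the agent does not churn VMs over noise."""
    return abs(recommended_bytes - current_bytes) > tolerance * current_bytes


def build_memory_patch(memory_bytes: int) -> dict:
    """JSON merge patch for the VirtualMachine's memory request, expressed in MiB."""
    return {"spec": {"template": {"spec": {"domain": {
        "resources": {"requests": {"memory": f"{memory_bytes // MIB}Mi"}}
    }}}}}
```

Depending on the OpenShift Virtualization version and whether memory hot-plug is configured, a memory change may only take effect after the VM restarts, which is one more reason to keep resizes behind approval.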
Automated Migration Trigger for Node Maintenance
Use AI to predict node health issues or schedule maintenance by analyzing node conditions, hardware alerts, and cluster events. Workflow: Agent evaluates VirtualMachineInstance affinity rules and live migration feasibility, then orchestrates a sequenced migration plan; VMs whose evictionStrategy is set to LiveMigrate are live-migrated rather than shut down when their node is drained. Integrates with OpenShift's MachineHealthCheck and cluster update workflows. Operational value: Enables proactive, zero-downtime maintenance and reduces unplanned VM outages.
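The sketch below shows one way to turn a node's VMIs into an ordered migration plan. It reads the `status.nodeName` field and the `LiveMigratable` condition that KubeVirt reports on each VirtualMachineInstance; the priority label used for ordering is a hypothetical convention, and VMIs that cannot be live-migrated are returned for manual handling instead of being touched.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical label an operator could set to order migrations (lower runs first)
PRIORITY_LABEL = "ai.example.com/migration-priority"
DEFAULT_PRIORITY = 100


def is_live_migratable(vmi: Dict) -> bool:
    """Check the LiveMigratable condition on the VMI status."""
    for cond in vmi.get("status", {}).get("conditions", []):
        if cond.get("type") == "LiveMigratable":
            return cond.get("status") == "True"
    return False


def plan_node_drain(vmis: Iterable[Dict], node: str) -> Tuple[List[Dict], List[str]]:
    """Return ordered migration manifests and the VMIs that need manual handling."""
    on_node = [v for v in vmis if v.get("status", {}).get("nodeName") == node]

    def sort_key(vmi: Dict):
        meta = vmi["metadata"]
        priority = int(meta.get("labels", {}).get(PRIORITY_LABEL, DEFAULT_PRIORITY))
        return (priority, meta["namespace"], meta["name"])

    migrations: List[Dict] = []
    blocked: List[str] = []
    for vmi in sorted(on_node, key=sort_key):
        meta = vmi["metadata"]
        if not is_live_migratable(vmi):
            blocked.append(f"{meta['namespace']}/{meta['name']}")
            continue
        migrations.append({
            "apiVersion": "kubevirt.io/v1",
            "kind": "VirtualMachineInstanceMigration",
            "metadata": {"generateName": f"drain-{meta['name']}-", "namespace": meta["namespace"]},
            "spec": {"vmiName": meta["name"]},
        })
    return migrations, blocked
```

Submitting the manifests one at a time, and waiting for each migration to finish, lets the agent pace the drain instead of relying solely on the eviction-triggered migrations that happen during `oc adm drain`.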
Compliance Scanning for VM Configuration Drift
Deploy AI agents that continuously audit VirtualMachine and VirtualMachineInstance specs against internal security baselines and applicable CIS benchmarks. Workflow: Agent fetches VM definitions, compares settings (e.g., secure boot, disk interfaces), flags deviations, and generates remediation tickets in ServiceNow or Jira. Operational value: Automates evidence collection for audits and ensures VM hardening policies are enforced across development and production.
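A minimal drift check, assuming a hypothetical internal baseline that requires EFI secure boot and only `virtio` disk buses. It reads the KubeVirt fields `spec.template.spec.domain.firmware.bootloader.efi.secureBoot` (which KubeVirt treats as enabled by default once EFI is selected) and `spec.template.spec.domain.devices.disks[].disk.bus`; a real baseline would cover many more settings.

```python
from typing import Any, Dict, List, Sequence

# Hypothetical internal baseline: only paravirtualized disk buses are allowed
ALLOWED_DISK_BUSES = ("virtio",)


def _get(obj: Any, *path: str) -> Any:
    """Walk nested dicts, returning None when any key is missing."""
    for key in path:
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def check_vm_baseline(vm: Dict, allowed_buses: Sequence[str] = ALLOWED_DISK_BUSES) -> List[str]:
    """Compare a VirtualMachine manifest against the baseline and return findings."""
    findings: List[str] = []
    domain = _get(vm, "spec", "template", "spec", "domain") or {}

    efi = _get(domain, "firmware", "bootloader", "efi")
    if efi is None:
        findings.append("EFI firmware not configured (secure boot unavailable)")
    elif efi.get("secureBoot", True) is False:
        # Secure boot defaults to on with EFI, so only an explicit False is a finding
        findings.append("secure boot explicitly disabled")

    for disk in _get(domain, "devices", "disks") or []:
        bus = _get(disk, "disk", "bus")
        if bus is not None and bus not in allowed_buses:
            findings.append(f"disk '{disk.get('name')}' uses bus '{bus}'")
    return findings
```

Each non-empty findings list would become one remediation ticket; keeping the check deterministic leaves the AI component to summarize and prioritize findings rather than decide them.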
Predictive Storage Capacity Planning
AI analyzes usage patterns of DataVolume and PersistentVolumeClaim objects to forecast storage consumption for VM disks. Workflow: Agent correlates VM creation/deletion trends with storage class performance metrics, predicts shortages, and recommends provisioning adjustments or cleanup of orphaned volumes. Operational value: Prevents application downtime due to full storage backends and optimizes capital expenditure on block storage.
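The forecasting step can start as simply as a least-squares line through daily usage samples for a storage class, extrapolated to its capacity. This is a deliberately simple stand-in, not the model an AI agent would necessarily use; sample spacing, units (GiB), and the capacity figure are assumptions.

```python
from typing import Optional, Sequence


def days_until_full(days: Sequence[float], used_gib: Sequence[float],
                    capacity_gib: float) -> Optional[float]:
    """Fit used = intercept + slope * day by least squares and extrapolate to capacity.

    Returns days remaining after the last sample, or None if usage is flat or shrinking.
    """
    n = len(days)
    if n < 2 or n != len(used_gib):
        raise ValueError("need at least two paired samples")
    mean_x = sum(days) / n
    mean_y = sum(used_gib) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("samples must span more than one day")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, used_gib)) / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    full_at_day = (capacity_gib - intercept) / slope
    return max(full_at_day - days[-1], 0.0)
```

For example, usage growing from 100 to 130 GiB over four daily samples against a 200 GiB pool projects exhaustion seven days after the last sample, which is early enough to expand the pool or reclaim orphaned volumes.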
Golden Image Update and Lifecycle Automation
Manage VM template (DataVolume source) lifecycle using AI to assess patch criticality, test compatibility, and roll out updated images. Workflow: Agent monitors CVE feeds and Red Hat advisories, clones and patches a base image, runs smoke tests in an isolated namespace, and updates the VirtualMachine dataVolumeTemplates for new provisioning. Operational value: Reduces the time needed to deploy secure, patched VM images across the platform, minimizing vulnerability exposure.
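The final step, pointing a VM definition at the newly validated image, can be sketched as a pure manifest transformation. This assumes the template imports from a container registry (CDI registry sources use a `docker://` URL); the image names are hypothetical, and the CVE monitoring and smoke-test gates are out of scope here. An existing VM whose DataVolume has already been imported is typically not re-imported, so the change matters for VMs created from the updated definition.

```python
import copy
from typing import Dict


def with_updated_image(vm: Dict, template_name: str, new_image_url: str) -> Dict:
    """Return a copy of a VirtualMachine manifest whose named dataVolumeTemplate imports a new image.

    Only registry sources are handled; http or pvc sources would need their own branch.
    """
    updated = copy.deepcopy(vm)  # never mutate the caller's manifest
    for dvt in updated.get("spec", {}).get("dataVolumeTemplates", []):
        if dvt.get("metadata", {}).get("name") != template_name:
            continue
        registry = dvt.get("spec", {}).get("source", {}).get("registry")
        if registry is None:
            raise ValueError(f"template {template_name!r} does not use a registry source")
        registry["url"] = new_image_url
        return updated
    raise KeyError(template_name)
```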
Cost Attribution and Showback for VM Workloads
AI agents enrich OpenShift Virtualization metrics with cloud provider pricing or internal rate cards to allocate infrastructure costs to tenant namespaces and projects. Workflow: Agent aggregates resource consumption per VirtualMachine, applies cost models, generates itemized reports, and feeds data into FinOps platforms like CloudHealth or Vantage. Operational value: Enables accurate chargeback for VM-based services and identifies candidates for migration to cost-optimized container workloads.
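The cost-model step reduces to multiplying aggregated usage by a rate card and rolling it up per namespace. In this sketch the usage records and rates are hypothetical inputs; in practice they would come from Prometheus aggregation and the organization's pricing data.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class VmUsage:
    namespace: str
    vm: str
    vcpu_hours: float
    memory_gib_hours: float
    storage_gib_hours: float


@dataclass(frozen=True)
class RateCard:
    # Hypothetical internal rates; replace with cloud pricing or your own rate card
    per_vcpu_hour: float
    per_memory_gib_hour: float
    per_storage_gib_hour: float


def vm_cost(u: VmUsage, rates: RateCard) -> float:
    """Cost of one VM over the reporting period."""
    return (u.vcpu_hours * rates.per_vcpu_hour
            + u.memory_gib_hours * rates.per_memory_gib_hour
            + u.storage_gib_hours * rates.per_storage_gib_hour)


def showback_by_namespace(usages: Iterable[VmUsage], rates: RateCard) -> Dict[str, float]:
    """Sum per-VM costs into per-namespace totals, rounded to cents for reporting."""
    totals: Dict[str, float] = defaultdict(float)
    for u in usages:
        totals[u.namespace] += vm_cost(u, rates)
    return {ns: round(total, 2) for ns, total in sorted(totals.items())}
```

Keeping per-VM costs available (via `vm_cost`) alongside namespace totals supports both itemized reports and the identification of expensive VMs that might be candidates for containerization.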
Example AI-Driven Workflows
These workflows demonstrate how AI agents can automate and enhance VM lifecycle management within OpenShift Virtualization, moving from reactive operations to predictive, policy-driven orchestration for hybrid VM/container workloads.
Trigger: A scheduled agent job analyzes VM performance metrics from OpenShift Monitoring (Prometheus) and the VirtualMachineInstance (VMI) custom resource.
Context Pulled: The agent retrieves historical CPU/memory utilization (95th percentile), storage IOPS, network throughput, and the current node's resource availability and taints.
Agent Action: An LLM-based analyzer evaluates the data against predefined cost-performance policies (e.g., "Optimize for cost in dev, performance in prod"). It generates a recommendation: resize the VM's requested resources, migrate it to a different node, or take no action.
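System Update: For resizing, the agent generates and applies a patch to the VirtualMachine spec. For migration, it creates a VirtualMachineInstanceMigration object; because the target node is normally chosen by the Kubernetes scheduler from the VM's placement constraints (node selectors, affinity, tolerations), the agent steers placement through those constraints rather than naming a node directly.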
Human Review Point: Recommendations flagged as "high-impact" (e.g., migrating a stateful VM with high I/O) are sent to a Slack channel or ServiceNow ticket for platform engineer approval before execution.
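The workflow above describes an LLM-based analyzer; the sketch below swaps that in for fixed rules purely to show the shape of the policy output and the human-review flag. All thresholds, the `dev`/`prod` environment names, and the definition of "high-impact" (any production change, or migrating a stateful high-I/O VM) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmSignals:
    name: str
    environment: str          # "dev" or "prod"
    cpu_p95_ratio: float      # p95 CPU usage divided by requested CPU
    node_cpu_pressure: float  # fraction of the current node's CPU in use
    stateful_high_io: bool


@dataclass(frozen=True)
class Recommendation:
    action: str               # "migrate", "resize-up", "resize-down", or "none"
    high_impact: bool         # True means route to Slack/ServiceNow for approval


def recommend(s: VmSignals) -> Recommendation:
    # Dev optimizes for cost (shrink earlier); prod optimizes for performance
    shrink_below = 0.4 if s.environment == "dev" else 0.2
    if s.node_cpu_pressure > 0.9:
        action = "migrate"
    elif s.cpu_p95_ratio > 0.85:
        action = "resize-up"
    elif s.cpu_p95_ratio < shrink_below:
        action = "resize-down"
    else:
        action = "none"
    high_impact = action != "none" and (
        s.environment == "prod" or (action == "migrate" and s.stateful_high_io)
    )
    return Recommendation(action, high_impact)
```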
Implementation Architecture & Data Flow
Integrating AI with OpenShift Virtualization requires a data pipeline that spans VM metadata, performance telemetry, and containerized AI agents to automate lifecycle decisions.
The integration architecture centers on the OpenShift Virtualization API and Prometheus metrics as primary data sources. AI agents, deployed as containers within the same OpenShift cluster, watch for events such as VirtualMachineInstance state changes, VirtualMachineInstanceMigration progress, and resource utilization alerts. Key data objects include the VM spec (CPU, memory, storage), status (phase, conditions), and historical performance metrics from the kubevirt_* series that KubeVirt exports to Prometheus. This data is streamed to a vector store for real-time analysis and historical pattern matching, enabling agents to make context-aware recommendations.
A typical high-value workflow involves intelligent migration triggering. An AI agent monitors VM performance against node capacity and predefined policies (e.g., CPU wait time above a threshold). When a trigger condition is met, the agent evaluates target node resources, checks for nodeSelector, affinity, or taint conflicts, confirms that the VM's disks use shared (ReadWriteMany) storage as live migration requires, and initiates the live migration by creating a VirtualMachineInstanceMigration object. VMs whose evictionStrategy is LiveMigrate are also migrated automatically when their node is drained. The agent can also generate a migration runbook for operator review, reducing manual intervention from hours to minutes. For compliance scanning, agents periodically snapshot VM configurations, compare them against CIS benchmarks for the guest OS, and flag drift in VirtualMachineInstance security settings or installed packages.
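A trigger that fires on a single spike would cause migration churn. One common guard, sketched here as an assumption rather than a prescribed design, is to require the threshold to be breached for the last N consecutive samples and to enforce a cooldown since the VM's previous migration. The metric (CPU wait ratio), threshold, and window sizes are hypothetical.

```python
from typing import Optional, Sequence


def trailing_breach_count(samples: Sequence[float], threshold: float) -> int:
    """Count how many of the most recent samples, in a row, exceed the threshold."""
    count = 0
    for value in reversed(samples):
        if value <= threshold:
            break
        count += 1
    return count


def should_trigger_migration(cpu_wait_samples: Sequence[float], threshold: float,
                             min_consecutive: int,
                             seconds_since_last_migration: Optional[float],
                             cooldown_seconds: float) -> bool:
    """Fire only on a sustained breach, and never inside the cooldown window."""
    if seconds_since_last_migration is not None and seconds_since_last_migration < cooldown_seconds:
        return False
    return trailing_breach_count(cpu_wait_samples, threshold) >= min_consecutive
```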
Rollout requires careful governance. AI recommendations should be routed through an approval webhook, or checked by an admission policy engine such as OPA Gatekeeper, before any live migration or spec modification is applied. Every agent decision should be logged and correlated, through OpenShift Logging, with the API server audit log entries for the changes it made. Start with a read-only observation phase where agents analyze and report only, then progress to automated actions for non-critical dev/test workloads. This phased approach mitigates risk while demonstrating value through reduced manual oversight and optimized resource utilization across hybrid VM/container workloads.
Code & Payload Examples
Automating VM Provisioning & Migration
Integrate AI agents with the OpenShift Virtualization API to analyze workload patterns and trigger intelligent VM lifecycle events. For instance, an agent can monitor VM performance metrics and historical trends to recommend right-sizing or initiate live migrations to optimize cluster balance.
Example Python API Call:
```python
from kubernetes import config
from openshift.dynamic import DynamicClient

# Build a dynamic client from the local kubeconfig. An in-cluster operator would call
# config.load_incluster_config() and pass kubernetes.client.ApiClient() instead.
dyn_client = DynamicClient(config.new_client_from_config())

vm_resources = dyn_client.resources.get(api_version='kubevirt.io/v1', kind='VirtualMachine')
mig_resources = dyn_client.resources.get(api_version='kubevirt.io/v1', kind='VirtualMachineInstanceMigration')


def get_vm_metrics(name, namespace):
    """Hypothetical helper: query Prometheus for the VM's recent utilization."""
    raise NotImplementedError


def should_migrate(metrics):
    """Hypothetical AI/policy decision: return True when the VM should be moved."""
    raise NotImplementedError


# Fetch the VM and analyze its metrics for a migration trigger
vm = vm_resources.get(name='app-vm-1', namespace='production')
metrics = get_vm_metrics(vm.metadata.name, vm.metadata.namespace)

if should_migrate(metrics):
    # Trigger a live migration of the running VMI, which shares the VM's name
    migration_manifest = {
        'apiVersion': 'kubevirt.io/v1',
        'kind': 'VirtualMachineInstanceMigration',
        # generateName avoids name collisions when the same VM is migrated again later
        'metadata': {'generateName': f'migrate-{vm.metadata.name}-'},
        'spec': {'vmiName': vm.metadata.name},
    }
    mig_resources.create(body=migration_manifest, namespace=vm.metadata.namespace)
```
This pattern automates proactive workload balancing, reducing manual intervention for SRE teams managing hybrid VM/container fleets.
Realistic Time Savings & Operational Impact
How AI agents integrated with OpenShift Virtualization APIs transform manual VM operations into proactive, intelligent workflows.
| Operational Metric | Before AI Integration | After AI Integration | Implementation Notes |
|---|---|---|---|
| VM right-sizing analysis | Manual review of utilization dashboards, 2-4 hours per VM | Automated weekly report with prioritized recommendations | AI analyzes historical CPU/memory/disk trends against workload patterns |
| Migration trigger identification | Reactive, based on performance alerts or hardware failures | Proactive forecast of migration candidates 7 days in advance | AI correlates node health metrics, workload criticality, and maintenance windows |
| Compliance drift detection | Quarterly manual audit against CIS benchmarks for VMs | Continuous scanning with daily summary of non-compliant VMs | Integrates with OpenShift Compliance Operator and policy-as-code |
| Snapshot lifecycle management | Ad-hoc snapshots, manual cleanup of stale backups | Automated retention policy enforcement & cost-optimized storage tiering | AI analyzes snapshot age, VM change rate, and storage costs |
| VM provisioning approval workflows | Manual ticket routing and review, 1-2 business day delay | AI-assisted pre-flight checks and automated routing, same-day approval | Validates against quota, security policy, and naming conventions |
| Hybrid workload placement | Static rules for VM vs. container placement | Dynamic recommendation engine based on real-time cluster capacity | Considers GPU needs, network latency, and data locality for AI/ML workloads |
| Incident root cause analysis | Manual log correlation across virtualization and container layers | Automated correlation of VM events with underlying cluster issues | AI links VM failures to node problems, storage outages, or network events |
Governance, Security & Phased Rollout
Integrating AI into OpenShift Virtualization requires a deliberate approach to security, compliance, and operational change management.
AI agents interact with OpenShift Virtualization through its Kubernetes-native APIs (the VirtualMachine, VirtualMachineInstance, and DataVolume CRDs) and the KubeVirt control plane. A production integration must enforce strict RBAC with namespace-scoped service accounts so that AI workflows can only access designated VM resources. Every AI-driven action, such as triggering a live migration or applying a CPU adjustment, should be traceable: the API server audit log attributes each request to the agent's service account, and the agent can additionally emit Kubernetes events on the affected VirtualMachine. These records should be reconcilable with OpenShift's built-in monitoring and compliance operators.
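A namespace-scoped Role for the agent might look like the following, generated here as Python dictionaries so the read-only and migration-capable variants stay consistent. The role and service account names are hypothetical; the API groups (`kubevirt.io`, `cdi.kubevirt.io`) and resource names are the standard KubeVirt and CDI ones. Access to Prometheus or node metrics would need separate grants.

```python
from typing import Dict, List

READ_VERBS = ["get", "list", "watch"]


def agent_role(namespace: str, allow_migrations: bool = False, name: str = "ai-vm-agent") -> Dict:
    """Namespace-scoped Role for an AI agent; read-only unless migrations are explicitly allowed."""
    rules: List[Dict] = [
        {"apiGroups": ["kubevirt.io"],
         "resources": ["virtualmachines", "virtualmachineinstances"],
         "verbs": list(READ_VERBS)},
        {"apiGroups": ["cdi.kubevirt.io"], "resources": ["datavolumes"], "verbs": list(READ_VERBS)},
    ]
    if allow_migrations:
        rules.append({"apiGroups": ["kubevirt.io"],
                      "resources": ["virtualmachineinstancemigrations"],
                      "verbs": ["create"] + READ_VERBS})
    return {"apiVersion": "rbac.authorization.k8s.io/v1", "kind": "Role",
            "metadata": {"name": name, "namespace": namespace}, "rules": rules}


def agent_role_binding(namespace: str, service_account: str, service_account_namespace: str,
                       role_name: str = "ai-vm-agent") -> Dict:
    """Bind the Role in the VM namespace to the agent's service account (which may live elsewhere)."""
    return {"apiVersion": "rbac.authorization.k8s.io/v1", "kind": "RoleBinding",
            "metadata": {"name": f"{role_name}-binding", "namespace": namespace},
            "subjects": [{"kind": "ServiceAccount", "name": service_account,
                          "namespace": service_account_namespace}],
            "roleRef": {"apiGroup": "rbac.authorization.k8s.io", "kind": "Role", "name": role_name}}
```

Generating both variants from one function makes it easy to start every namespace read-only and flip `allow_migrations` only where a later rollout phase has been approved.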
A phased rollout mitigates risk and builds operator trust. Start with read-only analysis agents that monitor VirtualMachineInstance metrics and VirtualMachine specs to provide recommendations (e.g., 'VM prod-db-01 shows sustained low CPU, consider rightsizing'). Phase two introduces approval-gated actions, where an AI agent can propose a migration via a webhook to a ticketing system like ServiceNow or Jira, requiring human approval before the VirtualMachine spec is patched. The final phase enables closed-loop automation for non-critical, well-understood workflows, such as automatically applying predefined labels to VMs based on AI-classified workload patterns.
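The three phases can be enforced in the agent itself with a small gate that every proposed action passes through. In this sketch the phase names, the action strings, and the allow-list of closed-loop actions are hypothetical; only `apply-labels` is treated as low-risk enough to run without approval.

```python
from enum import Enum


class Phase(Enum):
    OBSERVE = 1          # read-only: report recommendations only
    APPROVAL_GATED = 2   # act only after a human approves
    CLOSED_LOOP = 3      # allow-listed low-risk actions run automatically


# Hypothetical allow-list of well-understood actions eligible for closed-loop automation
CLOSED_LOOP_ACTIONS = frozenset({"apply-labels"})


def disposition(phase: Phase, action: str, approved: bool = False) -> str:
    """Return 'report', 'await-approval', or 'execute' for a proposed action."""
    if phase is Phase.OBSERVE:
        return "report"
    if phase is Phase.CLOSED_LOOP and action in CLOSED_LOOP_ACTIONS:
        return "execute"
    # Everything else, in phase two or three, still needs explicit approval
    return "execute" if approved else "await-approval"
```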
Governance is critical for hybrid VM/container workloads. AI models that make decisions must be versioned and traceable, with prompts and inference logs stored in a secure, immutable layer. Use OpenShift's Security Context Constraints, Pod Security admission, and network policies to isolate the AI agent pods. If the models need GPUs to analyze performance telemetry, consider a dedicated GPU node pool for the agent pods (for example, managed by the NVIDIA GPU Operator); KubeVirt device passthrough is only needed when a GPU must be exposed to a VM. Regularly validate that AI-driven changes align with cluster-level constraints and corporate security policies enforced by the OpenShift Compliance Operator.
Frequently Asked Questions
Practical questions for platform architects and virtualization admins planning to embed AI agents into OpenShift Virtualization workflows for intelligent VM lifecycle management.
AI agents interact primarily with the KubeVirt Custom Resource Definitions (CRDs) and the OpenShift Virtualization Operator APIs. Key integration points include:
- VirtualMachine (VM) and VirtualMachineInstance (VMI) Objects: Agents monitor these resources for status changes, resource requests (spec.template.spec.domain.resources on the VM, spec.domain.resources on the VMI), and migration events.
- Migration Objects: The VirtualMachineInstanceMigration CRD allows agents to trigger, monitor, and analyze live migration operations.
- DataVolumes and PersistentVolumeClaims: AI can analyze storage usage patterns and performance tied to VMs.
- Metrics APIs: Integration with the OpenShift Monitoring stack (Prometheus) provides CPU, memory, network I/O, and storage latency metrics for VMs and underlying nodes.
A typical agent uses a service account with RBAC permissions to watch and patch these resources, reacting to events or performing scheduled analysis to recommend actions like right-sizing or migration.

About the author
Prasad Kumkar
CEO & MD, Inference Systems
Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over more than five years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.
His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.