Inferensys

AI Integration for OpenShift AI

Augment Red Hat OpenShift AI's native capabilities with custom AI agents for model monitoring, pipeline optimization, and data scientist support workflows. Practical integration guide for MLOps teams.
ARCHITECTURE AND ROLLOUT

Where AI Fits into the OpenShift AI Stack

A practical guide to augmenting Red Hat OpenShift AI's native MLOps tooling with custom AI agents for production-grade model operations and data scientist support.

OpenShift AI provides a curated stack of open-source tools—like Jupyter, Kubeflow, and KServe—for building, training, and serving models. AI integration focuses on augmenting these core surfaces with intelligent automation. Key integration points include the Model Serving layer (monitoring inference logs for drift and performance), the Pipeline Orchestration engine (optimizing Tekton or Argo workflows for resource efficiency), and the JupyterHub interface (providing in-notebook code and data assistance). AI agents can plug into these layers via Kubernetes Operators, webhooks, and the platform's REST APIs to act as a co-pilot for the entire AI lifecycle.

For implementation, an AI agent typically runs as a separate service or Job within the OpenShift AI project, with RBAC scoped to specific namespaces. It consumes events from the OpenShift AI dashboard API, Prometheus metrics for model endpoints, and pipeline run logs. Use cases are operational: an agent can analyze failed pipeline steps, suggest code fixes or parameter adjustments, and automatically retry with new resource requests. For model serving, it can trigger canary deployments or scale replicas based on predicted latency, moving beyond simple threshold-based alerts. This turns the platform from a static orchestration tool into a self-optimizing system.
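
To make the namespace-scoped RBAC concrete, here is a minimal sketch that builds a read-only Role and RoleBinding as plain Python dictionaries. The namespace `fraud-detection`, the service account name `ai-agent`, and the selected resources are illustrative assumptions; adjust them to whatever your agent actually needs to read.

```python
import json


def build_readonly_rbac(namespace: str, service_account: str) -> list[dict]:
    """Return a Role and RoleBinding granting read-only access in one namespace."""
    role_name = f"{service_account}-readonly"
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": role_name, "namespace": namespace},
        "rules": [
            # KServe InferenceService resources (read-only).
            {
                "apiGroups": ["serving.kserve.io"],
                "resources": ["inferenceservices"],
                "verbs": ["get", "list", "watch"],
            },
            # Pods, pod logs, and events, used for failure triage.
            {
                "apiGroups": [""],
                "resources": ["pods", "pods/log", "events"],
                "verbs": ["get", "list", "watch"],
            },
        ],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": role_name, "namespace": namespace},
        "subjects": [
            {"kind": "ServiceAccount", "name": service_account, "namespace": namespace}
        ],
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Role",
            "name": role_name,
        },
    }
    return [role, binding]


# Hypothetical namespace and service account names.
manifests = build_readonly_rbac("fraud-detection", "ai-agent")
for manifest in manifests:
    print(json.dumps(manifest, indent=2))
```

Each dictionary can be written out as JSON and applied with `oc apply -f`, which accepts JSON as well as YAML. Starting with only `get`, `list`, and `watch` keeps the agent read-only until write access is explicitly justified.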

Rollout requires careful governance, as these agents interact with production models and data. Start by deploying an AI agent in audit-only mode alongside a non-critical pipeline or development model serving endpoint. Use OpenShift's built-in audit trails and Red Hat Quay image scanning to ensure the agent's own containers are secure. Gradually introduce approval workflows, where the agent's suggestions—like a pipeline parameter change or a model rollback—require a human sign-off via a Slack webhook or ServiceNow ticket before execution. This controlled integration maximizes platform utility while maintaining the security and compliance posture expected in enterprise Kubernetes environments.
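
The audit-only and approval-gated modes described above can be sketched as a small gate that every suggestion passes through before execution. This is an illustration under stated assumptions, not a prescribed design: the class names, the action strings, and the in-memory audit log are all hypothetical, and a real deployment would record sign-off from Slack or ServiceNow rather than a field on the object.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Mode(Enum):
    AUDIT_ONLY = "audit_only"        # record suggestions, never execute
    APPROVAL_REQUIRED = "approval"   # execute only after human sign-off


@dataclass
class Suggestion:
    action: str                      # e.g. "rollback_model" (hypothetical name)
    target: str                      # e.g. "fraud-detection/v2"
    approved_by: Optional[str] = None


@dataclass
class ActionGate:
    mode: Mode
    audit_log: list = field(default_factory=list)

    def submit(self, suggestion: Suggestion) -> bool:
        """Log the suggestion and return True only if it may be executed."""
        self.audit_log.append(f"suggested {suggestion.action} on {suggestion.target}")
        if self.mode is Mode.AUDIT_ONLY:
            return False
        if suggestion.approved_by is None:
            self.audit_log.append("blocked: awaiting human sign-off")
            return False
        self.audit_log.append(f"approved by {suggestion.approved_by}")
        return True


gate = ActionGate(Mode.APPROVAL_REQUIRED)
pending = Suggestion("rollback_model", "fraud-detection/v2")
if not gate.submit(pending):
    print("\n".join(gate.audit_log))
```

Because the gate logs before it decides, the audit trail captures every suggestion, including the ones that were never executed.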

WHERE TO ADD CUSTOM AI AGENTS AND WORKFLOWS

Key Integration Surfaces in OpenShift AI

Model Serving & Inference Layer

Integrate AI agents directly with OpenShift AI's KServe and ModelMesh serving runtimes to automate operational workflows. This surface is ideal for agents that monitor model performance, manage A/B testing, and handle canary rollouts.

Key integration points:

  • KServe InferenceService CRDs: Agents can watch for changes to InferenceService resources to trigger scaling, logging, or alerting actions.
  • ModelMesh Serving: Use the REST/gRPC endpoints of deployed models for programmatic inference, enabling agents to act as orchestrators or quality gates.
  • Serving Runtime Metrics: Pull the Prometheus metrics exposed by the serving components (for example, odh-model-controller and ModelMesh) to detect latency spikes, error rates, or drift.

Example Agent Workflow: An AI agent analyzes inference latency percentiles. If P99 exceeds a threshold, it automatically scales the InferenceService replica count and posts an alert to a Slack channel for the data science team.
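
A minimal sketch of this workflow's decision logic is shown below. It computes a nearest-rank P99, and if the threshold is breached it prepares a JSON merge patch for the InferenceService's `spec.predictor.minReplicas` field plus a Slack incoming-webhook payload. The 500 ms threshold, the replica cap, and the one-replica step are assumptions chosen for illustration; actually applying the patch and posting the message are left out.

```python
import json


def p99(latencies_ms):
    """Nearest-rank 99th percentile of a non-empty list of latencies."""
    ordered = sorted(latencies_ms)
    rank = (99 * len(ordered) + 99) // 100  # ceil(0.99 * n) using integers
    return ordered[rank - 1]


def plan_scale_up(latencies_ms, current_replicas, threshold_ms=500.0, max_replicas=10):
    """Return (patch, slack_payload) if P99 breaches the threshold, else None."""
    observed = p99(latencies_ms)
    if observed <= threshold_ms:
        return None
    target = min(current_replicas + 1, max_replicas)
    # JSON merge patch raising the predictor's replica floor.
    patch = {"spec": {"predictor": {"minReplicas": target}}}
    slack_payload = {
        "text": (
            f"P99 latency {observed:.0f} ms exceeded {threshold_ms:.0f} ms; "
            f"proposed replicas: {current_replicas} -> {target}."
        )
    }
    return patch, slack_payload


# Synthetic latency samples: mostly fast, with two slow outliers.
samples = [120.0] * 98 + [900.0, 950.0]
result = plan_scale_up(samples, current_replicas=2)
if result is not None:
    patch, alert = result
    print(json.dumps(patch))
    print(alert["text"])
```

Capping the step at one replica per evaluation is a deliberately conservative choice; it limits the damage if the latency spike has a cause that scaling cannot fix.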

AUGMENTING NATIVE CAPABILITIES

High-Value AI Use Cases for OpenShift AI

Extend Red Hat OpenShift AI's managed MLOps platform with custom AI agents and copilots to automate operational workflows, enhance data scientist productivity, and optimize the full model lifecycle from development to production.

01

Model Performance & Drift Monitoring Agent

Deploy an AI agent that continuously analyzes model serving metrics, inference logs, and data drift from the OpenShift AI Model Serving layer. The agent correlates performance dips with cluster events (node pressure, network latency) and suggests retraining triggers or scaling adjustments, moving monitoring from periodic review to proactive alerting.

Batch -> Real-time
Drift detection
02

Jupyter Notebook Data Science Copilot

Embed a context-aware assistant within OpenShift AI's JupyterHub environment. It analyzes notebook code, suggests relevant datasets from connected data sources, explains pipeline errors, and generates documentation snippets. This reduces time spent on environment setup and debugging, letting data scientists focus on experimentation.

Hours -> Minutes
Onboarding & debugging
03

Pipeline Optimization & Failure Triage

Integrate an AI agent with OpenShift AI Pipelines (Kubeflow/Tekton) to analyze execution logs and resource metrics. The agent identifies recurring failure patterns (e.g., OOM errors in specific steps), suggests parameter tuning or resource limit increases, and can auto-generate RCA summaries for failed runs, accelerating CI/CD for ML.

1 sprint
Pipeline maturity gain
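
As a rough illustration of the failure-pattern detection described for this use case, the sketch below classifies log lines with regular expressions and reports (step, cause) pairs that recur. The patterns, step names, and log lines are illustrative assumptions; a production agent would read real pipeline logs and likely pass unmatched lines to an LLM for summarization.

```python
import re
from collections import Counter

# Illustrative patterns; extend with the failure signatures seen in your cluster.
FAILURE_PATTERNS = {
    "oom": re.compile(r"OOMKilled|out of memory|MemoryError", re.IGNORECASE),
    "image_pull": re.compile(r"ImagePullBackOff|ErrImagePull"),
    "timeout": re.compile(r"timed out|deadline exceeded", re.IGNORECASE),
}


def classify(log_line):
    """Return the first matching failure label, or 'unknown'."""
    for label, pattern in FAILURE_PATTERNS.items():
        if pattern.search(log_line):
            return label
    return "unknown"


def recurring_failures(failed_steps, min_count=2):
    """failed_steps: iterable of (step_name, log_line) pairs.

    Returns a dict mapping (step, cause) to its count for known causes
    that occurred at least min_count times.
    """
    counts = Counter((step, classify(line)) for step, line in failed_steps)
    return {
        key: n for key, n in counts.items()
        if n >= min_count and key[1] != "unknown"
    }


# Hypothetical failure history collected from past runs.
history = [
    ("train-model", "container terminated: OOMKilled"),
    ("train-model", "Process ran out of memory while loading batch"),
    ("fetch-data", "request timed out after 30s"),
]
for (step, cause), n in recurring_failures(history).items():
    print(f"{step}: {cause} seen {n} times")
```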
04

Intelligent Resource Scheduling & GPU Management

Augment OpenShift AI's scheduler with an AI layer that analyzes job queues, GPU utilization metrics, and user priorities. It predicts resource needs for pending experiments, suggests optimal nvidia.com/gpu allocations, and can preempt low-priority workloads to maximize hardware ROI for high-value training jobs.

Same day
GPU throughput increase
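
The sketch below shows only the simplest part of this idea: a greedy, priority-ordered admission plan for pending jobs given a number of free GPUs. It is a stand-in for the predictive layer described above, not an implementation of it; the `Job` fields, job names, and priorities are hypothetical, and preemption is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int   # higher value = more important
    gpus: int       # requested nvidia.com/gpu count


def plan_gpu_allocation(pending, free_gpus):
    """Greedy plan: admit the highest-priority jobs that fit; the rest stay queued."""
    admitted, queued = [], []
    # Sort by priority (descending), then by smaller GPU request first.
    for job in sorted(pending, key=lambda j: (-j.priority, j.gpus)):
        if job.gpus <= free_gpus:
            admitted.append(job)
            free_gpus -= job.gpus
        else:
            queued.append(job)
    return admitted, queued


jobs = [
    Job("hparam-sweep", priority=1, gpus=4),
    Job("prod-retrain", priority=10, gpus=2),
    Job("notebook-exp", priority=5, gpus=1),
]
admitted, queued = plan_gpu_allocation(jobs, free_gpus=4)
print("admit:", [j.name for j in admitted])
print("queue:", [j.name for j in queued])
```

An agent would typically surface a plan like this as a recommendation first, since the actual placement decision still belongs to the cluster's scheduler and quota system.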
05

Compliance & Audit Workflow Automation

Automate governance tasks by connecting AI to the OpenShift AI dashboard and underlying Kubernetes APIs. An agent reviews model registry entries for missing metadata, checks pipeline runs for PII data handling, and generates audit-ready reports on model lineage and data usage for regulated industries.

Hours -> Minutes
Audit report generation
06

Self-Service Onboarding & Environment Provisioning

Implement a natural-language interface for data scientists to request and configure OpenShift AI projects. An AI agent interprets requests (e.g., "set up a PyTorch env with 2 GPUs and access to S3 bucket X"), validates quotas, generates the necessary ResourceClaim and Secret manifests, and guides users through the approval workflow.

Days -> Hours
New user setup
PRODUCTION PATTERNS

Example AI Agent Workflows for OpenShift AI

These workflows illustrate how custom AI agents can augment Red Hat OpenShift AI's native MLOps tooling, automating operational tasks and providing intelligent support to data science teams. Each pattern connects to specific APIs, data sources, and user surfaces within the platform.

Trigger: Scheduled cron job (e.g., daily) or event from the OpenShift AI model monitoring dashboard.

Context Pulled:

  • Model inference logs and performance metrics from the OpenShift AI model server (KServe or ModelMesh).
  • Historical baseline metrics stored in a platform-integrated data store (e.g., Prometheus, S3 via Data Science Pipelines).
  • Model metadata (version, training data signature) from the OpenShift AI model registry.

Agent Action:

  1. The agent retrieves current inference metrics (latency, throughput, error rate, custom business metrics) and compares them against the defined baseline using stability measures such as the Population Stability Index (PSI) and the Characteristic Stability Index (CSI).
  2. It analyzes the feature distribution of recent inference requests versus the training dataset.
  3. Using an LLM, the agent generates a plain-English summary of the drift: type (data, concept), severity, and likely impacted segments.

System Update:

  • The agent creates a Jira ticket via webhook or updates an existing OpenShift AI dashboard alert with the analysis summary.
  • It can optionally trigger a retraining pipeline in OpenShift AI Pipelines (Tekton) by submitting a new PipelineRun with parameters derived from the analysis.
  • It sends a formatted Slack/MS Teams notification to the responsible data science team, including links to the relevant dashboard and pipeline run.

Human Review Point: The notification and proposed retraining pipeline require team approval before execution, managed through the OpenShift AI UI or linked ITSM ticket.
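
The hand-off from analysis to a proposed retraining run can be sketched as follows. The code maps a PSI score to a severity label using common rule-of-thumb cutoffs (0.1 and 0.25) and builds a Tekton PipelineRun manifest that is returned for review rather than submitted. The pipeline name `retrain-model`, the parameter names, the label, and the approval annotation are assumptions for illustration; clusters on older Tekton releases may need `tekton.dev/v1beta1` instead of `tekton.dev/v1`.

```python
import json


def drift_severity(psi):
    """Map a PSI score to a severity label using rule-of-thumb cutoffs."""
    if psi < 0.1:
        return "none"
    if psi < 0.25:
        return "moderate"
    return "severe"


def build_retraining_run(model_name, psi, pipeline="retrain-model"):
    """Build a Tekton PipelineRun manifest; it is returned, not submitted."""
    return {
        "apiVersion": "tekton.dev/v1",
        "kind": "PipelineRun",
        "metadata": {
            "generateName": f"{pipeline}-",
            "labels": {"triggered-by": "drift-agent"},
            # Hypothetical marker that a human still has to approve this run.
            "annotations": {"drift-agent/approval": "pending"},
        },
        "spec": {
            "pipelineRef": {"name": pipeline},
            "params": [
                {"name": "model-name", "value": model_name},
                {"name": "psi-score", "value": f"{psi:.3f}"},
                {"name": "severity", "value": drift_severity(psi)},
            ],
        },
    }


run = build_retraining_run("fraud-detection", 0.31)
print(json.dumps(run, indent=2))
```

Keeping manifest construction separate from submission makes the human review point easy to enforce: the agent can attach the rendered manifest to the ticket or notification, and only an approved manifest is ever applied.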

AUGMENTING THE NATIVE MLOps STACK

Typical Implementation Architecture

A production AI integration for OpenShift AI extends the platform's native MLOps tooling with custom agents, orchestration, and governance layers.

The integration typically layers AI agents and workflows onto three key surfaces of the OpenShift AI stack: the JupyterHub environment for data scientist support, the model serving layer (KServe, ModelMesh) for runtime monitoring and optimization, and the pipeline orchestration engine (Kubeflow Pipelines, Tekton) for intelligent workflow automation. Agents connect via the platform's REST APIs and Kubernetes Custom Resource Definitions (CRDs), ingesting logs, metrics, and metadata from these components to provide contextual assistance, predictive analysis, and automated actions.

A common pattern involves deploying a central orchestrator agent as a service within the OpenShift AI project namespace. This agent subscribes to events (e.g., pipeline failures, model drift alerts, resource quota breaches) via webhooks and Kubernetes event watches. It then routes tasks to specialized worker agents, such as a notebook copilot that helps debug code in JupyterLab, a serving optimizer that suggests canary weights or instance scaling, or a data curator that validates pipeline input datasets. These agents act on the user's behalf, executing approved actions through the OpenShift AI API or generating pull requests for GitOps-managed configurations.
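
The orchestrator-and-workers pattern can be sketched as a small dispatch table keyed by event type. The event type strings, the event fields, and the two worker functions are hypothetical; in practice each worker would be a separate agent or service rather than a local function.

```python
from typing import Callable, Dict

Handler = Callable[[dict], str]


class Orchestrator:
    """Routes incoming events to worker agents registered per event type."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type] = handler

    def dispatch(self, event: dict) -> str:
        handler = self._handlers.get(event.get("type", ""))
        if handler is None:
            # Unknown events are ignored rather than guessed at.
            return f"ignored: no worker for {event.get('type')!r}"
        return handler(event)


def serving_optimizer(event: dict) -> str:
    return f"review canary weights for {event['model']}"


def pipeline_triage(event: dict) -> str:
    return f"summarize failure of run {event['run_id']}"


orchestrator = Orchestrator()
orchestrator.register("model_drift_alert", serving_optimizer)
orchestrator.register("pipeline_failure", pipeline_triage)

print(orchestrator.dispatch({"type": "pipeline_failure", "run_id": "run-42"}))
print(orchestrator.dispatch({"type": "quota_breach"}))
```

Ignoring unregistered event types by default means that adding a new event source never triggers an action until someone deliberately registers a worker for it.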

Governance and rollout are managed through OpenShift's native RBAC and Projects. AI agents are granted service accounts with scoped permissions, and all agent decisions and actions are logged to the cluster's audit trails and optionally to a dedicated vector store for explainability. The rollout is often phased, starting with read-only monitoring and alert summarization for platform admins, then progressing to assisted actions within data science projects, and finally enabling automated remediation for predefined, low-risk operational scenarios like pipeline retries or log cleanup.

OPENSHIFT AI INTEGRATION SURFACES

Code and Payload Examples

Analyzing Model Serving Endpoints

Integrate AI agents with OpenShift AI's model serving layer (KServe, ModelMesh) to monitor performance, detect drift, and trigger retraining. Agents can poll serving metrics, analyze prediction logs, and compare live data against training distributions.

Example Python agent checking a KServe endpoint:

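```python
import numpy as np

# OpenShiftAIAgent and its methods are taken from the original example and are
# assumed to come from the author's tooling; they are not part of the
# OpenShift AI or KServe APIs.
from inference_systems.agent import OpenShiftAIAgent

PSI_THRESHOLD = 0.25  # common rule-of-thumb cutoff for significant drift


def calculate_psi(reference_distribution, current_distribution, bins=10, eps=1e-6):
    """Population Stability Index between two samples of numeric values."""
    # Bin edges come from the reference sample; current values outside that
    # range are ignored by np.histogram.
    edges = np.histogram_bin_edges(reference_distribution, bins=bins)
    ref_counts, _ = np.histogram(reference_distribution, bins=edges)
    cur_counts, _ = np.histogram(current_distribution, bins=edges)
    # Convert counts to proportions; eps avoids division by zero and log(0).
    ref_pct = ref_counts / max(ref_counts.sum(), 1) + eps
    cur_pct = cur_counts / max(cur_counts.sum(), 1) + eps
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))


agent = OpenShiftAIAgent()

# Get model serving endpoint details from OpenShift AI
endpoint = agent.get_serving_endpoint(
    project_name="fraud-detection",
    model_name="v2"
)

# Fetch recent inference logs via OpenShift AI's monitoring API
logs = agent.query_inference_logs(
    endpoint_url=endpoint["metrics_url"],
    timeframe="24h",
    limit=1000
)

# Reference predictions captured at training time.
# The file name is a placeholder; load your own baseline here.
training_stats = np.load("training_predictions.npy")

# Calculate prediction drift using your chosen statistical test
drift_score = calculate_psi(
    reference_distribution=training_stats,
    current_distribution=logs["predictions"]
)

if drift_score > PSI_THRESHOLD:
    # Trigger automated retraining workflow
    agent.create_retraining_job(
        model_id=endpoint["model_id"],
        trigger="drift_detected",
        priority="high"
    )
```
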
AI-ENHANCED OPENSHIFT AI OPERATIONS

Realistic Operational Impact and Time Savings

This table shows the measurable impact of integrating custom AI agents and copilots into Red Hat OpenShift AI workflows, focusing on reducing manual toil for data scientists, MLOps engineers, and platform teams.

| Metric | Before AI | After AI | Notes |
| --- | --- | --- | --- |
| Pipeline failure root cause analysis | Manual log review (30-90 mins) | Automated correlation & suggestion (2-5 mins) | AI analyzes Tekton logs, model dependencies, and resource events to pinpoint likely cause |
| Model performance drift detection review | Weekly manual dashboard checks | Proactive anomaly alerts with context | AI monitors Seldon/Prometheus metrics, correlates with data pipeline changes |
| Jupyter environment provisioning | Ticket-based, 1-2 business days | Self-service via AI agent, <15 mins | AI validates resource requests against quotas and team policies before automated provisioning |
| Model serving resource right-sizing | Static allocation, periodic manual review | Continuous recommendation engine | AI analyzes inference latency, concurrency, and cost to suggest CPU/GPU adjustments |
| Experiment tracking and artifact lineage | Manual notebook logging, prone to gaps | Automated metadata capture and query | AI agent parses notebook cells and pipeline runs to populate MLflow/Weights & Biases |
| Data scientist support (common queries) | Search documentation, ask peers (20+ mins) | Contextual AI copilot in JupyterLab | Agent grounded in internal docs, cluster policies, and past resolved issues |
| Compliance scan for model registry images | Scheduled weekly scans, manual triage | Pre-promotion scan integrated in CI/CD | AI evaluates CVEs against runtime context to prioritize critical fixes |
| Multi-model endpoint canary analysis | Manual A/B test setup and metric review | Automated canary rollout with guardrails | AI monitors business and performance metrics, suggests rollback or full promotion |

ENTERPRISE-GRADE AI OPERATIONS

Governance, Security, and Phased Rollout

A practical framework for deploying AI agents on OpenShift AI with security, compliance, and controlled adoption in mind.

Integrating AI agents into OpenShift AI requires a security-first approach that aligns with the platform's native governance model. This means embedding AI workflows within OpenShift's existing Role-Based Access Control (RBAC), Project and Namespace isolation, and audit logging framework. For instance, an AI agent that analyzes Jupyter notebook logs for cost optimization should only have service account permissions scoped to the specific project, with all its API calls and tool usage logged to the cluster's audit trail. Sensitive operations, like modifying a model server's resource limits, should be gated behind an approval workflow or checked by an external policy engine such as OPA Gatekeeper to ensure compliance before execution.

A phased rollout is critical for managing risk and proving value. Start with a read-only observation phase, where AI agents monitor model serving metrics, pipeline runtimes, and resource utilization through the OpenShift AI dashboard and Prometheus, generating reports and alerts without taking action. The next phase introduces assistive automation in low-risk areas, such as suggesting optimized ResourceQuotas for data science projects or drafting Git commit messages for pipeline code changes in the connected Git repository. The final phase enables closed-loop automation for pre-approved workflows, like auto-scaling model inference deployments based on predicted traffic or triggering retraining pipelines when data drift is detected for a registered model.
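
One way to encode the three phases is a policy table that maps each action to the earliest phase in which it is allowed. The sketch below assumes hypothetical action names that mirror the examples in the paragraph above; unknown actions are denied by default.

```python
from enum import IntEnum


class Phase(IntEnum):
    OBSERVE = 1      # read-only reports and alerts
    ASSIST = 2       # suggestions in low-risk areas
    CLOSED_LOOP = 3  # automation for pre-approved workflows


# Earliest phase at which each (hypothetical) action becomes allowed.
REQUIRED_PHASE = {
    "generate_report": Phase.OBSERVE,
    "suggest_resource_quota": Phase.ASSIST,
    "draft_commit_message": Phase.ASSIST,
    "autoscale_inference": Phase.CLOSED_LOOP,
    "trigger_retraining": Phase.CLOSED_LOOP,
}


def is_allowed(action: str, current_phase: Phase) -> bool:
    """Allow an action only if the rollout has reached its required phase."""
    required = REQUIRED_PHASE.get(action)
    return required is not None and current_phase >= required


current = Phase.ASSIST
for action in ("generate_report", "suggest_resource_quota", "autoscale_inference"):
    status = "allowed" if is_allowed(action, current) else "denied"
    print(f"{action}: {status}")
```

Keeping the table in version control gives each phase transition a reviewable diff: promoting an action to an earlier phase, or adding a new one, becomes an explicit change rather than a side effect.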

Governance extends to the AI models and prompts themselves. Treat prompts as configuration managed in GitOps repositories alongside your application code, with changes peer-reviewed and deployed through OpenShift's CI/CD pipelines. Use OpenShift AI's native experiment tracking and model lineage features to maintain an audit trail of which model version an agent consulted for a recommendation. For production-critical agents, implement a human-in-the-loop step using OpenShift's notification system to Slack or Teams, requiring manual approval before executing significant changes to cluster resources or model deployments. This layered approach ensures AI augments your platform team's capabilities without introducing unmanaged risk or operational chaos.
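
To tie a recommendation back to the exact prompt and model version that produced it, an agent can record a content hash of the prompt alongside each decision. The sketch below is one possible shape for such a record; the field names, the 12-character hash prefix, and the example values are assumptions.

```python
import hashlib
import json


def prompt_fingerprint(template: str) -> str:
    """Short, stable identifier derived from the prompt text."""
    return hashlib.sha256(template.encode("utf-8")).hexdigest()[:12]


def audit_record(prompt_name, template, model_version, recommendation):
    """Serialize one agent decision as a JSON line for the audit trail."""
    return json.dumps(
        {
            "prompt": prompt_name,
            "prompt_sha256_prefix": prompt_fingerprint(template),
            "model_version": model_version,
            "recommendation": recommendation,
        },
        sort_keys=True,
    )


# Hypothetical prompt and recommendation.
template = "Summarize the failed pipeline run and suggest one fix."
print(audit_record("pipeline-triage", template, "v3", "raise memory limit"))
```

Because the fingerprint changes whenever the prompt text changes, an auditor can match any logged recommendation to a specific commit of the prompt in the GitOps repository.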

AI INTEGRATION FOR OPENSHIFT AI

Frequently Asked Questions

Common questions about extending Red Hat OpenShift AI with custom AI agents, model monitoring, and data scientist support workflows.

AI agents connect to OpenShift AI via its APIs and Kubernetes-native architecture, typically operating as sidecar containers or separate services within the same project namespace.

Typical Integration Points:

  1. JupyterHub API: Agents can be injected into notebook sessions to provide real-time code suggestions, dependency troubleshooting, or data exploration guidance based on the notebook's kernel and imported libraries.
  2. ModelMesh/KServe Inference Endpoints: Agents monitor inference logs, latency, and error rates. They can trigger automated scaling events or route traffic away from underperforming model servers.
  3. OpenShift AI Dashboard: Custom plugins or API calls can surface agent-generated insights—like recommended pipeline parameters or data drift alerts—directly in the data scientist's console.
  4. S3-Compatible Object Storage (for pipelines): Agents analyze pipeline artifacts and metrics stored in MinIO or cloud storage to suggest optimizations for subsequent runs.

Security & Permissions: Agents use ServiceAccounts with RBAC roles scoped to the project, ensuring they only access permitted resources like specific notebook pods or inference services.

About the author

Prasad Kumkar

CEO & MD, Inference Systems

Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over more than five years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.

His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.