Seldon Core is an open-source model serving platform designed to deploy, manage, and monitor machine learning models on Kubernetes. It transforms trained models from frameworks like TensorFlow or PyTorch into scalable microservices with REST and gRPC APIs. The platform supports complex inference graphs, allowing models to be chained with pre- and post-processing steps, business logic, and transformers into a single, deployable unit. This enables sophisticated model pipelines and advanced deployment strategies like canary rollouts directly within the Kubernetes ecosystem.
Seldon Core

What is Seldon Core?
Seldon Core is an open-source, Kubernetes-native platform for deploying, scaling, and managing machine learning models in production.
A core feature is its support for advanced deployment patterns, including A/B testing, multi-armed bandits, and shadow deployments, managed through declarative Kubernetes custom resources. It provides built-in model monitoring for metrics like throughput and latency, and integrates with explainability toolkits. By abstracting the operational complexity, Seldon Core allows MLOps engineers to focus on model performance rather than infrastructure, ensuring reliable, scalable, and observable inference services in production environments.
Key Features of Seldon Core
Seldon Core is an open-source platform for deploying, managing, and monitoring machine learning models on Kubernetes. It enables complex inference graphs and advanced deployment strategies for production ML workloads.
Complex Inference Graphs
Seldon Core orchestrates multi-step inference pipelines, each defined in a SeldonDeployment resource. A pipeline is not a single model but a directed acyclic graph (DAG) whose nodes can be:
- Models: Trained ML models (TensorFlow, PyTorch, SKLearn, etc.).
- Transformers: Pre- and post-processing components for feature engineering.
- Routers: Logic for A/B testing, multi-armed bandits, or ensemble methods.
- Combiners: Aggregators that merge outputs from multiple model branches.
This allows engineers to build production-grade pipelines that mirror real-world business logic, such as chaining a fraud detection model with a recommendation model.
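As a rough sketch of how such a graph can be expressed, the snippet below builds the `graph` section of a predictor as a plain Python dictionary: an input transformer feeding a combiner that merges two model branches. The node names (`feature-prep`, `ensemble`, `model-a`, `model-b`) and the helper functions are hypothetical. The `type` values and `children` nesting follow the Seldon Core 1.x graph schema, but treat this as an illustration rather than a tested manifest.

```python
import json

# Node types in the Seldon Core 1.x graph schema.
NODE_TYPES = {"MODEL", "TRANSFORMER", "OUTPUT_TRANSFORMER", "ROUTER", "COMBINER"}


def node(name, node_type, children=()):
    """Build one graph node; children sit downstream of this node."""
    if node_type not in NODE_TYPES:
        raise ValueError(f"unknown node type: {node_type}")
    return {"name": name, "type": node_type, "children": list(children)}


def node_names(graph):
    """Return node names in depth-first order."""
    names = [graph["name"]]
    for child in graph["children"]:
        names.extend(node_names(child))
    return names


# Hypothetical pipeline: pre-process, then combine two model branches.
graph = node(
    "feature-prep",
    "TRANSFORMER",
    [node("ensemble", "COMBINER", [node("model-a", "MODEL"), node("model-b", "MODEL")])],
)

print(json.dumps(graph, indent=2))
print(node_names(graph))
```

Keeping the graph as nested data makes it easy to validate node types before the structure is placed under a predictor in a SeldonDeployment.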
Advanced Deployment Strategies
The platform provides enterprise-grade rollout patterns directly on Kubernetes, abstracting away complex YAML configurations. Key strategies include:
- Canary Rollouts: Safely deploy a new model version to a small percentage of traffic (e.g., 5%) to monitor performance before full promotion.
- Shadow Deployment: Send a copy of live traffic to a new model for comparison without affecting user-facing predictions.
- Multi-Armed Bandit Testing: Dynamically route traffic to the best-performing model variant based on real-time feedback, optimizing for business metrics.
These strategies are managed through a custom Kubernetes SeldonDeployment resource, providing a declarative API for MLOps.
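To make the declarative style concrete, here is a minimal sketch that assembles a SeldonDeployment with a 95/5 canary split and a shadow predictor, then prints it as JSON. The deployment name, predictor names, and `gs://example-bucket/...` model URIs are hypothetical, and the helper functions are not part of any Seldon library. The field names (`predictors`, `traffic`, `shadow`, `graph`, `implementation`, `modelUri`) follow the Seldon Core 1.x schema. Some ingress setups also expect a `traffic` value on the shadow predictor, so check the Seldon documentation for your environment.

```python
import json


def predictor(name, model_uri, traffic=None, shadow=False):
    """Build one predictor entry for spec.predictors."""
    spec = {
        "name": name,
        "replicas": 1,
        "graph": {
            "name": "classifier",
            "implementation": "SKLEARN_SERVER",  # prepackaged scikit-learn server
            "modelUri": model_uri,
        },
    }
    if shadow:
        spec["shadow"] = True  # mirrored traffic only; responses are not returned
    else:
        spec["traffic"] = traffic  # percentage of live requests
    return spec


def live_traffic_total(predictors):
    """Sum the traffic percentages of all non-shadow predictors."""
    return sum(p["traffic"] for p in predictors if not p.get("shadow"))


predictors = [
    predictor("main", "gs://example-bucket/fraud/v1", traffic=95),
    predictor("canary", "gs://example-bucket/fraud/v2", traffic=5),
    predictor("shadow", "gs://example-bucket/fraud/v3", shadow=True),
]

if live_traffic_total(predictors) != 100:
    raise ValueError("live traffic percentages must sum to 100")

manifest = {
    "apiVersion": "machinelearning.seldon.io/v1",
    "kind": "SeldonDeployment",
    "metadata": {"name": "fraud-model"},
    "spec": {"predictors": predictors},
}

# kubectl accepts JSON manifests as well as YAML.
print(json.dumps(manifest, indent=2))
```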
Built-in Explainability & Outlier Detection
Seldon Core integrates directly with the Alibi and Alibi Detect open-source libraries to provide production-ready explainability and monitoring.
- Explainers: Generate feature attribution explanations (e.g., SHAP, Anchor, Integrated Gradients) for individual predictions to debug model behavior.
- Outlier Detectors: Identify anomalous inference requests that fall outside the model's trained data distribution using methods like Mahalanobis distance or Isolation Forests.
These components run alongside a model rather than inside it (an explainer, for example, is declared on the predictor and deployed as its own service), so they provide actionable insights without modifying the core model code.
Rich Metrics & Observability
The platform automatically exposes a comprehensive set of metrics for monitoring model performance and infrastructure health.
- Request-Level Metrics: Latency, throughput, and 4xx/5xx error rates for each model in a graph.
- Custom Business Metrics: Users can emit arbitrary metrics (e.g., prediction value distributions) from within model code.
- Integration with Prometheus & Grafana: All metrics are exposed in Prometheus format, enabling seamless integration with standard Kubernetes monitoring stacks.
- Distributed Tracing: Traces requests as they flow through complex inference graphs (Jaeger/OpenTracing in Seldon Core 1.x, OpenTelemetry in 2.x), identifying latency bottlenecks.
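Custom business metrics are typically emitted from the model class itself. The sketch below follows the convention of Seldon's Python wrapper, where a class exposes `predict` and an optional `metrics` method returning a list of `{"type", "key", "value"}` dictionaries. The class name, the metric keys, and the stand-in scoring logic are hypothetical.

```python
class ScoreModel:
    """Hypothetical model class in the style of Seldon's Python wrapper."""

    def __init__(self):
        self._last_batch = 0
        self._last_mean = 0.0

    def predict(self, X, features_names=None):
        # Stand-in "model": score each row as the mean of its features.
        scores = [sum(row) / len(row) for row in X]
        self._last_batch = len(scores)
        self._last_mean = sum(scores) / len(scores)
        return [[s] for s in scores]

    def metrics(self):
        # COUNTER values are increments; GAUGE values replace the previous reading.
        return [
            {"type": "COUNTER", "key": "rows_scored_total", "value": self._last_batch},
            {"type": "GAUGE", "key": "mean_score", "value": self._last_mean},
        ]


model = ScoreModel()
print(model.predict([[1.0, 3.0], [2.0, 4.0]]))
print(model.metrics())
```

When such a class is served by the wrapper, the returned entries are exported alongside the built-in request metrics, so they can be scraped by Prometheus in the same way.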
Protocol Flexibility & gRPC Optimization
Seldon Core serves models over multiple protocols to suit different client needs and optimize performance.
- REST/gRPC Endpoints: Automatic generation of both RESTful JSON APIs and high-performance gRPC endpoints from the same model packaging.
- gRPC for High Throughput: The gRPC interface is typically faster for high-volume inference because binary Protocol Buffers encoding reduces serialization overhead compared with JSON.
- Seldon Prediction Protocol: A consistent internal wire format that standardizes payloads between components in an inference graph, regardless of the original model framework.
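For orientation, this is roughly what a REST call in the Seldon prediction protocol looks like. The sketch only builds the request and does not send it. The host `localhost:8003`, the namespace `seldon`, and the deployment name `iris-model` are placeholders, and the URL layout assumes the deployment is exposed through an ingress such as Istio or Ambassador.

```python
import json
from urllib.request import Request


def build_prediction_request(host, namespace, deployment, rows, names=None):
    """Build (but do not send) a Seldon-protocol REST prediction request."""
    url = f"http://{host}/seldon/{namespace}/{deployment}/api/v1.0/predictions"
    data = {"ndarray": rows}
    if names:
        data["names"] = names  # optional feature names
    body = json.dumps({"data": data}).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_prediction_request(
    "localhost:8003", "seldon", "iris-model", [[5.1, 3.5, 1.4, 0.2]]
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing `req` to `urllib.request.urlopen` would send it to a running deployment; the response uses the same `data` envelope, which is what lets graph components exchange payloads regardless of framework.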
Kubernetes-Native Architecture
Seldon Core is designed as a set of Kubernetes operators and custom resource definitions (CRDs), making it a natural extension of the Kubernetes ecosystem.
- SeldonDeployment CRD: The primary resource for defining inference graphs, leveraging Kubernetes for lifecycle management.
- Operator Pattern: Controllers automatically reconcile the desired state, managing the creation of Kubernetes Deployments, Services, and Horizontal Pod Autoscalers.
- Integration with Service Meshes: Works seamlessly with Istio or Linkerd for advanced traffic management, security (mTLS), and observability features at the service mesh layer.
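The operator pattern mentioned above comes down to a reconciliation loop: compare the desired state with the observed state and emit the actions that close the gap. The toy function below illustrates the idea with plain dictionaries mapping an object name to a replica count. It is a simplified illustration, not Seldon's controller code, and the object names are hypothetical.

```python
def reconcile(desired, observed):
    """Return the actions that move the observed state to the desired state.

    Both arguments map an object name to its replica count (a simplification
    of the Deployments a real controller would manage).
    """
    actions = []
    for name, replicas in sorted(desired.items()):
        if name not in observed:
            actions.append(("create", name, replicas))
        elif observed[name] != replicas:
            actions.append(("scale", name, replicas))
    for name in sorted(observed.keys() - desired.keys()):
        actions.append(("delete", name, 0))
    return actions


desired = {"main": 3, "canary": 1}
observed = {"main": 2, "old": 1}
print(reconcile(desired, observed))
print(reconcile(desired, desired))  # nothing to do once states match
```

A real controller runs this comparison continuously and on every change event, which is why editing the custom resource is enough to trigger an update.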
How Seldon Core Works
Seldon Core deploys models as Kubernetes Custom Resources, wrapping them in standardized inference servers called Seldon Servers. These servers expose a consistent REST or gRPC API, abstracting the underlying model framework (e.g., TensorFlow, PyTorch). The platform orchestrates these servers as pods, managing their lifecycle, networking, and load balancing automatically through the Kubernetes control plane. This provides a resilient, scalable foundation for online inference and batch inference workloads.
Its core innovation is the Seldon Deployment specification, which allows users to define complex inference graphs as directed acyclic graphs (DAGs). These graphs can chain multiple models, preprocessing and postprocessing components, and routers into a single deployable unit. This enables advanced patterns like A/B testing, multi-armed bandits, and ensembles. The platform integrates with service meshes like Istio for fine-grained traffic management, supporting canary deployments and blue-green deployments with minimal operational overhead.
Seldon Core vs. Other Model Serving Platforms
A technical comparison of core capabilities across leading open-source and commercial platforms for deploying machine learning models on Kubernetes.
| Feature / Capability | Seldon Core | KServe | NVIDIA Triton Inference Server | Proprietary Cloud Endpoints (e.g., SageMaker, Vertex AI) |
|---|---|---|---|---|
| Core Architecture | Kubernetes-native operator for custom resources | Kubernetes-native standard (Knative/Istio) | Standalone inference server (containerized) | Managed cloud service with proprietary APIs |
| Multi-Framework Support | True (via prepackaged servers or custom containers) | True (via pre-defined ServingRuntimes) | True (native support for TensorFlow, PyTorch, ONNX, etc.) | True (often with framework-specific containers) |
| Complex Inference Graphs/Pipelines | True (native composition via Seldon CRDs) | True (via InferenceGraph spec) | True (via Ensemble models) | Limited (often chaining via separate services) |
| Advanced Deployment Strategies | True (Canary, Shadow, A/B, Multi-Armed Bandit) | True (Canary, Blue-Green via Istio) | False (requires external orchestrator) | True (managed canary, A/B via cloud console) |
| Built-in Explainability (e.g., LIME, SHAP) | True (integrated Alibi explainers) | False (requires manual integration) | False (requires manual integration) | False (proprietary or manual integration) |
| Built-in Outlier/Drift Detection | True (integrated Alibi Detect) | False | False | Limited (proprietary cloud monitoring) |
| Request/Response Logging & Metrics | True (automatic Prometheus/Grafana) | True (via Knative/Istio metrics) | True (Prometheus metrics endpoint) | True (cloud-native monitoring dashboards) |
| GPU Support & Multi-Model Batching | True (via underlying inference server) | True (via underlying inference server) | True (native dynamic batching, concurrent models) | True (managed batching, instance types) |
| Model Orchestration & Lifecycle | Kubernetes Operator (GitOps friendly) | Kubernetes Custom Resource Definitions | Model Repository polling, REST/gRPC management APIs | Fully managed (proprietary CLI/UI) |
| Vendor Lock-in Risk | False (runs on any Kubernetes cluster) | False (open standard, part of Kubeflow) | Low (open-source, but NVIDIA-optimized) | True (proprietary APIs, cloud-specific features) |
| Enterprise Auth & Security (mTLS, RBAC) | True (via Istio integration) | True (via Istio integration) | Partial (supports auth extensions) | True (cloud IAM, VPC, private endpoints) |
| Cost Model for Core Platform | $0 (open-source) | $0 (open-source) | $0 (open-source) | Variable (per-hour endpoint + prediction costs) |
Frequently Asked Questions
Seldon Core is an open-source platform for deploying, managing, monitoring, and explaining machine learning models on Kubernetes. These FAQs address its core mechanisms, architecture, and role in production ML systems.
What is Seldon Core and how does it work?
Seldon Core is an open-source Kubernetes-native platform for deploying, scaling, and managing machine learning models as microservices. It works by packaging models, their dependencies, and any required pre/post-processing code into a standardized inference server container. These containers are then orchestrated by Seldon's custom resource definitions (CRDs) on Kubernetes, which handle lifecycle management, traffic routing, and integration with the broader cloud-native ecosystem (e.g., Istio for networking, Prometheus for metrics). The platform abstracts the complexity of Kubernetes, allowing data scientists to define complex inference graphs (sequential, ensemble, or branching pipelines of models) through simple YAML configurations.
Related Terms
Seldon Core operates within a broader ecosystem of tools and concepts for deploying and managing machine learning models in production. These related terms define the architectural patterns and components it interacts with.
Inference Graph
A directed acyclic graph (DAG) that defines a multi-step prediction pipeline. This is a core abstraction in Seldon Core. A graph can include:
- Pre-processing transformers
- One or more model predictors
- Post-processing transformers
- Routers for A/B tests
- Combiners to merge outputs
This allows you to build complex, modular inference workflows (e.g., a recommender that first filters, then ranks, then explains) as a single deployable Kubernetes resource.
Canary Deployment
A release strategy where a new version of a model is deployed to a small, controlled subset of live traffic (e.g., 5%). Seldon Core implements this natively using its traffic splitting capabilities. You can:
- Specify the percentage of requests routed to the new (canary) model.
- Monitor its performance and business metrics.
- Gradually increase traffic or roll back based on results.
This minimizes risk by validating new models in production before a full rollout.
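The promote-or-roll-back step amounts to rewriting the traffic percentages on the two predictors. The helper below is a hypothetical sketch of that bookkeeping: the predictor names and the 5/25/50/100 promotion schedule are illustrative, and in practice the updated weights would be applied by editing the SeldonDeployment resource.

```python
def set_canary_weight(predictors, canary_name, canary_pct):
    """Return a copy of the predictors with the canary at canary_pct percent."""
    if not 0 <= canary_pct <= 100:
        raise ValueError("canary_pct must be between 0 and 100")
    names = [p["name"] for p in predictors]
    if len(predictors) != 2 or canary_name not in names:
        raise ValueError("expects exactly one stable and one canary predictor")
    return [
        {**p, "traffic": canary_pct if p["name"] == canary_name else 100 - canary_pct}
        for p in predictors
    ]


predictors = [{"name": "main", "traffic": 95}, {"name": "canary", "traffic": 5}]

# Hypothetical promotion schedule; each step would follow a metrics review.
for pct in (5, 25, 50, 100):
    step = set_canary_weight(predictors, "canary", pct)
    print(pct, [(p["name"], p["traffic"]) for p in step])

# Rolling back sends all traffic to the stable predictor again.
print(set_canary_weight(predictors, "canary", 0))
```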
Model Orchestrator
A system component responsible for managing the lifecycle, scheduling, and coordination of multiple model serving instances. Seldon Core acts as a model orchestrator on Kubernetes. Its responsibilities include:
- Translating high-level deployment specs into Kubernetes objects.
- Managing rolling updates and rollbacks.
- Scaling replicas up/down based on load.
- Ensuring health and readiness of model servers.
It abstracts away the complexity of managing individual pods and services.
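Load-based scaling is delegated to the Kubernetes Horizontal Pod Autoscaler, whose documented rule is `desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)`. The sketch below applies that rule with the default 10% tolerance band and a clamp to minimum and maximum replica counts. The function name and the example numbers are illustrative, and the real autoscaler adds further behavior (stabilization windows, readiness handling) that is omitted here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Approximate the HPA scaling rule for a single metric."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no change
    wanted = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, wanted))


print(desired_replicas(2, 90, 60))   # load above target: scale up
print(desired_replicas(4, 30, 60))   # load below target: scale down
print(desired_replicas(3, 63, 60))   # within tolerance: unchanged
print(desired_replicas(5, 300, 60))  # capped at max_replicas
```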

About the author
Prasad Kumkar
CEO & MD, Inference Systems
Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.
His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.