Immersion cooling is a thermal management technique in which AI server components are submerged directly in a non-conductive dielectric fluid. It suits large-scale model training because it supports higher computational density and better power efficiency than air cooling or traditional liquid cooling. The fluid absorbs heat directly from GPUs, allowing racks to operate at power densities exceeding 50 kW, far beyond the limits of air-cooled designs. You must choose between single-phase systems (the fluid remains liquid) and two-phase systems (the fluid boils and condenses), each with distinct trade-offs in complexity, heat transfer efficiency, and cost.
How to Implement Immersion Cooling for Large-Scale Model Training

This guide provides a first-principles technical overview of implementing immersion cooling to manage the extreme thermal loads of multi-megawatt AI training clusters.
Implementation requires a systematic approach. First, select a compatible dielectric fluid, such as a 3M Novec fluid or a product from Engineered Fluids. Next, design or procure immersion tanks that integrate with standard data center racks and facility infrastructure. Finally, establish maintenance procedures for fluid purity monitoring, component servicing, and leak detection. A successful deployment can recycle waste heat and substantially reduce cooling energy use, and it is a cornerstone of Sustainable Cloud Architecture and Liquid Cooling. For foundational concepts, see our guide on How to Design a Sustainable Cloud Architecture for AI Workloads.
Key Concepts: Single-Phase vs. Two-Phase Cooling
Choosing the right immersion cooling method is foundational for sustainable, high-density AI clusters. This decision impacts everything from capital costs to long-term operational efficiency.
Dielectric Fluid Selection Guide
The fluid is the lifeblood of your immersion system. Your choice dictates safety, performance, and total cost of ownership.
- Synthetic Hydrocarbons (Single-Phase): High flash point, good thermal conductivity, and lower cost. Example: Shell Immersion Cooling Fluid.
- Fluorinated Fluids (Two-Phase): Non-flammable, zero ozone depletion potential, but higher cost. Example: 3M Novec.
- Evaluation Criteria:
  - Thermal Properties: Specific heat capacity, boiling point, viscosity.
  - Material Compatibility: Will not degrade seals, cables, or PCB coatings.
  - Environmental & Safety: Global Warming Potential (GWP), toxicity, biodegradability.
  - Longevity & Stability: Resistance to thermal breakdown and oxidation.
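One way to apply the evaluation criteria above is a simple weighted scoring matrix. The sketch below is illustrative only: the weights, the 1-5 rating scale, and the candidate names `fluid_a` and `fluid_b` are assumptions, not measured data or vendor figures.

```python
# Hypothetical weights for the four evaluation criteria; they sum to 1.0.
WEIGHTS = {
    "thermal": 0.35,
    "compatibility": 0.25,
    "environmental": 0.20,
    "longevity": 0.20,
}


def score_fluid(ratings: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Return a weighted score on the same 1-5 scale as the input ratings."""
    if set(ratings) != set(weights):
        raise ValueError("ratings must cover exactly the weighted criteria")
    return sum(weights[name] * ratings[name] for name in weights)


# Placeholder ratings (1 = poor, 5 = excellent), not real fluid data.
candidates = {
    "fluid_a": {"thermal": 4, "compatibility": 3, "environmental": 5, "longevity": 4},
    "fluid_b": {"thermal": 3, "compatibility": 5, "environmental": 3, "longevity": 4},
}

# Rank candidates from highest to lowest weighted score.
ranked = sorted(candidates, key=lambda name: score_fluid(candidates[name]), reverse=True)
```

Adjust the weights to match your own priorities; a facility with strict environmental targets, for instance, would likely weight GWP and toxicity more heavily than this sketch does.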
Tank & Rack Integration Architecture
Immersion tanks replace traditional server racks. Design choices here determine serviceability and cluster scalability.
- Open Bath vs. Sealed Enclosures: Open baths allow easier hardware access but may have higher fluid evaporation. Sealed systems minimize fluid loss and contamination.
- Power Distribution: PDUs and connectors must be rated for submersion in the chosen dielectric fluid. Plan for overhead busways or side-mounted power whips.
- Cabling Strategy: Use sealed penetrations for network and power. Plan for extra cable length and strain relief for lifting trays.
- Fluid Management: Include sight glasses, fill/drain ports, and fluid quality sensors (temperature, purity). Integrate with your facility's Building Management System (BMS).
Operational Procedures & Maintenance
Immersion cooling shifts maintenance from air filters to fluid systems. Establish these procedures before deployment.
- Hardware Service: Implement lift mechanisms for server trays. Technicians need aprons, gloves, and drip pans.
- Fluid Maintenance: Schedule regular sampling and analysis for acidity, moisture content, and particulate matter. Plan for filtration or fluid replacement cycles.
- Leak Detection & Response: Install leak detection sensors under tanks. Have spill containment berms and fluid recovery plans.
- Performance Monitoring: Track inlet/outlet fluid temperatures, flow rates, and pump power. Correlate this data with IT power draw to calculate real-time cooling efficiency.
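The performance monitoring step can be sketched as two small calculations: the heat carried away by the fluid loop, and a pump-only cooling overhead ratio. This is a minimal sketch for a single-phase loop; the function names and the sample readings (including the specific heat value) are assumptions, and a real deployment would also account for facility-side chillers or dry coolers.

```python
def heat_removed_kw(flow_kg_s: float, cp_j_per_kg_k: float,
                    tank_inlet_c: float, tank_outlet_c: float) -> float:
    """Heat carried out of the tank: Q = m_dot * cp * (T_outlet - T_inlet), in kW."""
    return flow_kg_s * cp_j_per_kg_k * (tank_outlet_c - tank_inlet_c) / 1000.0


def pump_overhead_ratio(it_power_kw: float, pump_power_kw: float) -> float:
    """Partial PUE that counts only pump power: (IT + pump) / IT."""
    if it_power_kw <= 0:
        raise ValueError("it_power_kw must be positive")
    return (it_power_kw + pump_power_kw) / it_power_kw


# Placeholder readings, not measurements from a real tank.
q_kw = heat_removed_kw(flow_kg_s=2.0, cp_j_per_kg_k=2000.0,
                       tank_inlet_c=40.0, tank_outlet_c=50.0)
ratio = pump_overhead_ratio(it_power_kw=40.0, pump_power_kw=0.8)
capture = q_kw / 40.0  # fraction of IT power leaving through the fluid loop
```

If the capture fraction drifts well below 1.0 at steady state, heat is escaping by another path or a sensor is miscalibrated, which makes it a useful sanity check alongside the overhead ratio.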
Comparative Analysis: When to Choose Which
The right choice depends on your specific constraints and goals for sustainable cloud architecture.
- Choose Single-Phase If:
  - You are retrofitting an existing facility with moderate power density (<30kW/rack).
  - Your priority is lower capital expenditure and operational simplicity.
  - You are comfortable with a slightly higher Power Usage Effectiveness (PUE).
- Choose Two-Phase If:
  - You are building a greenfield AI cluster targeting extreme density (>40kW/rack).
  - Your primary goal is minimizing energy-to-solution and achieving the lowest possible PUE (<1.02).
  - You can manage higher fluid costs and more complex system engineering.
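The selection criteria above can be expressed as a rough decision helper. This is a sketch only: the function name and the "evaluate both" fallback are assumptions, and the 30 kW and 40 kW thresholds are taken from the list rather than from any standard.

```python
def recommend_architecture(rack_kw: float, greenfield: bool,
                           prioritize_low_capex: bool) -> str:
    """Map the guide's selection criteria to a coarse recommendation."""
    if rack_kw < 30:
        # Moderate density: the simpler, cheaper option is usually sufficient.
        return "single-phase"
    if rack_kw > 40 and greenfield and not prioritize_low_capex:
        # Extreme density in a new build with budget for complex engineering.
        return "two-phase"
    # Anything in between needs a site-specific engineering study.
    return "evaluate both"
```

For example, `recommend_architecture(25, greenfield=False, prioritize_low_capex=True)` returns `"single-phase"`, while a 60 kW greenfield rack without a capex constraint returns `"two-phase"`.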
For a holistic view, see our guide on How to Design a Holistic Cooling Strategy for AI Hardware.
Step 1: Select Your Immersion Cooling System Type
Your first and most critical choice is between the two core immersion cooling architectures, which dictate your entire deployment's design, fluid selection, and operational model.
Immersion cooling submerges hardware directly in a dielectric fluid to capture heat. You must choose between single-phase and two-phase systems. Single-phase systems use a non-conductive liquid, like mineral oil or engineered fluids, that remains in a liquid state. Heat is removed as the fluid circulates through a heat exchanger. Two-phase systems use specialized fluids, such as 3M Novec, that boil at low temperatures and absorb large amounts of latent heat as they change phase from liquid to vapor. The vapor is then condensed and returned to the tank.
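The difference between the two mechanisms can be written as a pair of standard heat balance relations. The symbols are introduced here for illustration and do not appear elsewhere in this guide: $\dot{Q}$ is the heat load, $\dot{m}$ the circulating liquid mass flow, $c_p$ the fluid's specific heat, $T_{\mathrm{in}}$ and $T_{\mathrm{out}}$ the fluid temperatures entering and leaving the tank, $\dot{m}_v$ the vapor generation rate, and $h_{fg}$ the latent heat of vaporization.

```latex
\begin{align}
  % Single-phase: sensible heat carried by the circulating liquid
  \dot{Q} &= \dot{m}\, c_p \,\bigl(T_{\mathrm{out}} - T_{\mathrm{in}}\bigr) \\
  % Two-phase: latent heat absorbed as the fluid boils
  \dot{Q} &= \dot{m}_v\, h_{fg} \\
  % Mass flow ratio for the same heat load
  \frac{\dot{m}}{\dot{m}_v} &= \frac{h_{fg}}{c_p \,\bigl(T_{\mathrm{out}} - T_{\mathrm{in}}\bigr)}
\end{align}
```

Because $h_{fg}$ is typically much larger than $c_p$ multiplied by a practical temperature rise, a two-phase system moves the same heat with far less fluid mass in motion, which is why it can rely on passive boiling and condensation instead of pumps.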
Select single-phase for its operational simplicity, lower fluid cost, and easier maintenance; it is well suited to predictable, moderate-density racks. Choose two-phase for its superior heat transfer efficiency and ability to handle extreme, localized heat fluxes from components like GPUs, but be prepared for higher fluid costs and more complex system design. This choice directly impacts your tank design, fluid selection, and integration with facility cooling loops, as detailed in our guide on How to Implement Liquid Cooling in High-Density AI Data Centers.
Dielectric Fluid Comparison
Key properties and trade-offs for selecting a dielectric fluid for single-phase or two-phase immersion cooling systems in AI training clusters.
| Property / Metric | 3M Novec 7100 (Hydrofluoroether) | Engineered Fluids S5 (Synthetic) | Mineral Oil (Hydrocarbon) |
|---|---|---|---|
| Dielectric Strength (kV) | | | |
| Global Warming Potential (GWP) | ~300 | < 5 | ~3 |
| Boiling Point (°C) | 61 | 56 | |
| Thermal Conductivity (W/m·K) | 0.07 | 0.11 | 0.13 |
| Material Compatibility | | | |
| Environmental Persistence | | | |
| Approx. Cost per Liter | $50-100 | $30-60 | $5-15 |
| Typical Use Case | Two-phase, high-density racks | Single-phase, high-performance | Retrofit, low-cost POC |
Step 2: Design the Immersion Tank and Rack Integration
This step defines the physical and thermal interface between your AI hardware and the dielectric fluid, determining system efficiency, density, and serviceability.
The immersion tank is a corrosion-resistant vessel, either open-bath or sealed, that holds the dielectric fluid and the submerged servers. The choice between single-phase and two-phase cooling dictates the thermal transfer mechanism. For large-scale training, prioritize tanks that support standard 19" or Open Compute Project (OCP) racks to simplify hardware integration. The tank must include fluid inlets/outlets, vapor management for two-phase systems, and service ports for maintenance, forming the core of your Sustainable Cloud Architecture and Liquid Cooling system.
Rack integration requires designing or procuring immersion-ready servers with sealed connectors and compatible materials. Plan the rack-level power distribution and external networking passthroughs before submersion. A critical best practice is to implement a dry-run validation of all hardware and cabling outside the tank. This prevents costly fluid contamination and ensures the rack design supports the required computational density for your model training jobs without thermal throttling.
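To connect tank design to the cluster-level power budget, a rough sizing calculation helps early in planning. The sketch below is an assumption-laden illustration: the function name, the headroom parameter, and the example figures are hypothetical, and real sizing must also consider facility loop capacity and redundancy.

```python
import math


def tanks_required(cluster_it_kw: float, tank_capacity_kw: float,
                   headroom: float = 0.25) -> int:
    """Tanks needed so each runs at or below (1 - headroom) of its rated heat capacity."""
    if cluster_it_kw <= 0 or tank_capacity_kw <= 0:
        raise ValueError("power values must be positive")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable_kw = tank_capacity_kw * (1 - headroom)
    return math.ceil(cluster_it_kw / usable_kw)


# Placeholder example: a 2 MW cluster with tanks rated at 50 kW each.
no_margin = tanks_required(2000.0, 50.0, headroom=0.0)
with_margin = tanks_required(2000.0, 50.0, headroom=0.25)
```

Reserving headroom keeps each tank below its rated limit during sustained training runs, which reduces the risk of thermal throttling at the cost of more tanks and floor space.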
Common Mistakes
Implementing immersion cooling for AI training clusters is a high-stakes engineering project. Avoiding these common pitfalls is critical for achieving the promised power efficiency, reliability, and total cost of ownership.
Single-phase immersion uses a dielectric fluid that remains in a liquid state. Heat is transferred via convection as the fluid circulates, typically with a pump, and is then rejected via a heat exchanger. It's simpler and often chosen for its operational familiarity.
Two-phase immersion uses a fluid with a low boiling point. The fluid boils directly on hot components (like GPUs), absorbing latent heat. The vapor condenses on a cooled coil above the tank, creating a highly efficient, passive circulation loop. It offers superior heat transfer but requires precise tank pressure management and fluid handling.
Choosing the wrong type is a foundational mistake. Single-phase is often better for predictable, high-flow scenarios. Two-phase excels at handling extreme, uneven heat fluxes common in large-scale model training but adds system complexity.

About the author
Prasad Kumkar
CEO & MD, Inference Systems
Prasad Kumkar is the CEO & MD of Inference Systems and writes about AI systems architecture, LLM infrastructure, model serving, evaluation, and production deployment. Over 5+ years, he has worked across computer vision models, L5 autonomous vehicle systems, and LLM research, with a focus on taking complex AI ideas into real-world engineering systems.
His work and writing cover AI systems, large language models, AI agents, multimodal systems, autonomous systems, inference optimization, RAG, evaluation, and production AI engineering.