Amazon Builds Custom Cooling Tech to Handle Nvidia’s Powerful AI Chips
AWS officially announced the general availability of its new P6e instances, powered by Nvidia’s Blackwell GPUs and utilizing the custom IRHX cooling system, on July 9, 2025. These instances, specifically the P6e-GB200 UltraServers, are designed to deliver the highest GPU-based AI training and inference performance in Amazon EC2.
In the fast-paced world of artificial intelligence, heat is becoming a serious problem — literally. As Nvidia’s ultra-powerful GPUs (graphics processing units) drive the generative AI revolution, keeping these chips cool has become a critical challenge. Now, Amazon Web Services (AWS) has come up with a smart in-house solution to tackle it head-on.
With the explosion of AI workloads, especially large language models and other compute-heavy tasks, traditional air cooling systems are struggling to keep up. Nvidia’s next-gen GPUs, like the ones in the new GB200 NVL72 racks (which cram 72 GPUs into a single rack!), generate an enormous amount of heat.
Many cloud giants — including Microsoft and CoreWeave — have adopted liquid cooling technologies to keep things running smoothly. But AWS decided to take a different path.
The In-Row Heat Exchanger (IRHX)
Instead of relying on third-party equipment or building entirely new liquid-cooled data centers, Amazon’s engineers designed their own hardware called the In-Row Heat Exchanger (IRHX). This system can slot into existing or new data centers and cool the high-density Nvidia GPUs efficiently, without taking up too much space or increasing water usage dramatically.
Traditional liquid cooling options were not a good fit. “They would take up too much data center floor space or increase water usage substantially,” explained Dave Brown, AWS VP of Compute and Machine Learning Services.
This in-house approach not only saves time but also ensures AWS can scale the solution to meet its massive global infrastructure needs.
New P6e Instances: Now Available
With the new cooling system in place, AWS is now offering P6e instances powered by Nvidia’s Blackwell GPUs. These machines are tailor-made for training and running large-scale AI models, bringing high performance and efficient cooling under one roof.
And if you’re wondering why this matters — better cooling means better performance, longer hardware life, and more cost-efficient AI training for developers and businesses alike.
Why Amazon Does It Its Own Way
This isn’t the first time AWS has built its own hardware. The cloud giant has also developed:
Custom AI chips (like Trainium and Inferentia)
Storage servers
Networking routers
By designing its own gear, Amazon becomes less dependent on third-party vendors — and it’s clearly working. In Q1 of 2025, AWS posted its highest operating margin since 2014, contributing significantly to Amazon’s overall profits.
The Cloud Wars Continue
Microsoft, Amazon’s top cloud rival, isn’t sitting idle. Microsoft is developing its own AI chips (Maia) and custom cooling systems (called Sidekicks) to keep up with rising compute demands.
With AI workloads growing every day, the race to build smarter, more efficient data centers is heating up — and AWS is proving it’s ready to lead the charge with both brains and cooling muscle.
What the IRHX Does and Its Key Features
The IRHX system combines elements of both liquid and air-based cooling. It circulates chilled liquid to the server rows, where the liquid extracts heat from the tightly packed Nvidia GPUs. This direct-to-chip approach works alongside fan-coil arrays that maintain the air-cooled mechanical layout of standard AWS racks, so the GPUs can be cooled without redesigning the rack itself.
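To give a rough sense of the scale involved, the sketch below applies the standard heat-transfer relation Q = m_dot * c_p * delta_T to estimate how much coolant has to flow past a rack to carry its heat away. It is only a back-of-the-envelope illustration: the per-GPU wattage and the coolant temperature rise are hypothetical placeholders, not published AWS or Nvidia figures, and the function names are invented for this example. Only the 72-GPU rack size comes from the article.

```python
# Illustrative only: the wattage and temperature rise used below are
# hypothetical placeholders, not AWS or Nvidia specifications.

WATER_SPECIFIC_HEAT_J_PER_KG_K = 4186.0  # approximate value for liquid water


def rack_heat_load_watts(gpu_count: int, watts_per_gpu: float) -> float:
    """Total heat to remove, assuming all electrical power ends up as heat."""
    return gpu_count * watts_per_gpu


def coolant_flow_kg_per_s(
    heat_load_watts: float,
    delta_t_kelvin: float,
    specific_heat: float = WATER_SPECIFIC_HEAT_J_PER_KG_K,
) -> float:
    """Mass flow needed for the coolant to absorb the load while warming by delta_t.

    Rearranged from Q = m_dot * c_p * delta_T.
    """
    if delta_t_kelvin <= 0:
        raise ValueError("delta_t_kelvin must be positive")
    return heat_load_watts / (specific_heat * delta_t_kelvin)


if __name__ == "__main__":
    # 72 GPUs per rack is from the article; 1000 W per GPU is an assumed value.
    load = rack_heat_load_watts(gpu_count=72, watts_per_gpu=1000.0)
    # A 10 K coolant temperature rise is likewise an assumed value.
    flow = coolant_flow_kg_per_s(load, delta_t_kelvin=10.0)
    print(f"Heat load: {load / 1000:.1f} kW, coolant flow: {flow:.2f} kg/s")
```

The takeaway is the shape of the relationship rather than the specific numbers: for a fixed temperature rise, the required coolant flow grows in direct proportion to the heat packed into the rack.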
Key features and benefits of this innovation include:
Efficient Heat Dissipation: The IRHX is specifically engineered to handle the extreme heat generated by Nvidia’s next-gen GPUs, such as those in the GB200 NVL72 racks, which cram a staggering 72 GPUs into a single rack. This ensures the chips operate at optimal temperatures, preventing thermal throttling and maximizing performance.
Space and Water Efficiency: According to Dave Brown, AWS VP of Compute and Machine Learning Services, traditional liquid cooling solutions often demand excessive data center floor space or lead to substantial increases in water usage. The IRHX circumvents these limitations, offering a more compact and water-conscious approach.
Scalability and Adaptability: Designed by AWS engineers, the IRHX can be deployed rapidly and scaled across AWS’s massive global infrastructure. Its ability to slot into both existing and new data centers provides significant flexibility in meeting evolving AI demands.
Enhanced Hardware Life and Cost-Efficiency: By effectively managing heat, the IRHX contributes to longer hardware lifespan for the expensive Nvidia GPUs. For developers and businesses, this translates into more stable and cost-efficient AI training, as hardware failures due to overheating are minimized.
Why it Matters
This in-house cooling solution is a critical development for several reasons:
Unlocking AI Potential: As AI models grow in complexity and size, the computational power required to train and run them continues to surge. Effective cooling is no longer a mere operational detail; it is a prerequisite for running these chips at full performance.
Competitive Advantage: In the highly competitive cloud computing landscape, custom hardware development is becoming a key differentiator. By designing its own cooling infrastructure, AWS reduces its reliance on third-party vendors, gains greater control over its supply chain, and can optimize its systems specifically for its demanding AI workloads. This vertical integration has historically paid dividends for Amazon, with AWS posting its highest operating margin since 2014 in Q1 2025.
Sustainability: Efficient cooling systems contribute to the overall sustainability of data centers by reducing energy consumption and water usage. AWS’s focus on a water-efficient solution aligns with broader industry efforts towards more environmentally responsible computing.
Future-Proofing Infrastructure: The rapid evolution of AI chips means that cooling needs are constantly changing. By designing its own solutions, AWS can adapt more quickly to these advancements, ensuring its infrastructure remains at the forefront of AI innovation.