Argonne's Polaris Supercomputer: AMD and NVIDIA Bridge to Intel's Delayed Aurora

Supercomputers deliver the highest available computing performance for demanding engineering and scientific workloads that involve massive datasets and complex calculations. Because manufacturing issues with Intel's Sapphire Rapids server chips have delayed the Aurora supercomputer, the U.S. Department of Energy's Argonne National Laboratory has selected NVIDIA A100 GPUs paired with AMD EPYC processors for its new Polaris system.

Argonne National Laboratory

Argonne National Laboratory (ANL), located near Chicago and operated for the U.S. Department of Energy, is the Midwest's largest national lab for science and engineering research. Polaris is its stepping stone toward exascale computing and serves as a critical precursor to Aurora.

Performance

Polaris features 560 nodes, each with one AMD EPYC processor and four NVIDIA A100 GPUs, for a total of 2,240 GPUs. It delivers up to 44 PetaFLOPS of peak FP64 performance, enough to position it for a top-10 spot on the TOP500 list, and scales to 1.4 AI ExaFLOPS (measured in lower-precision formats optimized for AI). Polaris is not a full exascale system: that title is reserved for Aurora, which targets 1 ExaFLOPS of FP64 performance by late 2022 or early 2023. Even so, Polaris marks a significant milestone for NVIDIA-powered systems.
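
The totals above can be sanity-checked with a few lines of arithmetic. The sketch below, in Python, assumes NVIDIA's published per-GPU peaks for the A100 (19.5 TFLOPS for FP64 on Tensor Cores and 624 TFLOPS for FP16 with structured sparsity). Those two figures and every name in the code are illustrative assumptions, not details taken from Argonne's announcement.

```python
# Back-of-the-envelope check of the aggregate figures quoted above.
NODES = 560
GPUS_PER_NODE = 4

# ASSUMED per-GPU peaks from NVIDIA's published A100 specifications
# (not stated in this article).
A100_FP64_TENSOR_TFLOPS = 19.5   # FP64 on Tensor Cores
A100_FP16_SPARSE_TFLOPS = 624.0  # FP16 Tensor Cores with structured sparsity


def total_gpus(nodes: int = NODES, gpus_per_node: int = GPUS_PER_NODE) -> int:
    """Number of GPUs in the whole system."""
    return nodes * gpus_per_node


def aggregate_pflops(per_gpu_tflops: float, gpus: int) -> float:
    """Sum of per-GPU peaks, converted from TFLOPS to PFLOPS."""
    return per_gpu_tflops * gpus / 1_000


gpus = total_gpus()
fp64_pflops = aggregate_pflops(A100_FP64_TENSOR_TFLOPS, gpus)
ai_exaflops = aggregate_pflops(A100_FP16_SPARSE_TFLOPS, gpus) / 1_000

print(f"GPUs: {gpus}")
print(f"Peak FP64: {fp64_pflops:.1f} PFLOPS")
print(f"Peak AI: {ai_exaflops:.2f} exaFLOPS")
```

Under these assumptions the estimate comes to roughly 43.7 PetaFLOPS of FP64 and 1.40 AI ExaFLOPS, in line with the quoted 44 PetaFLOPS and 1.4 ExaFLOPS. These are theoretical peaks; sustained performance on real workloads is lower.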

Features

Polaris is expected to accelerate research in particle physics, clean energy, and cancer treatment. It brings the Argonne Leadership Computing Facility (ALCF) into the era of exascale AI and lets researchers refine their workflows ahead of the upcoming Aurora system.

Design Specifications

Polaris consumes about 2 megawatts at peak, far less than Aurora's 60 megawatts, and uses 560 nodes versus Aurora's 9,000. Built by Hewlett Packard Enterprise (HPE) with HPE Slingshot 11 networking, it starts with 32-core AMD EPYC 7532 (Rome) CPUs and is scheduled to upgrade to the EPYC 7543 (Milan) in March 2022. The system is built from HPE Apollo 6500 Gen10 Plus servers housed in 40 racks, and it uses the same interconnect as Aurora, which should simplify porting code between the two machines.
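
The gap in scale is easier to see as ratios. The following Python sketch uses only figures quoted in this article (peak power, node count, and FP64 peak or target). The dictionary layout and function name are hypothetical, and Aurora's numbers are targets rather than measurements.

```python
# Figures quoted in this article; Aurora's are targets, not measurements.
POLARIS = {"nodes": 560, "power_mw": 2.0, "fp64_pflops": 44.0}
AURORA = {"nodes": 9_000, "power_mw": 60.0, "fp64_pflops": 1_000.0}


def gflops_per_watt(system: dict) -> float:
    """Peak FP64 efficiency in GFLOPS per watt."""
    gflops = system["fp64_pflops"] * 1e6  # 1 PFLOPS = 1e6 GFLOPS
    watts = system["power_mw"] * 1e6      # 1 MW = 1e6 W
    return gflops / watts


power_ratio = AURORA["power_mw"] / POLARIS["power_mw"]
node_ratio = AURORA["nodes"] / POLARIS["nodes"]

print(f"Aurora draws {power_ratio:.0f}x the power of Polaris")
print(f"Aurora has {node_ratio:.1f}x as many nodes")
print(f"Polaris: {gflops_per_watt(POLARIS):.1f} GFLOPS/W (peak)")
print(f"Aurora:  {gflops_per_watt(AURORA):.1f} GFLOPS/W (target)")
```

On these numbers Aurora would draw about 30 times the power of Polaris with about 16 times the nodes. The efficiency figures are rough upper bounds, since they divide theoretical peak performance by peak power draw.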

Intel's delays stem from challenges with its Sapphire Rapids CPUs (built on the Intel 7 process) and Ponte Vecchio GPUs. Aurora was originally planned to debut in 2018 as a 180-PetaFLOPS system based on Xeon Phi processors, which Intel later canceled. More recent additions, such as the Tile-Xe feature for inter-chip communication, were developed at the DOE's request but pushed the timeline to 2022-2023, allowing the AMD-powered Frontier to claim the first exascale crown.

Polaris's hybrid CPU-GPU design eases Argonne's transition to Aurora despite the slips in Intel's roadmap. It also supports code optimization work under the DOE's Exascale Computing Project and ALCF programs.

Advantages

  • Advances cancer research through data-driven analysis of tumor growth, fluid-structure simulations, and prediction of drug responses across billions of molecular combinations.
  • Supports physical sciences research, including the ATLAS experiment at CERN's Large Hadron Collider, the world's most powerful particle accelerator, near Geneva, Switzerland.

Final Verdict

Polaris combines multi-GPU nodes, the same Slingshot interconnect used in Aurora, and optimized support for Python, HPC simulations, machine learning, and data analytics through HPE and NVIDIA technologies. The system is in final deployment and will support early science runs in 2022, with broader access opening in the second quarter of 2022.