Copper Interconnect

A growing constraint for AI data centers

The rapid rise of artificial intelligence (AI) has revolutionized industries and transformed data processing. However, this growth comes with unique challenges, particularly in the design and operation of data centers. One of the most pressing constraints is the reliance on copper interconnects for communication between GPUs and other components. As AI workloads demand ever greater computational power and faster interconnects, copper-based solutions are running into limitations that are reshaping data center infrastructure, particularly with respect to rack density and power consumption.

The Role of Copper Interconnects in AI Data Centers

Copper interconnects have long been the backbone of data center communication, facilitating data transfer between GPUs, CPUs, and memory within and across servers. AI workloads, especially those involving large-scale deep learning models, require massive amounts of data to be exchanged at high speeds with minimal latency. This places immense stress on traditional copper interconnects.

The physical properties of copper limit its efficiency as data transfer speeds increase. At higher frequencies, signal attenuation and crosstalk become significant, requiring additional power and advanced signal processing techniques to maintain signal integrity. Furthermore, copper interconnects are inherently limited by distance; as the length of a connection increases, so does the signal degradation. This constraint is particularly problematic in modern AI data centers, where GPUs must work in tightly coupled clusters to maximize parallel processing capabilities.
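As a rough, back-of-the-envelope sketch (the terms below describe generic transmission-line behavior, not figures for any particular cable or board material), copper channel loss can be approximated as a skin-effect term that grows with the square root of frequency plus a dielectric term that grows linearly with frequency, with both scaling linearly with length:

    Loss(f, L) ≈ (a·√f + b·f) · L   [dB]

Here f is the Nyquist frequency of the signal, L is the trace or cable length, and a and b are constants set by the conductor geometry and dielectric material. Because the total loss a receiver can tolerate is roughly fixed, each jump in per-lane data rate effectively trades away reach rather than budget.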


Increasing Rack Density

To overcome the communication bottlenecks of copper interconnects, data centers are packing more GPUs into each rack. This high-density approach reduces the physical distance between interconnected GPUs, mitigating some of the issues associated with copper’s signal degradation. By minimizing the length of copper traces, data centers can achieve lower-latency, higher-bandwidth connections, which are essential for AI training and inference tasks.
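To make the distance trade-off concrete, the following minimal Python sketch applies the loss approximation above. The coefficients, loss budget, and lane rates are illustrative assumptions chosen only to show the trend, not characteristics of any real cable or transceiver.

    import math

    # Illustrative model: loss_dB = (A * sqrt(f_GHz) + B * f_GHz) * length_m
    # A, B, and the loss budget are assumed values for illustration only.
    A_SKIN = 2.0           # dB per meter per sqrt(GHz), skin-effect (conductor) term
    B_DIELECTRIC = 0.5     # dB per meter per GHz, dielectric term
    LOSS_BUDGET_DB = 30.0  # end-to-end loss the receiver is assumed to tolerate

    def loss_per_meter(nyquist_ghz):
        """Approximate copper loss (dB/m) at a given Nyquist frequency."""
        return A_SKIN * math.sqrt(nyquist_ghz) + B_DIELECTRIC * nyquist_ghz

    def max_reach_m(nyquist_ghz):
        """Longest copper run that still fits inside the fixed loss budget."""
        return LOSS_BUDGET_DB / loss_per_meter(nyquist_ghz)

    # Per-lane signaling rates and their approximate Nyquist frequencies.
    for label, nyquist_ghz in [("25G NRZ", 12.5), ("56G PAM4", 14.0), ("112G PAM4", 28.0)]:
        print(f"{label:>9}: ~{max_reach_m(nyquist_ghz):.1f} m of reach "
              f"within a {LOSS_BUDGET_DB:.0f} dB budget")

Under these assumed numbers, the usable reach shrinks as the per-lane rate climbs, which is exactly why shortening every copper run by packing GPUs closer together buys back signal margin.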

ABOUT THE AUTHOR

Zaid Kahn is Vice President of Microsoft’s Silicon, Cloud Hardware, and Infrastructure Engineering organization. He leads systems engineering and hardware development for Azure, including AI systems, compute, memory, and infrastructure. Kahn’s teams are responsible for software and hardware engineering efforts as well as specialized compute systems, FPGA network products, and ASIC hardware accelerators. Kahn is part of the technical leadership team across Microsoft that sets AI hardware strategy for training and inference. His team is also responsible for the development of Microsoft’s systems for MAIA and Cobalt custom silicon.

Prior to joining Microsoft, Kahn was the head of infrastructure at LinkedIn. He was responsible for all aspects of architecture and engineering for data centers, networking, compute, storage, and hardware. Kahn also led several software development teams focused on building and managing infrastructure as code. The network teams Kahn led built the global network for LinkedIn, including POPs, peering for edge services, IPv6 implementation, DWDM infrastructure, and the data center network fabric.

Kahn holds several patents in networking and is a sought-after keynote speaker at top-tier conferences and events. He also currently chairs the Open Compute Project (OCP) Foundation, serves on the EECS External Advisory Board (EAB) at UC Berkeley, and is a board member of the Internet Ecosystem Innovation Committee (IEIC), a global Internet think tank promoting Internet diversity. Kahn has a Bachelor of Science in Computer Science and Physics from the University of the South Pacific.