Nvidia introduces Spectrum-4 platform for AI, HPC over Ethernet

Nvidia is best known for its GPUs, but with Spectrum-4, a combination of networking technologies, the company is reinforcing its commitment not only to graphics processors but also to systems designed to handle the demanding network workloads of AI and high-performance computing (HPC).

The latest Nvidia Spectrum products are built on the new Spectrum-4 Ethernet-switch ASIC, which boasts 51.2 Tb/s of switching and routing capacity. The chip underpins the newest members of the company’s Spectrum switch line, which will be available later this year. The switches are part of a larger Spectrum-4 platform that also integrates Nvidia’s ConnectX-7 smartNIC, its new BlueField-3 DPU, and its DOCA software-development platform.

The company introduced Spectrum-4 at its GPU Technology Conference this week.

The Spectrum-4 SN5000 Ethernet switch family can support 128 ports of 400GbE, combined with adaptive routing and enhanced congestion control mechanisms to optimize RDMA over Converged Ethernet (RoCE) fabrics.
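As a quick sanity check, the port count lines up with the ASIC's headline figure: 128 ports at 400 Gb/s each comes to 51.2 Tb/s. The snippet below is simple arithmetic on the article's numbers and assumes the 51.2 Tb/s capacity counts each port's line rate once.

```python
# Aggregate port bandwidth of a fully populated SN5000 configuration,
# using the figures quoted in the article: 128 ports of 400GbE.
PORTS = 128
PORT_SPEED_GBPS = 400  # 400GbE per port


def aggregate_tbps(ports: int, port_speed_gbps: int) -> float:
    """Return total port bandwidth in terabits per second."""
    return ports * port_speed_gbps / 1000


total = aggregate_tbps(PORTS, PORT_SPEED_GBPS)
print(f"{PORTS} x {PORT_SPEED_GBPS} Gb/s = {total} Tb/s")  # 128 x 400 Gb/s = 51.2 Tb/s
```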

The platform is designed to handle massive data sets, such as those needed to model entire cars, entire factories, and even the entire Earth for weather modeling, said Kevin Deierling, vice president of networking at Nvidia. Ethernet wasn’t designed for data sets this large; it was designed for small packet exchanges, or what Nvidia calls “mice flows,” he said. The giant data sets of HPC and AI produce what he called “elephant flows,” which can overwhelm traditional Ethernet architectures.

Today, because of legacy designs and the inefficiency of ordinary Ethernet networking, these flows can end up on the same connection, causing congestion, degraded performance, and even dropped packets.

Spectrum-4 is meant to address the problem. “What we’ve done is built an accelerated Ethernet fabric that actually can accommodate these elephant flows, and manage them efficiently,” he said.

Nvidia’s answer is adaptive routing that looks at congestion within the network and directs traffic accordingly to handle massive workloads more effectively, he said. “So we call it accelerated Ethernet networking.”
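To illustrate the difference Deierling describes, here is a toy model, not Nvidia's algorithm, that places flows onto four equal-cost uplinks in two ways. The first uses a fixed hash of a flow ID, roughly how conventional ECMP load balancing works, so two elephant flows can collide on one link regardless of load. The second is a simplified, hypothetical form of adaptive routing that sends each flow to whichever link is currently least loaded. The function names and flow sizes are illustrative assumptions.

```python
import zlib

# Toy flow-placement model (an illustration, not Nvidia's algorithm).
# Each flow is (flow_id, rate_in_gbps); links are equal-cost uplinks.


def static_hash_placement(flows, num_links):
    """ECMP-style: pick a link from a fixed hash of the flow ID, ignoring load."""
    load = [0] * num_links
    for flow_id, gbps in flows:
        link = zlib.crc32(flow_id.encode()) % num_links
        load[link] += gbps
    return load


def adaptive_placement(flows, num_links):
    """Congestion-aware: send each new flow to the currently least-loaded link."""
    load = [0] * num_links
    for _flow_id, gbps in flows:
        link = min(range(num_links), key=lambda i: load[i])
        load[link] += gbps
    return load


# Four 100 Gb/s "elephant" flows plus four 1 Gb/s "mice" flows over four links.
flows = [(f"elephant-{i}", 100) for i in range(4)] + [(f"mouse-{i}", 1) for i in range(4)]

static_load = static_hash_placement(flows, 4)
adaptive_load = adaptive_placement(flows, 4)
print("static hash:", static_load)
print("adaptive:", adaptive_load)  # -> [101, 101, 101, 101]
```

The static result depends on where the hash happens to land; whenever two elephants share a link, that link carries 200 Gb/s or more while another sits nearly idle. Real switch-based adaptive routing reacts to live congestion signals inside the fabric rather than keeping a running tally like this sketch.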

Spectrum-4 switches also offer nanosecond timing precision, which Nvidia says is an improvement of five to six orders of magnitude over typical data centers, where timing is accurate only to milliseconds. Compared with the previous generation, Nvidia says the switches accelerate, simplify, and secure the network fabric, with 2x faster per-lane bandwidth, 2x more ports, 4x fewer switches, and 40% lower power consumption.
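The "five to six orders of magnitude" figure follows from comparing timing resolutions: a millisecond is a million nanoseconds. Nvidia did not give exact numbers, so the sketch below assumes "nanosecond precision" falls somewhere between 1 ns and 10 ns, which yields the two ends of the quoted range.

```python
import math

# Timing resolutions expressed in nanoseconds, to keep the arithmetic exact.
MILLISECOND_NS = 1_000_000
NANOSECOND_NS = 1


def orders_of_magnitude(coarse_ns, fine_ns):
    """Number of powers of ten separating two timing resolutions."""
    return math.log10(coarse_ns / fine_ns)


# Assumed range: 1 ms vs. 1 ns, and 1 ms vs. 10 ns.
print(round(orders_of_magnitude(MILLISECOND_NS, NANOSECOND_NS)))       # 6
print(round(orders_of_magnitude(MILLISECOND_NS, 10 * NANOSECOND_NS)))  # 5
```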

Deierling said he thinks there is a potential consolidation story here, but that organizations will also likely expand their networking infrastructure. “A good example is some of our platforms today have nine of our ConnectX adapters inside of them. So those are delivering something like 1.8 terabits per second. Now all of a sudden, we can do that same level of bandwidth with four adapters. And so you’ll see smaller boxes, you’ll see consolidation, but I also think you’ll see people scale out because they want to continue and double the performance,” he said.
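The consolidation math in Deierling's example works out roughly as follows. Nine adapters delivering about 1.8 Tb/s implies about 200 Gb/s per adapter. The sketch assumes the newer adapters run at 400 Gb/s (ConnectX-7's top rate); the quote does not name the adapter generation, so treat that as an assumption. Doubling the per-adapter rate roughly halves the count.

```python
# Figures from Deierling's example: nine adapters delivering about 1.8 Tb/s.
LEGACY_ADAPTERS = 9
TARGET_GBPS = 1800

# Implied per-adapter rate on the existing platforms.
legacy_rate_gbps = TARGET_GBPS / LEGACY_ADAPTERS  # 200.0

# Assumption: the newer adapters run at 400 Gb/s, double the legacy rate.
NEW_RATE_GBPS = 400

# Exact adapter count for the same bandwidth, and what four adapters deliver.
adapters_needed = TARGET_GBPS / NEW_RATE_GBPS  # 4.5
four_adapter_gbps = 4 * NEW_RATE_GBPS          # 1600

print(f"legacy rate: {legacy_rate_gbps:.0f} Gb/s per adapter")
print(f"adapters needed at {NEW_RATE_GBPS} Gb/s: {adapters_needed}")
print(f"four new adapters: {four_adapter_gbps / 1000} Tb/s")
```

Under these assumptions, four adapters deliver 1.6 Tb/s, in the same range as the "something like 1.8 terabits per second" in the quote.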

While Spectrum-4 is meant for HPC and AI, Deierling said it has broader application. “Even for traditional databases today, you want to perform Big Data analytics. Anybody who’s really running their business in a thoughtful manner is embracing Big Data and data analytics. And you just need to move a ton of data, whether it’s an AI application, or a data analytics application, you should be harvesting and using all of your data and synthesizing into business intelligence,” he said.

Nvidia also says that, by using its SDKs, existing applications can take full advantage of the performance benefits without modification.

Spectrum-4 switch systems and BlueField-3 DPUs and systems will be available later this year. ConnectX-7 is available now, as is early access to the DOCA software-development platform.

Join the Network World communities on Facebook and LinkedIn to comment on topics that are top of mind.
