Are We Ready for Large-scale AI Workloads?
Posted on May 22nd, 2023
Originally published in Embedded
ChatGPT has fired the world’s imagination about AI. The chatbot can write essays, compose music, and even converse in different languages. If you’ve read any ChatGPT poetry, you can see it doesn’t pass the Turing Test yet, but it’s a huge leap beyond what even experts expected of AI just three months ago. Over one million people became users in the first five days, shattering records for technology adoption.
The groundswell also strengthens arguments that AI will have an outsized impact on how we live, with some predicting AI will contribute significantly to global GDP by 2030 by streamlining manufacturing, retail, healthcare, financial systems, security, and other daily processes.
But the sudden success also shines a light on AI’s most urgent problem: our computing infrastructure isn’t built to handle the workloads AI will throw at it. The size of AI models has grown by roughly 10x per year over the last five years. By 2027, one in five Ethernet switch ports in data centers will be dedicated to AI, ML, and accelerated computing.
Under these circumstances, large-scale AI becomes technically and economically impractical, if not impossible. It could also be terrible for the environment. Training a model like GPT-3, with its 175 billion parameters, can already consume 1,287 megawatt-hours, enough to power around 120 U.S. homes for a year. A 10x increase in model performance, something that will occur, could translate into a 10,000x increase in computational and energy needs.
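The homes figure is simple arithmetic, assuming roughly 10.7 MWh of electricity per U.S. home per year (my assumption, close to the published U.S. average and the rate the 120-home figure implies):

```python
# Back-of-the-envelope energy arithmetic for the figures above.
# The per-home rate is an assumption (~10.7 MWh/year, near the U.S. average).
TRAINING_MWH = 1_287        # estimated energy to train a GPT-3-class model
HOME_MWH_PER_YEAR = 10.7    # assumed annual consumption of one U.S. home

homes_for_a_year = TRAINING_MWH / HOME_MWH_PER_YEAR
print(f"One training run powers about {homes_for_a_year:.0f} U.S. homes for a year")
# -> One training run powers about 120 U.S. homes for a year

# If a 10x capability jump really costs 10,000x the energy:
print(f"Hypothetical next-generation run: {TRAINING_MWH * 10_000:,} MWh")
# -> Hypothetical next-generation run: 12,870,000 MWh
```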
To escape this spiral, we need to rethink computing architecture from the ground up. While it’s impossible to predict all the coming changes, here are some I expect:
- Compute platforms get completely disaggregated. Each element of our systems (CPUs, GPUs, DPUs, memory, storage, networking, and so on) must be able to scale and improve at its own pace so innovation can keep up with algorithm demands and capacity/throughput requirements. That means eliminating interdependencies between them.
Memory is a clear example. Over the last few years, memory has become a bottleneck for scaling performance. While the need for more bandwidth and capacity has steadily increased, it has become nearly impossible to keep scaling the memory interface of a host.
CXL technology, which is moving toward commercialization, lets you attach more memory to a processor over a CXL link, sidestepping the limits of the host’s traditional direct-attached memory interface. CXL will also allow different processors and devices to share pools of supplemental memory. Data centers will even be able to recycle memory from older servers into CXL pools to optimize their TCO. The bottom line will be better resource utilization, higher peak performance, and a better ROI. Storage and networking are already disaggregated to some degree, but in the future we’ll see a complete modularization of the data center, with different functions and/or components in discrete appliances whose relationships change dynamically.
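CXL itself is a hardware protocol, but the utilization argument is easy to model. Here is a toy sketch (my own illustration, not a real CXL API) of hosts overflowing from fixed local DRAM into a shared pool:

```python
# Toy model of disaggregated memory: hosts tap a shared CXL-style pool
# when local DRAM runs out. Purely illustrative; not a real CXL interface.
class MemoryPool:
    def __init__(self, capacity_gb: int):
        self.capacity_gb = capacity_gb
        self.allocated_gb = 0

    def allocate(self, gb: int) -> bool:
        # Grant the request only if pooled capacity remains.
        if self.allocated_gb + gb > self.capacity_gb:
            return False
        self.allocated_gb += gb
        return True

class Host:
    def __init__(self, local_gb: int, pool: MemoryPool):
        self.local_gb = local_gb   # fixed, directly attached DRAM
        self.used_gb = 0
        self.pool = pool           # shared supplemental memory

    def request(self, gb: int) -> str:
        if self.used_gb + gb <= self.local_gb:
            self.used_gb += gb
            return "served from local DRAM"
        if self.pool.allocate(gb):
            return "served from shared CXL pool"
        return "failed: out of memory"

# One pool (e.g., recycled DIMMs from older servers) backs many hosts.
pool = MemoryPool(capacity_gb=1024)
host = Host(local_gb=256, pool=pool)
print(host.request(200))   # served from local DRAM
print(host.request(300))   # served from shared CXL pool
```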
- Optical becomes the medium. Disaggregation, however, creates latency and potential bandwidth bottlenecks that curb performance. To enable disaggregation’s full potential, we need a medium that can minimize these drawbacks.
As mentioned above, power consumption is a real problem. So is power density. We need to build larger and denser AI platforms to address emerging tasks and use cases. In many cases, connecting their components electrically at the speeds and feeds demanded requires proximity, which leads to a power density problem. That limits our ability to add more AI components to a cluster and to scale further.
Optical is the only medium that can address these points effectively.
Optical already connects racks together. In the next phase, optical will be deployed to connect assets within racks and then even inside system pods. Familiar protocols such as CXL will move to optical.
Here is something to illustrate the scale of AI platforms. Consider a current 25 Tbps Ethernet switch. Putting redundancy, radix, and topology aside for simplicity, this switch can accommodate roughly 500 servers connected at a typical 50 gigabits per second each. But how many top-end GPUs, each using 3.6 Tbps of connectivity (published as 900 GB/s aggregated) to connect to peer GPUs in a cluster, could the switch accommodate? Seven. So the need for more bandwidth is obviously there. Copper-based switching will remain a thriving market and continue to advance, but optical will start to absorb high-end switching tasks.
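The arithmetic behind those two numbers, using 25.6 Tbps for a current-generation switch (the round 25 Tbps above; the exact figure is my assumption) and ignoring redundancy, radix, and topology as before:

```python
# Port-count arithmetic for the switch example above. 25.6 Tbps is assumed
# for a current-generation switch (rounded to 25 Tbps in the text).
SWITCH_TBPS = 25.6
SERVER_GBPS = 50   # typical server connection
GPU_TBPS = 3.6     # top-end GPU connectivity (published as 900 GB/s aggregated)

servers = int(SWITCH_TBPS * 1_000 / SERVER_GBPS)
gpus = int(SWITCH_TBPS / GPU_TBPS)
print(f"Servers at {SERVER_GBPS} Gbps: {servers}")  # -> 512, roughly 500
print(f"GPUs at {GPU_TBPS} Tbps: {gpus}")           # -> 7
```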
The rise of PAM4 (4-level Pulse Amplitude Modulation), which packs two bits into each symbol of light for links inside data centers, and coherent DSPs (Digital Signal Processors), which encode data in both the amplitude and phase of light for links between data centers, has also put the optical industry on a more predictable path of progress: optical is no longer an artisanal business like it was in its early days. 1.6T (200G lambda) optical modules, coming soon, will increase bandwidth while reducing component count, cost, and power, depending on the configuration and workload.
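To see why faster lambdas cut component count, compare lane counts for the same 1.6T module (lane math only; FEC overhead is ignored, and the 100G-lambda baseline is my assumed comparison point):

```python
# Lane-count arithmetic for a 1.6T optical module. FEC overhead is ignored;
# the 100G-lambda baseline is an assumed comparison point.
MODULE_GBPS = 1_600
for lambda_gbps in (100, 200):
    lanes = MODULE_GBPS // lambda_gbps
    baud = lambda_gbps / 2   # PAM4 carries 2 bits per symbol
    print(f"{lambda_gbps}G lambdas: {lanes} lanes at ~{baud:.0f} GBd each")
# 100G lambdas: 16 lanes at ~50 GBd each
# 200G lambdas: 8 lanes at ~100 GBd each
```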
When and how optical technology gets integrated into chips remains a heated debate. Pluggable optical modules will remain the standard for general-purpose workloads for the next several years for a host of reasons: steadily improving performance, an extensive ecosystem, customer choice, and more. Co-packaged solutions, however, may get deployed inside AI clusters sooner. Reliability and performance for co-packaged optics still need to be proven, but the potential gains in bandwidth, efficiency, and power density will encourage research, which in turn could lead to breakthroughs.
- AI training will be localized. Training a single model that knows everything and continues to get smarter all the time is questionable at scale.
A different approach may be to train a “generic” model in the cloud, then retrain it at the edge per the specifics of the area, the usage, the target audience, and so on. Then we can interconnect all the optimized models to create a super-model that knows everything, just as the internet is composed of many websites. Potentially, this can all be transparent to the user.
An example from humans: a child picks up the ability to speak and interact with other humans right from day one. This is part of the human operating system that evolution has trained into us. Then, based on the surrounding local environment, the child “fine-tunes” this inherited capability with the relevant language, knowledge, behavior, and so on. The same learning script can work for artificial machine learning, too.
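In machine-learning terms, that script is pretraining plus edge fine-tuning: freeze the generic backbone and update only a small local head. A minimal sketch, assuming a PyTorch-style workflow with placeholder model and data (nothing here reflects a real deployment):

```python
# Sketch of the cloud-pretrain / edge-finetune pattern described above.
# The tiny model and random "local data" are placeholders for illustration.
import torch
import torch.nn as nn

# 1) "Generic" backbone, pretrained in the cloud (weights arrive frozen).
backbone = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 64))
for p in backbone.parameters():
    p.requires_grad = False            # the edge never touches these

# 2) Small head, retrained at the edge for the local language/audience/task.
head = nn.Linear(64, 4)                # e.g., 4 locale-specific classes
optimizer = torch.optim.SGD(head.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# 3) Fine-tune on local data only; far cheaper than retraining the backbone.
local_x = torch.randn(128, 32)         # stand-in for edge-collected samples
local_y = torch.randint(0, 4, (128,))
for _ in range(10):
    optimizer.zero_grad()
    logits = head(backbone(local_x))
    loss = loss_fn(logits, local_y)
    loss.backward()                    # gradients flow only into the head
    optimizer.step()
```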
Energy consumption and computing cycles would fall, while consumer satisfaction would rise as people get better, more relevant responses.