AI startup SambaNova updates processor, software

SambaNova Systems, maker of dedicated AI hardware and software systems, has launched a new AI chip, the SN40L, that will be used in the company’s full-stack large language model (LLM) platform, the SambaNova Suite.

First introduced in March, the SambaNova Suite uses custom processors and operating systems for AI inference and training. It’s designed to be an alternative to power-hungry and expensive GPUs.

A hardware upgrade this soon after launch ought to bring a big jump in performance, and the vendor says it does. According to SambaNova, a single SN40L system node can serve an LLM of up to 5 trillion parameters, with sequence lengths of 256K+ possible.

Each SN40L processing unit is made up of 51 billion transistors (102 billion total per package), a significant increase over the 43 billion transistors in the previous SN30 product. The SN40L also uses 64 GB of HBM, which is new to the SambaNova line, and offers more than 3x greater memory bandwidth to speed data in and out of the processing cores. It has 768 GB of DDR5 per processing unit (1.5 TB total) vs. 512 GB (1.0 TB total) in the SN30.
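The per-package totals quoted above can be checked with a few lines of arithmetic. This is only a sketch: the two-units-per-package figure is an assumption inferred from the 51 billion and 102 billion transistor counts, and the variable names are illustrative.

```python
# Sanity check of the per-package figures quoted in the article.
# Assumption: two processing units per package, inferred from
# 51 billion transistors per unit vs. 102 billion per package.
UNITS_PER_PACKAGE = 2
GB_PER_TB = 1024


def per_package(per_unit_value):
    """Scale a per-processing-unit figure up to a whole package."""
    return per_unit_value * UNITS_PER_PACKAGE


sn40l_transistors_bn = per_package(51)          # billions of transistors
sn40l_ddr5_tb = per_package(768) / GB_PER_TB    # DDR5 per package, in TB
sn30_ddr5_tb = per_package(512) / GB_PER_TB     # DDR5 per package, in TB
ddr5_growth = 768 / 512                         # SN40L vs. SN30, per unit

print(f"SN40L transistors per package: {sn40l_transistors_bn} billion")
print(f"SN40L DDR5 per package: {sn40l_ddr5_tb} TB")
print(f"SN30 DDR5 per package: {sn30_ddr5_tb} TB")
print(f"DDR5 capacity increase: {ddr5_growth}x")
```

Under that assumption, the numbers agree with the totals in the text, and DDR5 capacity per processing unit works out to 1.5 times that of the SN30.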

SambaNova’s processor differs from Nvidia’s GPUs in that it offers an RDU-based (reconfigurable dataflow unit) environment, which is reconfigurable on demand, almost like an FPGA. This is helpful when enterprises start dealing with multimodal AI, where they are shifting between different inputs and outputs.

On the software side, SambaNova is offering what it calls a turnkey solution for generative AI. SambaNova’s full AI stack includes pre-trained, open-source models such as Meta’s Llama 2 LLM, which organizations can modify with their own content to build their own internal LLM. It also includes the company’s SambaFlow software, which automatically analyzes and optimizes processing based on the needs of the particular task.

Dan Olds, chief research officer at Intersect360 Research, said this is a major upgrade both in terms of hardware and, as importantly, the surrounding software stack. He noted that the 5 trillion parameter limit of the SN40L is nearly three times the estimated 1.7 trillion parameter size of GPT-4.
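The "nearly three times" comparison follows directly from the two figures cited. A minimal check, using the article's numbers (the GPT-4 size is an outside estimate, not a confirmed figure):

```python
# Ratio of the vendor's stated parameter limit to the GPT-4 size estimate.
vendor_max_params = 5e12        # vendor's stated per-node limit
gpt4_estimated_params = 1.7e12  # estimate cited in the article, unconfirmed

ratio = vendor_max_params / gpt4_estimated_params
print(f"Stated limit vs. GPT-4 estimate: {ratio:.2f}x")
```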

“The larger memory, plus the addition of HBM, are key factors in driving the performance of this new processor. With larger memory spaces, customers can get more of their models into main memory, which means much faster processing. Adding HBM to the architecture allows the system to move data between main memory and the cache-like HBM in much larger chunks, which also speeds processing,” said Olds.

The ability to run much larger models in relatively small systems and to run multiple models simultaneously with high performance, plus the integration of open-source LLMs to help customers get off the ground quickly with their own generative AI projects, mark a big step forward for SambaNova, Olds said.

“It gives them hardware that can truly compete with GPU-based systems on large models and a suite of software that should take a lot of the mystery (and time) out of building a custom LLM for end users,” he said.
