The 2024 AI HW Summit: Here’s What Caught My Attention

Contributor

Founder and Principal Analyst, Cambrian-AI Research LLC

Sep 18, 2024, 02:26pm EDT

Updated Sep 19, 2024, 09:18am EDT

The Summit drew over 1000 attendees this year, with scores of presentations and hundreds of AI leaders from large companies as well as many startups.

Every September since 2019, the AI HW Summit in the Bay Area has been the focal point for new technologies around AI. While the event, hosted by UK-based Kisaco, started with semiconductors, it has continually expanded its focus to include software, models, networking, and full data center optimization. Next year it will return as the AI Infra Summit, acknowledging that AI has become a full-stack endeavor that consumes entire data centers.

It may surprise some that Nvidia did not present at the event. They don’t see the need, since everyone knows who they are and how fast their GPUs are.

Here are a few insights from the event.

A Food Fight Erupted Over the Claim, “Fastest Inference on the Planet”

The battle over inference services is really heating up, with Cerebras, Groq, and SambaNova all claiming to offer the fastest available tokens-as-a-service. Now, I am fairly confident that nobody is lying here, but let’s just say each company is cherry-picking the size of the Llama 3.1 model it wants to tout. And they are mostly referencing tests run by Artificial Analysis, which publishes results on its website.

Here are Cerebras’ benchmark results:

Cerebras claims it is the fastest for inference using Llama 3.1 8B and 70B

Artificial Analysis

And SambaNova claims the fastest 405B-parameter Llama 3.1. There was a lot of discussion about why Groq and Cerebras have not (yet) run this model. It could be that they don’t have enough SRAM on their systems to do as well. Or they just haven’t had enough time. (Note: OctoAI is reportedly in acquisition discussions with Nvidia.)

SambaNova claims THEY are the fastest...

SambaNova

And here are Groq’s results. Groq seems to be picking up steam, and landed a $640M investment from BlackRock and others. The company’s development cloud has quickly grown to over 360,000 developers building AI on GroqCloud. Groq also landed a deal with Aramco to build a giant data center in Saudi Arabia that could grow to some 200,000 Language Processing Units.

But in larger models like Llama 3.1 70B, Groq claims leadership.

Artificial Analysis

So, I went to the AA website (no, not THAT AA, though perhaps I should) and found a very interesting chart that proclaims Cerebras the winner on the 70B model in both performance and price per million tokens, at just under 50 cents.

As you can see, Cerebras wins the 70B Instruct model performance crown and the best price per 1M tokens.

Artificial Analysis

Confused? So am I. But here’s the deal. Artificial Analysis runs a wide variety of models on whatever hardware a model service provider uses. They don’t do any tuning, and Nvidia is only represented by the providers who use an unspecified Nvidia GPU. They don’t disclose how many accelerators were used in the runs, nor which lower-level software was used.
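To make the two metrics on these charts concrete, here is a minimal sketch of the arithmetic behind "price per million tokens" and "tokens per second." The function names and the sample numbers are hypothetical placeholders chosen for illustration; they are not figures published by Artificial Analysis or by any vendor mentioned here.

```python
def cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Cost of `tokens` tokens at a flat price per one million tokens."""
    return tokens / 1_000_000 * price_per_million_usd


def generation_seconds(tokens: int, tokens_per_second: float) -> float:
    """Time to stream `tokens` tokens at a steady output rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return tokens / tokens_per_second


# Hypothetical workload and service figures, for illustration only.
workload_tokens = 2_000_000      # assumed monthly output tokens
price_per_million = 0.50         # assumed price in USD per 1M tokens
output_rate = 400.0              # assumed tokens per second

print(f"cost: ${cost_usd(workload_tokens, price_per_million):.2f}")
print(f"time for a 1,000-token answer: "
      f"{generation_seconds(1_000, output_rate):.2f} s")
```

Neither number says anything about how many accelerators sit behind the service, which is exactly the gap noted above.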

Inference will become a larger market than training AI, and all three of these companies have demonstrated a massive leap forward in lowering the costs of using AI in real-world applications. Nice job! Now, if you could just publish some MLPerf results, we’d all feel better. AA provides a great service, but does not replace benchmarks run by the hardware providers themselves such as MLPerf, whose benchmarks are all peer-reviewed prior to release.

Optical is the Next Big Thing

How many times have we heard that? It’s always coming “soon.” Yes, optical interconnects are widely used for rack-to-rack connectivity in modern data centers to get around the length limitations of copper and the need for retimers. But optical is rarely used within a rack, where the cable lengths are not a problem for the cheaper copper solutions.

The Celestial optical fabric can be used for system-to-HBM memory and system-to-system connectivity.

Celestial AI

But that may be about to change. Celestial AI was touting an elegant and performant design at the conference. Its approach could help solve the “memory wall” GPUs contend with today by providing access to over 33 TB of shared HBM memory space. The company claims it can lower costs by over 25 times, power by 8 times, and RDMA latency by 5 times, all while providing over 4 times better bandwidth. We will be watching these guys closely as they finish engineering their first generation.

Celestial AI's Fabric can reduce costs by over 20-fold, power by 8-fold, and latency by 5-fold, according to the company.

Celestial AI

What Ever Happened to Analog Computing?

There is a lot of research going on at IBM, Intel, and elsewhere to develop a performant analog in-memory compute solution. It looks great in PowerPoint, but the D-to-A converters add latency, and the size of memory is not conducive to running the LLMs that are driving billions of dollars of investment these days.

Enter Mentium, a startup out of UC Santa Barbara that is building a platform combining a digital processor with an in-memory-compute analog processor, which the company believes provides the best of both worlds.

Mentium combines an analog processor for large kernels with a digital processor for oft-used kernels for better power efficiency in Edge AI.

Mentium

As an important aside, Mentium switched from in-house EDA tool hosting to the Synopsys Cloud, hosted on Microsoft Azure. The switch saved the company months of development time and costs, while reducing the complexity it was facing using on-prem EDA tools.

Synopsys offers its cloud-based EDA development environment to help startups, including Mentium, design chips.

Synopsys

The Mega-NIC From Enfabrica Is Coming Soon

One of Nvidia’s greatest assets is NVLink, which interconnects up to 512 GPUs at 100 GB/s per link, and is 14 times faster than PCIe. But what about the “rest of the story”: how do you connect the GPU nodes? It takes a lot of switches.

Today's network topology interconnecting GPUs across the data center requires PCIe switches, NICs, rail switches, and spine switches.

Enfabrica

Enfabrica came out of stealth at last year’s AI HW Summit, with backing from Nvidia and a who’s who of venture capitalists. This year, the company is closer to productization, and has expanded its value proposition to include the failover features so important to AI training.

The Enfabrica Super-NIC fabric, enabled by the ACF-S silicon, eliminates the PCIe, NIC, and switches.

Enfabrica

We expect Enfabrica to become a darling of the industry, with significant adoption once deployments begin in 2025.

Conclusions

Whew! That was a lot of slides and four full days of companies striving for AI efficiency. And the focus has expanded far beyond the chips that power AI. For example, Meta showed failure data and a three-pronged strategy to deal with the certainty of failures: avoid failures, detect failures, and tolerate the inevitable.

The causes of failures in Meta's massive data centers.

Meta

If you only have time to attend two conferences next year, Nvidia GTC and the AI Infra Summit (new name) are the two you should attend.

Hope to see you there!

Follow me on Twitter or LinkedIn. Check out my website.

Disclosures: This article expresses the author's opinions and should not be taken as advice to purchase from or invest in the companies mentioned. Cambrian-AI Research is fortunate to have many, if not most, semiconductor firms as our clients, including Blaize, BrainChip, Cadence Design, Cerebras, D-Matrix, Eliyan, Esperanto, GML, Groq, IBM, Intel, Micron, NVIDIA, Qualcomm Technologies, Si-Five, SiMa.ai, Synopsys, Ventana Microsystems, Tenstorrent and scores of investment clients. We have no investment positions in any of the companies mentioned in this article and do not plan to initiate any in the near future. For more information, please visit our website at https://cambrian-AI.com.

Karl Freund
