Interview Habana Labs Targets AI Processors
https://www.convergedigest.com/2018/09/interview-habana-labs-targets-ai.html
Habana Labs, a start-up based in Israel with offices in Silicon Valley, emerged from
stealth to unveil its first AI processor.
Habana's deep learning inference processor, named Goya, is more than two orders of
magnitude better in both throughput and power than commonly deployed CPUs,
according to the company. Habana will offer a PCIe 4.0 card that incorporates a single
Goya HL-1000 processor and is designed to accelerate a range of AI inference
workloads, such as image recognition, neural machine translation, sentiment analysis,
and recommender systems. A PCIe card based on the Goya HL-1000 processor
delivers 15,000 images/second of throughput on the ResNet-50 inference benchmark,
with 1.3 milliseconds of latency, while consuming only 100 watts of power.
I recently sat down with Eitan Medina, Habana Labs' Chief Business Officer, to
discuss the development of this new class of AI processors and what it means for the
cloud business.
Jim Carroll: Who is Habana Labs and how did you guys get started?
Eitan Medina: Habana was founded in 2016 with the goal of building AI processors
for inference and training. Currently, we have about 120 people on board, mostly in
R&D and based in Israel. We have a business headquarters here in Silicon Valley. In
terms of the background of the management team, most of us have deep expertise in
processors, DSPs, and communications semiconductors. I previously was the CTO
for Galileo Technology (acquired by Marvell), and now I am on the business side. I
would say we have a very strong and multidisciplinary team for machine learning. We
certainly have the expertise in the processing, software and networking to architect a
complete hardware and software solution for deep learning.
In building this company, we identified the AI space as one that deserves its own
class of processors. We believe that the existing CPUs and GPUs are not good
enough.
The first wave of these AI processors is arriving now or being announced now.
Unlike other semiconductor companies, Habana decided to emerge from stealth only
once we had an actual product. We have production samples now, and that is why we
are officially launching the company.
Jim Carroll: Who are the founders and what motivated them to enter this market
segment?
Eitan Medina: The two co-founders are David Dahan (CEO) and Ran Halutz (VP
R&D), who worked together at Prime Sense, a company that was acquired by Apple.
We also have onboard Shlomo Raikin (CTO), who was the Chief SoC Architect at
Mellanox and who has 45 patents. We've also been able to recruit top talent from
across the R&D ecosystem in Israel. The lead investors are Avigdor Willenz
(Chairman), Bessemer, and WALDEN (Lip-Bu Tan).
Jim Carroll: The market for AI processors, obviously, is in its infancy. How do
you see it developing?
Eitan Medina: Well, some analysts are already projecting a market for a new class of
chipsets for deep learning. Tractica, for instance, divides the emerging market into
CPUs, GPUs, FPGAs, ASICs, SoC accelerators, and other devices. We see the need
for a different type of processor because of the huge gap between the computational
requirements of AI and the incremental improvements that vendors have delivered
over the past few years, which so far amount to just small refinements of CPUs and GPUs.
Look at the best-in-class deep learning models and calculate how much computing
power is needed to train them, then look at how those requirements have grown over
the past few years. Try graphing this progression and you will see a log-scale curve
with a doubling time of three and a half months. That's roughly 10x every year.
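As a sanity check on that claim, the annual growth implied by a 3.5-month doubling time can be computed directly. This is just arithmetic on the figure quoted in the interview, not additional data from Habana:

```python
# Compound growth implied by a fixed doubling time.
doubling_months = 3.5             # doubling time cited in the interview
months_per_year = 12

growth_per_year = 2 ** (months_per_year / doubling_months)
print(round(growth_per_year, 1))  # ~10.8x per year, i.e. "roughly 10x every year"
```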
Initially, people were running machine learning on CPUs, and then they adopted
Nvidia's GPUs. What we see in the market today is that training is dominated by
GPUs, while inference is dominated by CPUs.
Eitan Medina: When we looked at the overall deep learning space, we began with the
workflows. It is important to understand that there's a training workflow, and there's an
inference workflow. What we are introducing today is our "Goya" inference processor.
Our "Gaudi" training processor will be introduced in the second quarter of 2019. It will
feature a 2Tbps interface per device and its training performance scales linearly to
thousands of processors. We intend to sell line cards equipped with these processors,
which you can then plug into your existing servers.
The inference processor offloads this workload completely from the CPU. Therefore,
you will not need to replace your existing servers with more advanced CPUs. What
can this do for you? This is where our story gets really interesting. We're delivering
more than an order of magnitude of improvement.
Look at this graph showing our ResNet-50 inference throughput and latency
performance. On the left side is the best performance Intel has shown to date on a
dual socket Xeon Platinum. Latency is not reported, which could be a critical issue.
In the middle is Nvidia's V100 Tensor GPU, which shows 6ms of latency -- not bad, but
we can do better. Our performance, shown on the right, exceeds 15,000 images per
second with just 1.3ms of latency. Our card is just 100 watts, whereas we estimate at
least 400 watts for the other guys.
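Taking the quoted Goya numbers at face value, a couple of derived figures follow directly: power efficiency, and (via Little's law) the average number of images in flight on the card. This is a back-of-the-envelope sketch using only the figures in this article:

```python
# Derived figures from the quoted Goya ResNet-50 benchmark numbers.
throughput_ips = 15_000   # images per second (from the article)
latency_s = 1.3e-3        # reported latency, 1.3 ms
power_w = 100             # reported card power

efficiency = throughput_ips / power_w    # images per second per watt
in_flight = throughput_ips * latency_s   # Little's law: avg images in the pipeline

print(efficiency)            # 150.0 images/s per watt
print(round(in_flight, 1))   # 19.5 images in flight on average
```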
Jim Carroll: Where are you getting these gains? Are you processing the images
in a different way?
Eitan Medina: Well, I can say that we are not changing the topology. If you are an AI
researcher with a ResNet-50 topology, we will take your topology and ingest it into our
compiler. We're not forcing you to change anything in your model.
Jim Carroll: So, if we try to understand the magic inside a GPU, Nvidia will talk
about their ability to process polygons in parallel with large numbers of cores.
Where is the magic for Habana?
Eitan Medina: Yeah, Nvidia will say they are really good at figuring out polygons, and
may tell you about the massive memory bandwidth they can provide to the many
cores. But, at the end of the day, if you are interested in doing image recognition, you
only really care about application performance, not the stories of how wonderful the
technology is.
Let's assume for a second that there's a guy with a very inefficient image processing
architecture, ok? What would this guy do to give you better performance from
generation to generation? He would just pack more of the same stuff each time --
more memory, more bandwidth, and more power. And then he would tell you to
"buy more to save more". Sound familiar? This guy can show you improvements, but
if he's carrying that inefficiency throughout the stack, it is just going to be more of the
same. If a new guy comes to market, what you want to see is application performance.
What's your latency? What's your throughput? What's your accuracy? What's your
power? What's your cost? If we can show all of that, then we don't have to have a
debate about architecture.
Jim Carroll: So, are you guys using the same "magic" to deliver inference
performance?
Eitan Medina: No, but for now, I want to show you what we can do. The lion's share of
inference processors used by cloud operators today are CPUs -- an estimated 91% of
these workloads are running on CPUs. Nvidia so far has not come up with a solution
to move this market to GPUs. The market is using their GPUs mainly for training.
Our line card, installed in this server, can ingest and process 15,000 frames per
second through the PCIe bus. Because our chip is so efficient, we don't need crazy
memory technologies or specialized manufacturing techniques. In fact, this chip is
built with 16 nanometer technology, which is quite mature and well-understood. As
soon as we got the first device back from TSMC, we had ResNet up and running
immediately.
In a cloud data center, three of our line cards could deliver the inference processing
equivalent of 169 Intel-powered servers or eight of Nvidia's latest Tesla V100 GPUs.
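If those equivalence claims hold, they imply rough per-device throughputs for the alternatives. This is an estimate derived only from the numbers in this article, not from published Intel or Nvidia benchmarks:

```python
# Implied throughput of the alternatives, assuming the claimed equivalences.
goya_cards = 3
goya_ips = 15_000                        # per-card throughput from the article
total_ips = goya_cards * goya_ips        # 45,000 images/s across three cards

intel_servers = 169
v100_gpus = 8

per_server = total_ips / intel_servers   # implied images/s per Intel server
per_v100 = total_ips / v100_gpus         # implied images/s per V100

print(round(per_server))  # ~266 images/s per Intel server
print(per_v100)           # 5625.0 images/s per V100
```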
Habana Labs is showcasing a Goya inference processor card in a live server, running
multiple neural-network topologies, at the AI Hardware Summit on September 18 – 19,
2018, in Mountain View, CA.