Network-on-Chip
The Next Generation of System-on-Chip Integration
Santanu Kundu
Santanu Chattopadhyay
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2015 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
Contents
Preface
Authors
1. Introduction
1.1 System-on-Chip Integration and Its Challenges
1.2 SoC to Network-on-Chip: A Paradigm Shift
1.3 Research Issues in NoC Development
1.4 Existing NoC Examples
1.5 Summary
References
3.6.2.1 VC Allocator
3.6.2.2 Switch Allocator
3.7 Adaptive Router Architecture Design
3.8 Summary
References
Santanu Kundu
LSI India Research and Development Pvt. Ltd.
(An Avago Technologies Company)
Santanu Chattopadhyay
Indian Institute of Technology, Kharagpur
Authors
1 Introduction
[Figure 1.1 plot: relative delay (logarithmic axis, 0.1 to 100) versus process technology node
(250, 180, 130, 90, 65, 45, and 32 nm), with curves for gate delay (fan-out of 4), local wires
(scaled), global wires with repeaters, and global wires without repeaters.]
FIGURE 1.1
Projected relative delay for local and global wires and for logic gates at different technologies.
(Data from ITRS, International technology roadmap for semiconductors, Technical report,
International Technology Roadmap for Semiconductors, 2001.)
[Figure 1.2 diagram: CPU, DSP, DMA, MPEG, DRAM, and Accel cores, each attached through a
network interface (NI) to one of the switches that form the NoC.]
FIGURE 1.2
The NoC paradigm. (Data from Angiolini, F., NoC Architectures, n.d.,
http://www-micrel.deis.unibo.it/MPHS/slidecorso0607/nocsynth.pdf.)
communication links between them. Routers route the packets from the
source node to the destination node depending on the underlying network
topology and routing strategy. The length of the point-to-point links should
be small to reduce wire delay.
To mitigate the ever-increasing design productivity gap and to meet time-to-market
requirements, reuse of IP cores is widely practiced in SoC development. Besides IP cores,
the bus interface protocol can also be reused to integrate the IPs. Reuse remains one of the
key challenges that IC design houses try to address, and in the NoC paradigm it extends to
IPs, NIs, and communication infrastructure such as routers, the underlying network, and
flow control protocols. Because the selection of network topology and router architecture
is application specific, however, reusing them across different applications will not give
the optimal solution. Hence, reusability is limited to a particular class of applications.
For example, the network topology and router architecture used for a mobile application
cannot be the same as those used for a video processing application. Within a class of
similar applications, reuse drastically reduces the design and verification effort.
NoC is a specific flavor of interconnection networks and involves several
abstraction layers such as physical, data link, network, and transport layers
(Jantsch and Tenhunen 2003), which are described as follows:
• The physical layer determines the number and length of wires connecting
resources and switches.
• The data link layer defines the protocol of communication between
a resource and a switch, and between two switches. Both the
physical and data link layers are dependent on the technology. Thus,
these layers must be defined anew for each new technology.
• The network layer defines how a packet is transmitted over the network
from an arbitrary sender to an arbitrary receiver, directed by the
receiver’s network address. This layer is also technology dependent.
• The transport layer is technology independent. In this layer, message
size can be variable. This layer breaks the message into network
layer packets.
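The role of the transport layer described above can be illustrated with a minimal sketch. It assumes a hypothetical packet format (destination address, sequence number, last-packet flag, payload) and a hypothetical maximum payload size; none of these names or fields come from the text, and a real NoC would define them in its own protocol.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Packet:
    """Hypothetical network-layer packet (field names are illustrative only)."""
    dest: int       # receiver's network address, used by the network layer
    seq: int        # position of this packet within the original message
    last: bool      # True for the final packet of the message
    payload: bytes  # slice of the message carried by this packet


def packetize(message: bytes, dest: int, max_payload: int = 4) -> List[Packet]:
    """Break a variable-size message into fixed-maximum-size packets."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    # An empty message still produces one (empty) packet so the receiver is notified.
    chunks = [message[i:i + max_payload]
              for i in range(0, len(message), max_payload)] or [b""]
    return [Packet(dest, seq, seq == len(chunks) - 1, chunk)
            for seq, chunk in enumerate(chunks)]


def reassemble(packets: List[Packet]) -> bytes:
    """Rebuild the message at the receiver, tolerating out-of-order arrival."""
    ordered = sorted(packets, key=lambda p: p.seq)
    return b"".join(p.payload for p in ordered)
```

For example, `packetize(b"hello world", dest=5, max_payload=4)` yields three packets whose payloads are 4, 4, and 3 bytes long, and `reassemble` restores the original message from them.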
Interconnection networks have been studied for more than two decades, and a solid
foundation of design techniques has been described in several textbooks (Duato et al. 2003;
Dally and Towles 2004). With increasing communication demand, the introduction of
interconnection networks into SoC design paved the way for NoC research almost a decade
ago. Mullins (2009) has listed more than 400 related articles addressing all these aspects.
Today, NoC is an active research topic spanning hardware communication infrastructure,
software and operating system services, CAD tools for NoC synthesis, and so on.
and routers are allowed to operate on their own clocks, giving rise to a GALS
scheme. Another important hardware aspect in designing a complete NoC
is the integration of cores with the routers. This requires the design of NI modules
between the two.
The second dimension of research deals with the communication paradigm
on a given NoC platform. Once the infrastructure has been finalized, the next
important task is to design the communication methodology between the
cores via the established network. Routing policies, switching techniques,
congestion control, power and thermal management, and fault tolerance
and reliability issues are the main focus of this dimension. First of all, the
routing strategy must be fixed. This is one of the richest areas of research in
NoC design. It has a profound effect on the performance of the NoC, as it
chiefly determines the number of hops traversed in each communication,
congestion, the distribution of traffic load across routers, and so on. The
domain is often complicated by the requirement to support quality-of-service
(QoS). Arbitration of network resources, in terms of FIFOs and channels,
between contending simultaneous communications is essential to ensure
freedom from problems such as livelock and deadlock. Like off-chip
communications, on-chip communications also suffer from capacitive crosstalk
and electromagnetic radiation, which corrupt the data being transmitted. This
makes it essential to adopt fault-tolerant schemes in the communication. As all
designs are now invariably power aware, the same is required of the NoC as
well. The voltages and frequencies at which individual cores and routers
operate must be chosen very carefully to satisfy the overall performance
requirement with a minimum power budget.
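To make the link between routing strategy and hop count concrete, the following sketch traces a packet through a 2D mesh using dimension-ordered (XY) routing. XY routing is chosen here only as a familiar deterministic example and is not named in the text above; the coordinate convention and the function names are likewise assumptions made for illustration.

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # (x, y) position of a router in the mesh (assumed convention)


def xy_route(src: Coord, dst: Coord) -> List[Coord]:
    """Return the routers visited from src to dst, moving along X first, then Y."""
    x, y = src
    path = [(x, y)]
    # Phase 1: correct the X coordinate one hop at a time.
    step = 1 if dst[0] > x else -1
    while x != dst[0]:
        x += step
        path.append((x, y))
    # Phase 2: correct the Y coordinate one hop at a time.
    step = 1 if dst[1] > y else -1
    while y != dst[1]:
        y += step
        path.append((x, y))
    return path


def hop_count(src: Coord, dst: Coord) -> int:
    """Number of links traversed, which is one less than the routers visited."""
    return len(xy_route(src, dst)) - 1
```

Under these assumptions, `xy_route((0, 0), (2, 1))` visits `(0, 0)`, `(1, 0)`, `(2, 0)`, and `(2, 1)`, that is, three hops. An adaptive strategy could pick a different path between the same pair of routers, which is one way in which the routing policy shapes congestion and load distribution.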
The third dimension of research concerns the design of an evaluation
framework for NoC by applying stochastic and application-specific traffic.
As MPSoCs contain a large number of cores connected in some topology via
routers and interconnection links, it is mandatory to have a clear idea of
their performance before any investment is made in manufacturing the
systems. Potential faults and drawbacks, if any, must be identified at the
design phase to avoid huge losses after the silicon chips come back. Though
many theoretical studies exist that can predict the behavior of such a system,
they mostly assume a congestion-free environment in which all cores are
equally active in injecting traffic load into the network. Both of these
assumptions are highly optimistic for any practical design of moderate size.
This necessitates the design of high-quality NoC simulators that produce
behavior similar to that of the actual NoC. The simulator should model the
network at the granularity of individual hardware blocks and wires in terms
of functionality, delay, power, and so on. In the absence of the actual traffic
pattern for applications, synthetic traffic is often used. This synthetic traffic
should mimic the behavior of the actual core that it corresponds to. With
confidence gained after determining the throughput, latency, and bandwidth
of the network through simulation, the designer can quickly proceed to
accurate estimation of the area and power consumption of the network, which
can be a significant portion of the overall SoC cost budget.
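As a rough illustration of synthetic traffic, the sketch below generates destinations for two commonly used patterns, uniform random and transpose, on an n × n mesh and reports the average hop count. It is only a toy estimate under stated assumptions: minimal routing (hops equal the Manhattan distance), no congestion, and hypothetical function names. It is not a substitute for the cycle-level simulator the text calls for.

```python
import random
from typing import Callable, Tuple

Coord = Tuple[int, int]
Pattern = Callable[[Coord, int, random.Random], Coord]


def uniform_random(src: Coord, n: int, rng: random.Random) -> Coord:
    """Every node other than the source is an equally likely destination."""
    while True:
        dst = (rng.randrange(n), rng.randrange(n))
        if dst != src:
            return dst


def transpose(src: Coord, n: int, rng: random.Random) -> Coord:
    """Node (x, y) always sends to node (y, x); rng is unused but kept for a uniform API."""
    return (src[1], src[0])


def average_hops(pattern: Pattern, n: int,
                 samples_per_node: int = 100, seed: int = 0) -> float:
    """Average Manhattan distance over all sources, assuming minimal routing."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    total = 0
    count = 0
    for x in range(n):
        for y in range(n):
            for _ in range(samples_per_node):
                dx, dy = pattern((x, y), n, rng)
                total += abs(dx - x) + abs(dy - y)
                count += 1
    return total / count
```

On a 4 × 4 mesh the transpose pattern gives an average of exactly 2.5 hops, while uniform random traffic averages about 8/3 hops. Different patterns therefore stress the same network differently, which is why the choice of synthetic traffic matters.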
The fourth dimension of research is related to application mapping. Mapping
of cores with regular and irregular sizes onto an underlying NoC platform
to achieve the required performance for a specific application is the major
issue of this dimension. Performance and energy-aware task scheduling for
heterogeneous NoC is another important problem of this class of research.
Figure 1.3 summarizes the major dimensions of NoC research as discussed
above.
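The mapping problem can be sketched as a small search. The example below assumes a hypothetical cost metric, the sum over all communicating core pairs of bandwidth multiplied by Manhattan hop distance, and finds the best placement by exhaustive enumeration. Both the metric and the core names are illustrative assumptions. Exhaustive search is feasible only for a handful of cores, so practical mappers rely on heuristics.

```python
from itertools import permutations
from typing import Dict, List, Tuple

Coord = Tuple[int, int]
Traffic = Dict[Tuple[str, str], float]  # (source core, destination core) -> bandwidth


def mapping_cost(traffic: Traffic, placement: Dict[str, Coord]) -> float:
    """Assumed cost: bandwidth-weighted hop distance summed over all core pairs."""
    cost = 0.0
    for (src, dst), bandwidth in traffic.items():
        (sx, sy), (dx, dy) = placement[src], placement[dst]
        cost += bandwidth * (abs(sx - dx) + abs(sy - dy))
    return cost


def best_mapping(cores: List[str], tiles: List[Coord],
                 traffic: Traffic) -> Tuple[float, Dict[str, Coord]]:
    """Try every assignment of cores to distinct tiles and keep the cheapest one."""
    best_cost = float("inf")
    best_placement: Dict[str, Coord] = {}
    for perm in permutations(tiles, len(cores)):
        placement = dict(zip(cores, perm))
        cost = mapping_cost(traffic, placement)
        if cost < best_cost:
            best_cost, best_placement = cost, placement
    return best_cost, best_placement


# Illustrative input: four cores on a 2 x 2 mesh with made-up bandwidths.
cores = ["CPU", "DSP", "DMA", "MEM"]
tiles = [(0, 0), (1, 0), (0, 1), (1, 1)]
traffic = {("CPU", "MEM"): 100, ("DSP", "MEM"): 50,
           ("CPU", "DSP"): 10, ("DMA", "MEM"): 5}
cost, placement = best_mapping(cores, tiles, traffic)
```

For this input the minimum cost is 180: the two heaviest communicators with MEM (CPU and DSP) end up adjacent to it, and the lightly communicating DMA is pushed to the diagonal tile.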
Another important aspect is NoC testing. In any system development
process, testing occupies a major part of its turnaround time. The problem
is further complicated by the fact that the test volume becomes huge for
a NoC. It is necessary to apply test patterns to all the cores and get their
responses. The test patterns are to be transported from the system inputs
to the core inputs and the responses are to be carried through the network
from the core outputs to the system outputs. This gives rise to test schedule
optimization problems. The NoC infrastructure itself needs to be tested. The
power consumption during test is also a major concern.
While attempting to realize an application, or a set of applications, in
NoC, it is imperative to use the NoC infrastructure most suitable for the
application(s). This gives rise to the issue of application-specific NoC
synthesis. Rather than adopting a general standard topology (such as mesh),
NoC synthesis approaches attempt to derive the topology, routing policy, and
so on to obtain the best possible performance of the NoC implementation.
While the architecture may be synthesized for a single application, for a set of
applications it is quite common to evolve a reconfigurable architecture.
Depending upon the communication needs of the various applications running
at different points in time, a reconfigurable architecture can adapt itself to suit
the currently running application. The reconfiguration may take the form of
link reconfiguration, router port reconfiguration, buffer reconfiguration,
and so on.
FIGURE 1.3
Major NoC research dimensions.
supporting highly local traffic inside a node, Intel has introduced the single-chip
cloud computer (SCC) having 48 cores (SCC 2010). Two cores are connected
to each router of a 6 × 4 2D mesh. The operating frequency of each core is
1 GHz, whereas the routers are targeted to operate at 2 GHz in 45-nm
technology. The routers have been implemented with eight virtual channels and
four-cycle latency. The link width is 128 bits. STMicroelectronics has
implemented STNoC (Coppola et al. 2004), a spidergon topology-based NoC
that follows a credit-based flow control. Philips has developed a
topology-independent NoC, Æthereal (Rijpkema et al. 2003), for supporting
guaranteed throughput (GT) and best effort (BE) services. The router has been
implemented with an input-buffering scheme using first-in first-out (FIFO)
buffers 8 words deep and 32 bits wide. It uses a standard credit-based
end-to-end flow control. Both the routers and the NI operate at 500 MHz in
130-nm technology at the layout level. Arteris is another custom NoC that
operates at 750 MHz in 90-nm technology (Arteris 2005). It has a set of
configuration and modeling tools (NoC compiler, NoC verifier, and NoC
explorer) for obtaining optimized performance and power results for any
application.
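The SCC figures quoted above can be cross-checked with simple arithmetic. The sketch below recomputes the core count from the mesh dimensions and derives a raw per-link bandwidth. The bandwidth figure rests on an assumption not stated in the text, namely that a link transfers one full 128-bit word per router clock cycle, so it should be read as an upper-bound estimate rather than a reported specification.

```python
def mesh_core_count(cols: int, rows: int, cores_per_router: int) -> int:
    """Total cores when every router of a cols x rows mesh serves the same number of cores."""
    return cols * rows * cores_per_router


def peak_link_bandwidth_bits_per_s(link_width_bits: int, clock_hz: int) -> int:
    """Raw link bandwidth, ASSUMING one link-width transfer per clock cycle."""
    return link_width_bits * clock_hz


# Figures taken from the SCC description in the text.
scc_cores = mesh_core_count(cols=6, rows=4, cores_per_router=2)
scc_link_bw = peak_link_bandwidth_bits_per_s(link_width_bits=128,
                                             clock_hz=2_000_000_000)
```

Here `scc_cores` evaluates to 48, matching the stated core count, and `scc_link_bw` evaluates to 256 × 10^9 bits per second (32 GB/s) per link under the one-transfer-per-cycle assumption.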
Kumar et al. (2007) implemented a 36-core shared memory chip multiprocessing
(CMP) system in 65-nm technology, targeting a 3.6 GHz router with
single-cycle latency. The cores are connected in a 6 × 6 2D mesh with a flit size
of 128 bits. The router has 12 unreserved virtual channels and 1 reserved virtual
channel for each of three message classes. It has been implemented with
single-stage pipelining. Lee et al. (2004) implemented a hierarchical
star-connected on-chip network using a 16:1 serialized link. The routers and
cores operate at 1.6 GHz and 100 MHz, respectively, in 180-nm technology. The
authors have also implemented a custom NoC, Slim-spider (Lee et al. 2006),
ensuring low power consumption, in which each router operates at 1.6 GHz in
180-nm technology with a flit size of 8 bits. Andriahantenaina et al. (2003)
implemented a fat tree-based NoC, scalable, programmable, integrated network
(SPIN), in 130-nm technology with a flit size of 32 bits. The operating frequency
of the routers is found to be 200 MHz at the layout level. Another fat tree-based
NoC, extended generalized fat-tree (XGFT) (Kariniemi et al. 2006), uses a flit
size of 32 bits and operates at 400 MHz. Xpipes (Bertozzi et al. 2005), a custom
NoC, consists of soft macros of switches, NIs, and links. It takes a flit width of
32 bits and supports error detection and retransmission. Kavaldjiev et al. (2006)
modified the traditional virtual channel router; the new router operates at
500 MHz in 180-nm technology and supports the 2D mesh topology with a
16-bit flit size. Pande et al. (2005) reported that the area overhead of the routers
is reasonably low compared to that of the full SoC. Feero and Pande (2009)
designed a 3D NoC architecture based on 3D mesh, 3D butterfly fat-tree (BFT),
and 3D fat tree topologies having 64 IP cores of size 2.5 mm × 2.5 mm each.
They used a flit size of 32 bits and four virtual channels, each two flits deep. The
frequency of each router is found to be 1.66 GHz in 90-nm technology after
synthesis.
Some asynchronous NoCs have also been reported in the literature.
MANGO (Bjerregaard and Sparsoe 2005), a clock-less NoC, uses the 2D mesh
topology with a flit size of 32 bits. The NIs synchronize the clocked open core
protocol (OCP) interfaces to the clock-less network in a GALS fashion, and the
overall network runs at 795 MHz in 130-nm technology at the register
transfer level (RTL). Silistix Inc. has introduced its industry-leading
asynchronous NoC, CHAINworks (Rostislav et al. 2005), for the design and
synthesis of complex devices. FAUST (Lattard et al. 2008), another asynchronous
NoC implemented in 130-nm technology for telecom requirements, uses the
2D mesh topology with a flit size of 32 bits. Salminen et al. (2008) present a
tabulated list of NoC proposals that effectively characterizes many of the NoCs
not covered here.
1.5 Summary
NoC is a very active research field with many practical applications in
industry, as it is expected to be an efficient communication backbone of
next-generation many-core-based SoCs. This chapter has focused on the
upcoming technology trends and the need for NoC in designing
many-core-based SoCs. It has also briefly covered the different dimensions of
research in the field of NoC design. Finally, it has surveyed a set of NoCs
designed to date in industry and academia.
The research dimensions of NoC noted in this chapter are taken up
in subsequent chapters and discussed in detail.
References
Andriahantenaina, A., Charlery, H., Greiner, A., Mortiez, L., and Zeferino, C. A. 2003.
SPIN: A scalable, packet switched, on-chip micro-network. Proceedings of the IEEE
Conference on Design, Automation and Test in Europe, pp. 70–73, Munich, Germany.
Angiolini, F. n.d. NoC architectures.
http://www-micrel.deis.unibo.it/MPHS/slidecorso0607/nocsynth.pdf.
Arteris, 2005. A comparison of network-on-chip and buses. White Paper.
http://www.arteris.com/noc-whitepaper.pdf.
Benini, L. and Micheli, G. D. 2002. Network on chips: A new SoC paradigm. IEEE
Computer, vol. 35, no. 1, pp. 70–78.
Bertozzi, D., Jalabert, A., Murali, S., Tamhankar, R., Stergiou, S., Benini, L., and
Micheli, G. D. 2005. NoC synthesis flow for customized domain specific multi-
processor systems-on-chip. IEEE Transactions on Parallel and Distributed Systems,
vol. 16, no. 2, pp. 113–129.
Bjerregaard, T. and Mahadevan, S. 2006. A survey of research and practices of
network-on-chip. ACM Computing Surveys, vol. 38, no. 1, pp. 1–51.