In This Issue: September 1998 Volume 1, Number 2
From the Editor
What Is a VPN?–Part II
Reliable Multicast Protocols and Applications
Layer 2 and Layer 3 Switch Evolution
Book Review
Fragments

In This Issue

We begin this issue with Part II of “What Is a VPN?” by Paul Ferguson and Geoff Huston. In Part I they introduced a definition of the term “Virtual Private Network” (VPN) and discussed the motivations behind the adoption of such networks. They outlined a framework for describing the various forms of VPNs, and examined numerous network-layer VPN structures, in particular controlled route leakage and tunneling. In Part II the authors conclude their examination of VPNs by describing virtual private dial networks and network-layer encryption. They also examine link-layer VPNs, switching and encryption techniques, and issues concerning Quality of Service and non-IP VPNs.

IP Multicast is an emerging set of technologies and standards that allow many-to-many transmissions such as conferencing, or one-to-many transmissions such as live broadcasts of audio and video over the Internet. Kenneth Miller describes multicast in general, and reliable multicast protocols and applications in particular. Although multicast applications are primarily used in the research community today, this situation is likely to change as the demand for Internet multimedia applications increases and multicast technologies improve.

Interest in the first issue of IPJ has exceeded our expectations, and hard copies are almost gone. However, you can still view and print the issue in PDF format on our Web site at www.cisco.com/ipj. The current edition is also available on the Web. If you want to receive our next issue, please complete and return the enclosed card.
In Part I we introduced a working definition of the term “Virtual
Private Network” (VPN), and discussed the motivations behind the
adoption of such networks. We outlined a framework for describ-
ing the various forms of VPNs, and then examined numerous network-
layer VPN structures, in particular, that of controlled route leakage and
tunneling techniques. We begin Part II with examining other network-
layer VPN techniques, and then look at issues that are concerned with
non-IP VPNs and Quality-of-Service (QoS) considerations.
Types of VPNs
This section continues from Part I to look at the various types of VPNs
using a taxonomy derived from the layered network architecture
model. These types of VPNs segregate the VPN network at the net-
work layer.
Network-Layer VPNs
A network can be segmented at the network layer to create an end-to-
end VPN in numerous ways. In Part I we described a controlled route
leakage approach that attempts to perform the segregation only at the
edge of the network, using route advertisement control to ensure that
each connected network received a view of only its peer networks. We pick up the description at this point in this second part of
the article.
Tunneling
As outlined in Part I, the alternative to a model of segregation at the
edge is to attempt segregation throughout the network, maintaining the
integrity of the partitioning of the substrate network into VPN compo-
nents through the network on a hop-by-hop basis. Part I examined
numerous tunneling technologies that can achieve this functionality.
Tunneling is also useful in servicing VPN requirements for dial access,
and we will resume the description of tunnel-based VPNs at this point.
[Figure 2: Protocol Termination Model of PPTP]
Although L2TP and PPTP may sound extraordinarily similar, there are
subtle differences that deserve further examination. The applicability of
both protocols is very much dependent on what problem is being
addressed. It is also about control—who has it, and why it is needed. It
also depends heavily on how each protocol implementation is
deployed—in either the voluntary or the compulsory tunneling models.
Regrettably, the L2TP draft does not detail all possible implementa-
tions or deployment scenarios for the protocol. The basic deployment
scenario is quite brief when compared to the rest of the document, and
is arguably biased toward the compulsory tunneling model. Nonethe-
less, there are implementations of L2TP that follow the voluntary
tunneling model. To the best of our knowledge, there has never been
any intent to exclude this model of operation. In addition, at various
recent interoperability workshops, several different implementations of
a voluntary L2TP client have been modeled. Nothing in the L2F proto-
col would prohibit deploying it in a voluntary tunneling manner, but
to date it has not been widely implemented. Further, PPTP has also
been deployed using the compulsory model in a couple of specific ven-
dor implementations.
Network-Layer Encryption
Encryption technologies are extremely effective in providing the seg-
mentation and virtualization required for VPN connectivity, and they
can be deployed at almost any layer of the protocol stack. The evolv-
ing standard for network-layer encryption in the Internet is IP Security
(IPSec)[3, 4]. (IPSec is actually an architecture—a collection of proto-
cols, authentication, and encryption mechanisms. The IPSec security
architecture is described in detail in [3].)
Link-Layer VPNs
One of the most straightforward methods of constructing VPNs is to
use the transmission systems and networking platforms for the physi-
cal and link-layer connectivity, yet still be able to build discrete
networks at the network layer. A link-layer VPN is intended to be a
close (or preferably exact) functional analogy to a conventional pri-
vate data network.
[Figure 3: Conceptualization of Discrete Layer 3 Networks on a Common Layer 2 Infrastructure]

One of the nice things about a public switched wide-area network that provides virtual circuits is that it can be extraordinarily flexible. Most subscribers to Frame Relay services, for example, have subscribed to the service for economic reasons: it is cheap, and the service provider usually adds a Service-Level Agreement (SLA) that “guarantees” some percentage of frame delivery in the Frame Relay network itself.
The remarkable thing about this service offering is that the customer is
generally completely unaware of whether the service provider can actu-
ally deliver the contracted service at all times and under all possible
conditions. The Layer 2 technology is not a synchronized clock block-
ing technology in which each new service flow is accepted or denied
based on the absolute ability to meet the associated resource demands.
Each additional service flow is accepted into the network and carried
on a best-effort basis. Admission functions provide the network with a
simple two-level discard mechanism that allows a graduated response
to instances of overload; however, when the point of saturated over-
load is reached within the network, all services will be affected.
This situation brings up several other important issues: The first con-
cerns the engineering practices of the Frame Relay service provider. If
the Frame Relay network is poorly engineered and is constantly con-
gested, then obviously the service quality delivered to the subscribers
will be affected. Frame Relay uses a notion of a per-virtual circuit Com-
mitted Information Rate (CIR), which is an ingress function associated
with Frame Relay that checks the ingress traffic rate against the CIR.
Secondly, there are serious scaling concerns regarding full mesh mod-
els of connectivity, where suboptimal network-layer routing may result
because of cut-through. And the reliance on address resolution servers
to support the ARP function within the dynamic circuit framework
brings this model to the point of excessive complexity.
“Peer” VPN models that allow the egress nodes to maintain separate
routing tables have also been introduced—one for each VPN—effec-
tively allowing separate forwarding decisions to be made within each
node for each distinctive VPN. Although this is an interesting model, it
introduces concerns about approaches in which each edge device runs a
separate routing process and maintains a separate Routing Information
Base (RIB, or routing table) process for each VPN community of inter-
est. It also should be noted that the “virtual router” concept requires
some form of packet labeling, either within the header or via some light-
weight encapsulation mechanism, in order for the switch to be able to
match the packet against the correct VPN routing table. If the label is
global, the issue of operational integrity is a relevant concern, whereas if
the label is local, the concept of label switching and maintenance of
edge-to-edge label switching contexts is also a requirement.
Among the scaling concerns are issues regarding the number of sup-
ported VPNs in relation to the computational requirements, and stability
of the routing system within each VPN (that is, instability in one VPN
affecting the performance of other VPNs served by the same device). The
aggregate scaling demands of this model are also significant. Given a
change in the underlying physical or link-layer topology, the consequent
[Figure: Level 1 MPLS paths for VPN A across the service provider MPLS network]
Link-Layer Encryption
As mentioned previously, encryption technologies are extremely effec-
tive in providing the segmentation and virtualization required for VPN
connectivity, and can be deployed at almost any layer of the protocol
stack. Because there are no intrinsically accepted industry standards for
link-layer encryption, all link-layer encryption solutions are generally
vendor specific and require special encryption hardware.
Quality-of-Service Considerations
In addition to creating a segregated address environment to allow private communications, the VPN environment is also expected to support a set of service levels. Such per-VPN service levels may be specified either as a defined service level that the VPN can rely upon at all times, or as a level of differentiation that allows the VPN to draw upon the common platform resource with some priority of resource allocation.
This area is evolving rapidly, and much of it remains within the realm
of speculation rather than a more concrete discussion about the rela-
tive merits of various Internet QoS mechanisms. Efforts within the
Integrated Services Working Group of the IETF have resulted in a set
of specifications for the support of guaranteed and controlled load end-
to-end traffic profiles using a mechanism that loads per-flow state into
the switching elements of the network[12, 13]. There are numerous cave-
ats regarding the use of these mechanisms, in particular relating to the
ability to support the number of flows that will be encountered on the
public Internet[14]. Such caveats tend to suggest that these mechanisms
will not be the ones that are ultimately adopted to support service lev-
els for VPNs in very large networking environments.
If the scale of the public Internet environment does not readily support
the imposition of per-flow state to support guarantees of service levels
for VPN traffic flows, the alternative question is whether this environment could support a more relaxed specification of a differentiated
service level for overlay VPN traffic. Here, the story appears to offer
more potential, given that differentiated service support does not neces-
sarily imply the requirement for per-flow state, so stateless service
differentiation mechanisms can be deployed that offer greater levels of
support for scaling the differentiated service[15]. However, the precise
nature of these differentiated service mechanisms, and their capability
to be translated to specific service levels to support overlay VPN traffic
flows, still remain in the area of future activity and research.
Conclusions
So what is a virtual private network? As we have discussed, a VPN can
take several forms. A VPN can be between two end systems, or it can
be between two or more networks. A VPN can be built using tunnels
or encryption (at essentially any layer of the protocol stack), or both,
or alternatively constructed using MPLS or one of the “virtual router”
methods. A VPN can consist of networks connected to a service pro-
vider’s network by leased lines, Frame Relay, or ATM, or a VPN can
consist of dialup subscribers connecting to centralized services or other
dialup subscribers.
The pertinent conclusion here is that although a VPN can take many forms, a VPN is built to solve some basic common problems: virtualization of services and segregation of communications to a closed community of interest, while simultaneously exploiting the financial opportunity of economies of scale of the underlying common host communications system.
Acknowledgments
Thanks to Yakov Rekhter, Eric Rosen, and W. Mark Townsley, all of
Cisco Systems, for their input and constructive criticism.
References
[1] Valencia, A., M. Littlewood, and T. Kolar. “Layer Two Forwarding
(Protocol) ‘L2F’.” draft-valencia-l2f-00.txt, work in progress,
October 1997.
[3] Kent, S., and R. Atkinson. “Security Architecture for the Internet
Protocol.” draft-ietf-ipsec-arch-sec-04.txt, work in
progress, March 1998.
[4] Additional information on IPSec can be found on the IETF IPSec home
page, located at http://www.ietf.org/html.charters/ipsec-
charter.html
[6] The ATM Forum. “Multi-Protocol Over ATM Specification v1.0.” af-
mpoa-0087.000, July 1997.
GEOFF HUSTON holds a B.Sc. and an M.Sc. from the Australian National University. He has been closely involved with the development of the Internet for the past decade, particularly within Australia, where he was responsible for the initial build of the Internet within the Australian academic and research sector. Huston is
currently the Chief Technologist in the Internet area for Telstra. He is also an active
member of the IETF, and was an inaugural member of the Internet Society Board of
Trustees. He is coauthor of Quality of Service: Delivering QoS on the Internet and in
Corporate Networks, published by John Wiley & Sons, ISBN 0-471-24358-2, a
collaboration with Paul Ferguson. E-mail: [email protected]
Multicast IP network services offer new opportunities to
provide value-added applications that involve many-to-
many transmission such as conferencing or network
gaming, or one-to-many transmission such as multimedia events,
tickertape feeds, and file transfer, where the many could be thousands
or even conceivably millions. Multicast IP services use a different kind
of IP address, called Class D. In contrast to individual host addresses
(Classes A–C), which include a host and a network component and
usually are semipermanent, Class D multicast addresses may by design
be used only for a particular session, or can be semipermanent, as
multicast groups may be set up and torn down relatively quickly, on
the order of seconds. The IP address structure is shown in Figure 1.
[Figure 1: IP Address Types]
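The Class D range can be tested programmatically. As a rough sketch using Python's standard ipaddress module, an address is multicast when its four high-order bits are 1110, which corresponds to 224.0.0.0 through 239.255.255.255:

```python
import ipaddress

def is_class_d(addr: str) -> bool:
    """True if addr falls in the Class D (multicast) range
    224.0.0.0 through 239.255.255.255, i.e. high bits 1110."""
    return ipaddress.IPv4Address(addr).is_multicast

print(is_class_d("224.0.0.1"))    # True: the reserved all-hosts group
print(is_class_d("192.0.2.1"))    # False: ordinary unicast (Class C range)
```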
Hosts join groups at the receiver’s initiation using the Internet Group
Management Protocol (IGMP). When a host joins a group, it notifies
the nearest multicast subnet router of its presence in the group, as
shown in Figure 2. First defined in RFC 1112[1], IGMPv1 is still the
version of IGMP most widely supported. IGMPv2 has recently been
documented as an official RFC (RFC 2236[2]). The main feature that
IGMPv2 brings is reduced latency for leaving groups. In IGMPv1, the
designated multicast router for the subnet polls for multicast group
members; no response between polls indicates that all hosts in a
particular multicast group have left the group, and that the routers
can prune back the multicast routing tree.
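The receiver-initiated join can be illustrated with the standard BSD socket API; the group address 239.1.1.1 and port 5000 below are arbitrary examples. Setting IP_ADD_MEMBERSHIP is what causes the host's IP stack to send the IGMP Membership Report described above; a hedged sketch:

```python
import socket
import struct

def join_group(sock, group: str, ifaddr: str = "0.0.0.0") -> bytes:
    """Join an IPv4 multicast group; the kernel emits the IGMP report."""
    # ip_mreq: 4-byte group address followed by 4-byte interface address
    # (0.0.0.0 lets the kernel choose the interface).
    mreq = struct.pack("4s4s", socket.inet_aton(group),
                       socket.inet_aton(ifaddr))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return mreq

def leave_group(sock, mreq: bytes) -> None:
    """Leave explicitly; an IGMPv2 host can then send a Leave Group
    message instead of waiting for the next query to time out."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_DROP_MEMBERSHIP, mreq)

# Typical use (requires a multicast-capable interface):
#   s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
#   s.bind(("", 5000))
#   mreq = join_group(s, "239.1.1.1")   # kernel sends the IGMP report
#   data, src = s.recvfrom(1500)        # datagrams sent to the group
#   leave_group(s, mreq)
```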
[Figure 2: IGMPv1 Dialog. The router sends an IGMP Query to 224.0.0.1, and a host answers with a Host Membership Report sent to the group address.]
[Figure 3: Specialized Multicast Transport Protocols Operate over UDP or IP]
[Figure 4: Reliable Multicast Application Categories]

Application Type    Latency Req.    Reliability    Scalability
Collaborative       Low             Semi/Strict    <100
Standardization Effort
The standardization effort began in an IRTF research group formed so that Internet researchers could study the problems and possible solutions. The effort was first placed in the hands of researchers because the problems were considered very difficult to solve in the global Internet.
Some of the concerns about reliable multicast were discussed in an
expired Internet Draft published in November 1996 by the Transport
Area Directors of IETF.
These concerns formed the basis for the work of the RMRG, which
was formed in early 1997. The concerns from that document follow:
“A particular concern for the IETF (and a dominant concern for the Transport Services Area) is the impact of reliable multicast traffic on other traffic in the Internet in times of congestion (more specifically, the effect of reliable multicast traffic on competing TCP traffic). The success of the Internet relies on the fact that best-effort traffic responds to congestion on a link (as currently indicated by packet drops) by reducing the load presented on that link. Congestion collapse in today’s Internet is prevented only by the congestion control mechanism in TCP.”
Scaling Issues and How Current Reliable Multicast Protocols Solve Them
Two primary issues are related to scaling, that is, the ability to handle
large groups. The first and most significant is widely known as
acknowledgment/negative acknowledgment (ACK/NAK) implosion.
As the number of receivers grows, the amount of back traffic to the sender eventually overwhelms its capacity to handle it.
Additionally, the network at the sender site becomes congested from
the cumulative back traffic from the receivers.
Receivers with missing data wait a random time period before issuing
repair requests, allowing suppression of duplicate requests similar to
the mechanism that IGMP uses on its subnet. A similar process occurs
for making the actual repairs. The random backoff time for both
repair requests made by receivers and repairs made by senders is a
Any receiver may satisfy the repair request, because all receivers are
required to cache previously sent data. Any receiver that can satisfy
the request is prepared to do so; a random backoff timer is used
before a repair is sent, and if it sees the repair being sent by another
group member, it stays silent to reduce the probability of sending
duplicate repairs.
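The suppression idea can be illustrated with a small simulation (a sketch, not any particular protocol's timer rules): each receiver draws a uniform random backoff, and any request that fires within one propagation delay of the earliest request cannot be suppressed, so widening the backoff window relative to the propagation delay cuts the duplicates.

```python
import random

def simulate_suppression(n_receivers: int, max_backoff: float,
                         prop_delay: float, seed: int = 1) -> int:
    """Count duplicate repair requests under random-backoff suppression.

    Each receiver that missed a packet waits a uniform random time before
    multicasting a request; hearing another receiver's request first
    suppresses its own.  Requests already 'in flight' within prop_delay
    of the earliest one cannot be suppressed.
    """
    rng = random.Random(seed)
    timers = sorted(rng.uniform(0, max_backoff) for _ in range(n_receivers))
    earliest = timers[0]
    # Every receiver whose timer fires before the first request reaches
    # it still transmits its own request.
    return sum(1 for t in timers if t < earliest + prop_delay)

# Backoff window much smaller than the propagation delay: many duplicates.
print(simulate_suppression(1000, max_backoff=0.01, prop_delay=0.05))
# Backoff window much larger than the propagation delay: few duplicates.
print(simulate_suppression(1000, max_backoff=10.0, prop_delay=0.05))
```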
One design goal of the creators of PGM was simplicity and the ability
to optimally leverage routers in the network to provide scalability.
PGM is an example of a protocol that bypasses UDP and interfaces
directly to IP via “raw” sockets, as shown in Figure 6.
[Figure 6: PGM Interfaces Directly to IP]
[Figure 7: PGM NAK/NCF Dialog. Receivers unicast NAKs hop by hop toward the PGM sender; each PGM element confirms a NAK with an NCF, and the sender answers with ODATA and RDATA.]
The unicast path back to the source must be the same path as the
downstream multicast tree. SPMs are sent downstream interleaved
with ODATA packets to establish a source path state for a given
source and session. PGM-aware routers use this information to
determine the unicast path back to the source for forwarding NAKs.
SPMs also alert receivers that the oldest data in the transmit window
is about to be retired from the window and will thus no longer be
available for repairs from the source. SPMs are sent by a source at a
rate that is at least the rate at which the transmit window is advanced.
This rate provokes “last call” NAKs from receivers and updates the
receive window state at receivers.
PGM-aware routers also keep state on where the NAKs come from
in the distribution tree so that they may constrain the forwarding of
RDATA repairs to only those ports from which NAKs requesting
that repair were received. This scenario eliminates the transmission
of repair data to parts of the distribution tree where the repair is
not needed.
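The port-constrained forwarding of repairs might be modeled as follows. This is a hypothetical sketch of the state a PGM-aware router keeps per (source, sequence number); real PGM network elements keep considerably more:

```python
class PgmRouterState:
    """Toy model of PGM repair state in one router.

    For each (source, sequence number) the router remembers which
    downstream ports sent NAKs, then forwards the RDATA repair only to
    those ports rather than to the whole distribution tree.
    """
    def __init__(self, downstream_ports):
        self.downstream_ports = set(downstream_ports)
        self.nak_state = {}                      # (source, sqn) -> set of ports

    def on_nak(self, source, sqn, port):
        self.nak_state.setdefault((source, sqn), set()).add(port)

    def forward_rdata(self, source, sqn):
        # Constrain the repair to the ports that asked, then drop the state.
        return self.nak_state.pop((source, sqn), set())

r = PgmRouterState(downstream_ports=[1, 2, 3, 4])
r.on_nak("S", 42, port=2)
r.on_nak("S", 42, port=4)
print(sorted(r.forward_rdata("S", 42)))   # [2, 4]: only the ports that asked
```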
The basic MFTP protocol breaks the data entity to be sent into
maximum size “blocks,” where a block by default consists of
thousands or tens of thousands of packets, depending on packet size
used. This setup is shown in Figure 8.
[Figure 8: MFTP Blocks]
NAKs are normally sent unicast back to the source, unless aggregation by enabled network routers is used to improve scaling. In this case, the NAKs are sent multicast to a special administrative traffic group address.
MFTP does not repair after each block, however; it takes advantage of
the non-real time nature of the application for benefit. The data entity,
such as a file, is sent initially in its entirety in a first pass. The sender
collects the NAK packets for a block from all the receivers. One NAK
packet from a receiver can represent thousands or even tens of
thousands of bad packets, reducing NAK implosion by orders of magnitude. The collection of NAKs received by the sender from all
the receivers is logically OR-ed together to represent the collective
need for repairs for the receiving group. These repairs are sent by the
sender in a second pass to the group. If certain receivers already have
the repair, it is simply ignored. This scenario is repeated, if necessary,
until all repairs are received by all receivers or until a configurable
timeout occurs.
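The OR-ing of NAKs can be sketched with bitmaps, one bit per packet of a block (the packet counts here are tiny for illustration; a real MFTP block runs to thousands of packets):

```python
def aggregate_naks(nak_bitmaps):
    """OR together per-receiver NAK bitmaps for one MFTP block.

    Bit i set means packet i of the block was missed by that receiver;
    the OR is the collective repair set the sender retransmits on the
    next pass.
    """
    combined = 0
    for bm in nak_bitmaps:
        combined |= bm
    return combined

# Three receivers, each missing different packets of an 8-packet block:
r1 = 0b00000101        # missed packets 0 and 2
r2 = 0b00000100        # missed packet 2
r3 = 0b00100000        # missed packet 5
need = aggregate_naks([r1, r2, r3])
print([i for i in range(8) if need >> i & 1])   # packets to resend: [0, 2, 5]
```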
Thus, packet ordering services are not provided, and holes in the
data caused by dropped packets or packets in error are filled in as
they are received.
The sender is rate based; in other words, it transmits at a data rate set
by the operator to be less than or equal to what the network can
handle. The protocol is thus very efficient with high-latency networks
such as satellites, and it is impervious to network asymmetry. It also
attempts to be as scalable as possible on one-hop networks such as
satellite networks, and it provides for extensions so that network
elements may aggregate downstream responses to increase scalability
further, depending on the network configuration.
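A rate-based sender can be sketched as a simple paced transmit loop; the rate and the send callback below are placeholders. Unlike TCP's window clocking, nothing here waits on feedback from receivers, which is why path latency and asymmetric return channels do not affect throughput:

```python
import time

def paced_send(packets, rate_pps: float, send=lambda p: None):
    """Transmit packets at an operator-configured rate (packets/second)."""
    interval = 1.0 / rate_pps
    next_time = time.monotonic()
    for p in packets:
        now = time.monotonic()
        if now < next_time:
            time.sleep(next_time - now)   # pace to the configured rate
        send(p)
        next_time += interval             # fixed schedule, no ACK clocking

sent = []
paced_send(range(20), rate_pps=1000, send=sent.append)
print(len(sent))   # 20 packets, paced at roughly 1 ms apart
```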
[Figure 9: Routers as Network Aggregators]
SRM has been used by the research community only over the Mbone,
and it is still being refined. Another problem with SRM is that in its cur-
rent incarnation, it supports neither asymmetric nor satellite networks.
Some early Internet Service Provider (ISP) multicast implementations offer multicast support in only one direction; SRM requires total multicast support.
PGM is new and offers promise, but there is no deployment yet, and it
likely will not occur until early 1999. PGM also requires router sup-
port in a terrestrial land-line network to gain scaling.
MFTP has the limitation that it supports only bulk transfer applica-
tions. However, one trade-off is that it can support all network
infrastructures, including satellite infrastructures with scaling. MFTP
has also been available commercially in products with the longest
application support, dating back to 1995. Thus, MFTP-based prod-
ucts have the largest installed base of any reliable multicast-based
product being used over WANs. The largest commercial installation is the General Motors[9] dealer network, with over 8,500 remote sites in the group. Several other commercial installations of MFTP-based applications number over 1,000 group members.
Advanced Research Topics Discussed in Reliable Multicast Research Group
A promising technique to reduce the amount of repair data that needs
to be retransmitted is called erasure correction. This technique can
significantly reduce the amount of repairs that need to be resent if the
packet loss is largely uncorrelated at the receivers. It uses a forward
error correction (FEC) code to generate parity packets to be used for
repairs only. This setup provides benefit if errors at receivers are
uncorrelated. For example, suppose 16 receivers each have one miss-
ing packet, but they are all different. Rather than send all 16 original
data packets, one FEC packet could be sent that could correct the one
missing packet at all 16 receivers, requiring retransmission of only one
packet rather than 16.
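The 16-receiver example can be demonstrated with the simplest erasure code, a single XOR parity packet; production FEC schemes (Reed-Solomon and similar) generate many parity packets per group, but the principle is the same:

```python
def xor_parity(packets):
    """One parity packet: the byte-wise XOR of all packets in the group."""
    parity = bytearray(len(packets[0]))
    for pkt in packets:
        for i, b in enumerate(pkt):
            parity[i] ^= b
    return bytes(parity)

def recover(survivors, parity):
    """Rebuild the single missing packet from the survivors plus parity."""
    return xor_parity(list(survivors) + [parity])

data = [bytes([i] * 4) for i in range(16)]   # 16 original data packets
parity = xor_parity(data)

# Each of 16 receivers lost a *different* packet; the one parity packet
# repairs all of them, so the sender retransmits 1 packet instead of 16.
for lost in range(16):
    survivors = data[:lost] + data[lost + 1:]
    assert recover(survivors, parity) == data[lost]
print("all 16 receivers repaired with a single parity packet")
```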
If the loss is correlated, then many of the receivers lose the same data,
and erasure correction is of no benefit. However, there is also no pen-
alty, except for the need for computing power at both the sender and
the receivers to perform the FEC correction calculations. Simulations have shown[10] that there is a greater than 2:1 reduction in the number of repairs that need to be sent with an example group of 10,000 receivers.
This benefit will be even larger when group sizes become larger than
tens of thousands.
Other issues with the layering approaches have been pointed out by the researchers, however. For layering to be
effective, the routing tree should be identical for the different groups;
otherwise congestion will not be relieved on a part of the tree. This
may not always be the case, especially in sparse mode routing
protocols, where selection of the rendezvous point or core is based on
group address.
Even if the same distribution tree is used for the different layers, it has been pointed out[12] that the hosts downstream from a congested link should be coordinated; otherwise the action of fewer than all of them has no effect on congestion. Additionally, a receiver could cause congestion by adding a layer, which another receiver could interpret as congestion, causing it to drop a layer to no effect.
This approach, however, has two basic problems. The first is that there
is delay, because the sender needs to get feedback from the multitude
of receivers before it acts. This delay can be considerably longer than
in the case of TCP, which needs feedback from only one receiver.
The second flaw is that one errant receiver can effectively penalize the
whole group, because the sender reduces the rate to the total group.
Then the repairs need to be contained within only the region of the
network that lost the original transmission, that is, the “subcast” region.
[Figure 10: Optimized Local Repair. A repair request travels up from the congested link to a “turning point,” and a retransmitter sends the repair only to the affected “subcast” region.]
One proposal is to ask for assistance from the network routers. They
know the topology and could be used to find the closest willing
retransmitter that has the repair. The router could also direct the
repair to only the affected region: the subcast.
There are two basic ways to accomplish this scenario for one-to-
many sessions. The first and most common is the “broadcast TV”
model. The Multiparty Multimedia Session Control (MMUSIC)
working group of the IETF has developed some protocols that can be
used to advertise content. The Session Announcement Protocol
(SAP)[17] provides the mechanism to send a stream on a “well-
known” multicast address to announce content to any potential lis-
teners who may be interested. It uses the Session Description
Protocol (SDP)[18] to describe the contents that are announced. These
two protocols together have been used to create a session directory
tool that is available on the Mbone. This setup creates essentially the
equivalent of a “preview channel” such as is often available on cable
television systems.
SDP is also used to post content on Web sites, which advertise that
content to anyone who wishes to receive it.
The MMUSIC group has also created the Session Invitation Protocol
(SIP)[19], which is used to invite members to a conference of some sort,
including possibly a data conference. This protocol is appropriate for
use with whiteboard applications, for example.
References
[1] Deering, S. “Host Extensions for IP Multicasting.” RFC 1112, August
1989.
[3] Schulzrinne, H., Casner, S., Frederick, R., and Jacobson, V. “RTP: A
Transport Protocol for Real-Time Applications.” RFC 1889, January
1996.
[5] Paul, S., Sabnani, K. K., Lin, J. C., and Bhattacharyya, S. “Reliable
Multicast Transport Protocol (RMTP).” IEEE Journal on Selected Areas
in Communications, April 1997.
[6] Floyd, S., Jacobson, V., Liu, C., McCanne, S., and Zhang, L. “A Reliable
Multicast Framework for Light-weight Sessions and Application Level
Framing.” ACM Transactions on Networking, November 1996.
[7] Farinacci, D., Lin, A., Speakman, T., and Tweedly, A. “PGM Reliable
Transport Protocol Specification.” Work in progress, Internet Draft,
draft-speakman-pgm-spec-01.txt, January 29, 1998.
[8] Miller, K., Robertson, K., Tweedly, A., and White, M. “StarBurst
Multicast File Transfer Protocol (MFTP) Specification.” Work in progress,
Internet Draft, draft-miller-mftp-spec-03.txt, April 1998.
[10] Kasera, S. K., Kurose, J., Towsley, D., “Scalable Reliable Multicast
Using Multiple Multicast Groups.” CMPSCI Technical Report TR 96-
73, October 1996.
[13] Sano, T., Yamanouchi, N., et al. “Flow and Congestion Control for Bulk
Reliable Multicast Protocols—toward coexistence with TCP.” Submitted
to INFOCOM ’98, presented at RMRG meeting in Cannes, France,
September 1997.
[15] Papadopoulos, C., Parulkar, G., and Varghese, G. “An Error Control
Scheme for Large-Scale Multicast Applications.” Submitted to
INFOCOM ’98, presented at RMRG meeting in Cannes, France,
September 1997.
[19] Handley, M., Schulzrinne, H., and Schooler, E. “SIP: Session Invitation
Protocol.” Work in progress, Internet Draft, draft-ietf-mmusic-
sip-04.txt, November 1997.
Layer 2 switches are frequently installed in the enterprise for
high-speed connectivity between end stations at the data link
layer. Layer 3 switches are a relatively new phenomenon, made
popular by (among others) the trade press. This article details some of
the issues in the evolution of Layer 2 and Layer 3 switches. We
hypothesize that the technology is evolutionary and has its origins
in earlier products.
Layer 2 Switches
Bridging technology has been around since the 1980s (and maybe
even earlier). Bridging involves segmentation of local-area networks
(LANs) at the Layer 2 level. A multiport bridge typically learns
about the Media Access Control (MAC) addresses on each of its
ports and transparently passes MAC frames destined to those ports.
These bridges also ensure that frames destined for MAC addresses
that lie on the same port as the originating station are not forwarded
to the other ports. For the sake of this discussion, we consider only
Ethernet LANs.
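The learning-and-filtering behavior can be sketched as follows. This is a minimal model (real switches also age out table entries and run the Spanning-Tree Protocol): the switch learns the source MAC seen on each port, forwards frames for known MACs to that port only, floods unknown destinations, and filters frames whose destination lies on the arrival port.

```python
class LearningSwitch:
    """Toy model of transparent Layer 2 forwarding."""
    def __init__(self, ports):
        self.ports = set(ports)
        self.mac_table = {}                  # MAC address -> port

    def forward(self, src_mac, dst_mac, in_port):
        self.mac_table[src_mac] = in_port    # learn the sender's location
        out = self.mac_table.get(dst_mac)
        if out is None:                      # unknown destination: flood
            return self.ports - {in_port}
        if out == in_port:                   # same segment: filter
            return set()
        return {out}                         # known destination: one port

sw = LearningSwitch(ports=[1, 2, 3, 4])
print(sw.forward("A", "D", 1))   # D unknown: flooded to {2, 3, 4}
print(sw.forward("D", "A", 4))   # A was learned on port 1: {1}
print(sw.forward("A", "D", 1))   # D now known on port 4: {4}
```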
[Figure 1: Layer 2 switch with external router for inter-VLAN traffic and connection to the Internet. Stations A, B, C, and D attach to switch ports 1 through 4; the router connects the switch to the Internet.]
Virtual LANs
In reality, however, LANs are rarely so clean. Assume a situation
where A, B, C, and D are all IP stations. A and B belong to the same IP
subnet, while C and D belong to a different subnet. Layer 2 switching
is fine, as long as only A and B or C and D communicate. If A and C,
which are on two different IP subnets, need to communicate, Layer 2
switching is inadequate—the communication requires an IP router. A
corollary of this is that A and B and C and D belong to different
broadcast domains—that is, A and B should not “see” the MAC layer
broadcasts from C and D, and vice versa. However, a Layer 2 switch
cannot distinguish between these broadcasts—bridging technology
involves forwarding broadcasts to all other ports, and it cannot tell
when a broadcast is restricted to the same IP subnet.
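The subnet distinction that the Layer 2 switch cannot make is easy to state programmatically. A sketch using Python's standard ipaddress module, with hypothetical addresses and an assumed /24 prefix: two hosts can talk through the switch alone only when they fall in the same subnet.

```python
import ipaddress

def same_broadcast_domain(ip_a: str, ip_b: str, prefix: int = 24) -> bool:
    """True when both hosts share an IP subnet, so Layer 2 switching
    suffices; False means the traffic must cross an IP router."""
    net_a = ipaddress.ip_interface(f"{ip_a}/{prefix}").network
    net_b = ipaddress.ip_interface(f"{ip_b}/{prefix}").network
    return net_a == net_b

print(same_broadcast_domain("10.1.1.10", "10.1.1.20"))  # True: A and B
print(same_broadcast_domain("10.1.1.10", "10.1.2.10"))  # False: A and C need a router
```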
Characteristics
Layer 2 switches themselves act as IP end nodes for Simple Network Management Protocol (SNMP) management, Telnet, and Web-based management. Such management functionality involves the presence of an IP stack on the switch along with User Datagram Protocol (UDP), Transmission Control Protocol (TCP), Telnet, and SNMP functions.
The switches themselves have a MAC address so that they can be
addressed as a Layer 2 end node while also providing transparent
switch functions. Layer 2 switching does not, in general, involve
changing the MAC frame. However, there are situations when
switches change the MAC frame. The IEEE 802.1Q Committee is
working on a VLAN standard that involves “tagging” a MAC frame
with the VLAN it belongs to; this tagging process involves changing
the MAC frame. Bridging technology also involves the Spanning-Tree
Protocol. This is required in a multibridge network to avoid loops.
The same principles also apply to Layer 2 switches, and most
commercial Layer 2 switches support the Spanning-Tree Protocol.
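As a sketch of what 802.1Q tagging involves (based on the frame format in the draft standard; simplified, with no recomputation of the frame check sequence), the 4-byte tag is inserted between the source MAC address and the EtherType field:

```python
import struct

TPID = 0x8100  # 802.1Q Tag Protocol Identifier

def tag_frame(frame: bytes, vlan_id: int, priority: int = 0) -> bytes:
    """Insert a 4-byte 802.1Q tag after the destination and source MAC
    addresses (bytes 0-11), shifting the EtherType and payload right."""
    if not 0 <= vlan_id < 4096:
        raise ValueError("VLAN ID is a 12-bit field")
    tci = (priority << 13) | vlan_id         # priority(3) | CFI(1) | VID(12)
    return frame[:12] + struct.pack("!HH", TPID, tci) + frame[12:]

# A tagged frame is exactly 4 bytes longer than the untagged original.
untagged = bytes(12) + struct.pack("!H", 0x0800) + b"payload"
tagged = tag_frame(untagged, vlan_id=42)
```

Because the frame length and contents change, the switch must also recompute the frame check sequence, which is why tagging is a genuine modification of the MAC frame rather than a transparent pass-through.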
Layer 3 Switches
Layer 3 switching is a relatively new term, which has been “extended”
by numerous vendors to describe their products. For example, one
school uses this term to describe fast IP routing via hardware, while
another school uses it to describe Multiprotocol over ATM (MPOA).
For the purpose of this discussion, Layer 3 switches are superfast rout-
ers that do Layer 3 forwarding in hardware. In this article, we will
mainly discuss Layer 3 switching in the context of fast IP routing,
with a brief discussion of the other areas of application.
Evolution
Consider the Layer 2 switching context shown in Figure 1. Layer 2
switches operate well when there is very little traffic between VLANs.
Such VLAN traffic would entail a router—either “hanging off” one of
the ports as a one-armed router or present internally within the
switch. To augment Layer 2 functionality, we need a router—which leads
to the combined Layer 2/Layer 3 switch shown in Figure 2.
Figure 2: Combined Layer 2/Layer 3 switch connecting directly to the
Internet. [The figure shows Stations A, B, C, and D attached to Ports
1, 2, 3, and 4 of a single combined Layer 2/3 switch, which connects
directly to the Internet.]
Now how does the switch know that C's IP address corresponds to Port
3? When it performs learning at Layer 2, it only knows C’s MAC
address. There are multiple ways to solve this problem. The switch
can perform an Address Resolution Protocol (ARP) lookup on all the
IP subnet 2 ports for C’s MAC address and determine C’s IP-to-MAC
mapping and the port on which C lies. The other method is for the
switch to determine C’s IP-to-MAC mapping by snooping into the IP
header on reception of a MAC frame.
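The snooping approach can be sketched as follows (a toy model for illustration: frames are reduced to bare port/MAC/IP values rather than parsed Ethernet and IP headers):

```python
# Build an IP -> (MAC, port) table by watching the source fields of
# frames as they arrive, instead of issuing ARP requests.

host_table = {}   # IP address -> (MAC address, port)

def snoop(in_port, src_mac, src_ip):
    """Record the sender's IP-to-MAC mapping and its port."""
    host_table[src_ip] = (src_mac, in_port)

def resolve(dst_ip):
    """Look up the egress MAC and port for a destination IP; returns
    None if unknown (a real switch could then ARP the subnet's ports)."""
    return host_table.get(dst_ip)

snoop(3, "00:00:0c:aa:bb:cc", "10.0.2.7")    # a frame from Station C is seen
```

Once C has sent any frame, the switch can forward IP packets to C's port directly, without first resolving C's address over the wire.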
Characteristics
Configuration of the Layer 3 switches is an important issue. When the
Layer 3 switches also perform Layer 2 switching, they learn the MAC
addresses on the ports—the only configuration required is the VLAN
configuration. For Layer 3 switching, the switches can be configured
with the ports corresponding to each of the subnets or they can
perform IP address learning. This process involves snooping into the
IP header of the MAC frames and determining the subnet on that port
from the source IP address. When the Layer 3 switch acts like a one-
armed router for a Layer 2 switch, the same port may consist of
multiple IP subnets.
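IP address learning can be sketched along these lines (illustrative only; it assumes the prefix length is known to the switch, and the names are not taken from any product):

```python
import ipaddress

def subnet_for_source(src_ip, prefix_len):
    """Derive the subnet seen on a port from a snooped source IP,
    given a known prefix length."""
    return str(ipaddress.ip_network(f"{src_ip}/{prefix_len}", strict=False))

# A port may accumulate several subnets, as on a one-armed router port.
port_subnets = {}          # port -> set of subnets observed

def learn(port, src_ip, prefix_len=24):
    port_subnets.setdefault(port, set()).add(
        subnet_for_source(src_ip, prefix_len))

learn(1, "192.168.1.57")   # port 1 carries subnet 192.168.1.0/24
```

On a port facing a Layer 2 switch, repeated calls with sources from different subnets would simply grow that port's set, capturing the one-armed-router case.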
Book Review
Gigabit Ethernet is storming its way onto the high-speed LAN scene.
From a concept in 1984 to an emerging commercial reality in 1998,
Gigabit Ethernet promises to give other high-speed LAN technologies,
especially ATM, a serious run for their money. Capitalizing on the
basic ease of use and deployment that has made other forms of Ether-
net the most popular LAN technology of all, Gigabit Ethernet promises
to add major bandwidth to such networks in a straightforward, com-
pletely compatible, and relatively affordable way. This book performs
an excellent survey of the technologies, algorithms, and design princi-
ples that make Gigabit Ethernet possible, and also explains where the
tremendous appeal of Gigabit Ethernet really lies. Much of the book is
devoted to explaining Ethernet principles and operation in general, as
well as exploring recent developments that have enabled gigabit tech-
nologies to emerge.
Organization
The book is divided into three parts. Part I explores the foundations
that underpin Gigabit Ethernet, starting with a brief but cogent explo-
ration of Ethernet before gigabit versions loomed on the horizon. The
rest of Part I covers the trends in LAN usage in general, and Ethernet in
particular, that laid the groundwork for Gigabit Ethernet. These trends
include the move from shared media to dedicated media on many
LANs, and likewise from shared LANs to dedicated LANs, and the
concomitant deployment of full-duplex technologies to support bidirec-
tional, high-bandwidth communications. Seifert, an original member of
the DIX (Digital-Intel-Xerox) team that developed Ethernet, writes
clearly and compellingly about complex issues, such as flow control,
medium independence, and automatic configuration, as he explains
what made Gigabit Ethernet possible, if not inevitable.
In Part II, Seifert turns his focus onto Gigabit Ethernet itself, beginning
with an overview. In the rest of Part II, he explains how Media Access
Control (MAC) works for half-duplex and full-duplex versions of
Gigabit Ethernet, and makes a strong case for the essential irrelevancy
of shared-media and half-duplex operation for Gigabit Ethernet. Along
the way, Seifert also covers how Gigabit Ethernet networking devices,
such as repeaters and switching and routing hubs, must be designed
and how they work, and covers the behavior and operation of the
physical layer at gigabit speeds.
In Part III, Seifert tackles some of the most interesting material in this
book. He begins with a discussion of how LANs and computers change
roles over time in acting as the bottleneck for network use. The point
here is that because of its extremely high bandwidth relative to the
demands of most applications and end-user requirements, Gigabit
Ethernet is likely to remain a backbone or clustering technology for the
foreseeable future. He also explores the performance considerations for
both networks and applications involved when extreme speeds or
excessive bandwidths are available, to point out how bandwidth aggre-
gation is presently Gigabit’s most immediate and compelling
contribution to networking.
An Outstanding Contribution
A rundown of Seifert’s layout and content, however, fails to do com-
plete justice to this book. For one thing, Seifert’s work includes the
funniest and most ingenious footnotes I’ve seen in recent publications,
including some truly horrendous puns and some downright howlers.
For example, when discussing how repeaters work, he comments that
“A jabbering station causes carrier sense to be continuously asserted
and blocks all use of a shared LAN. A repeater looks for this condition
and isolates the offending station.” To this last sentence, he appends
the following footnote: “Research is underway to determine if this
mechanism can be extended for use on politicians and university lectur-
ers.” And this is just one of dozens of such gems that help to relieve the
dryness that deeply technical material can sometimes manifest.
This book is also masterful simply because the author understands his
material so well, and does such an outstanding job of explaining and
exploring even the most abstruse networking concepts. Although I’ve
been working with Ethernet for 15 years, I learned a great deal of new
material from Part I of the book because old concepts were explained
in new ways that improved my understanding. I suspect other readers
will have one or two “Aha!” experiences from this tome as well.
But it’s when making the case for full-duplex Gigabit Ethernet and
exploring the requirements for switching and routing behaviors in
Gigabit Ethernet networking devices that this material really shines.
Without a doubt, this book is among the very best of any of the litera-
ture available on high-speed networking today. I give it an A+ rating,
not only because of the breadth and depth of its technical coverage and
its compilation of essential concepts and information, but also because
the author’s deep understanding of networking protocols and commu-
nications needs enlivens all of his discussions of matters technical,
business, and political. If you want to understand Gigabit Ethernet,
this book is the obvious place to begin (and for many, to end) your
search for enlightenment.
But even if all you want is a good read about expensive, exotic, and
high-performance technology, Seifert’s book offers the opportunity
for outright enjoyment of the prose, and shared delight at untangling
the technical dilemmas that any good design engineer must unravel
on the road between a set of requirements and working implementa-
tion thereof.
—Ed Tittel
LANWrights, Inc.
[email protected]
So, make sure you receive the next issue of The Internet Protocol
Journal, due out in December 1998.
The IANA has posted draft bylaws for its incorporation on the IANA
web site at: http://www.iana.org, and asked for community input.
By the time you read this, the incorporation should already have taken
place. We will provide an update in our next issue.
This publication is distributed on an “as-is” basis, without warranty of any kind either
express or implied, including but not limited to the implied warranties of merchantability,
fitness for a particular purpose, or noninfringement. This publication could contain
technical inaccuracies or typographical errors. Later issues may modify or update
information provided in this issue. Neither the publisher nor any contributor shall have any
liability to any person for any loss or damage caused directly or indirectly by the
information contained herein.
Edward R. Kozel, Sr. VP, Corporate Development,
Cisco Systems, Inc., USA

Peter Löthberg, Network Architect,
Stupi AB, Sweden

Dr. Jun Murai, Professor, WIDE Project,
Keio University, Japan

Dr. Deepinder Sidhu, Professor, Computer Science & Electrical
Engineering, University of Maryland, Baltimore County;
Director, Maryland Center for Telecommunications Research, USA

Cisco, Cisco Systems, and the Cisco Systems logo are registered
trademarks of Cisco Systems, Inc. in the USA and certain other
countries. All other trademarks mentioned in this document are the
property of their respective owners.

Copyright © 1998 Cisco Systems Inc. All rights reserved. Printed in
the USA.