Can a DDoS break the Internet? Sure… just not all of it

Last week's DDoS attack caused big problems for some, went unnoticed by others.

Ars Staff

We reported last week on a massive distributed denial of service attack that was intended to take anti-spam organization Spamhaus offline.

We described the scale of the attack as "Internet-threatening," elaborating further that the attack, peaking at more than 300 gigabits per second, "is the kind of scale that threatens the core routers that join the Internet's disparate networks."

Subsequently, posts on Gizmodo and The Guardian called into question these assessments, with Gizmodo casting doubt on the description by asking some "simple questions" and The Guardian specifically claiming that it was "shoddy journalism."

We stand by our original description and reporting. Here's why.

A network of networks

Before looking at the anti-Spamhaus attacks specifically, it's important to know a little about how the Internet is constructed. The Internet is often described as a "network of networks." Organizations around the world have their own independently owned and operated networks—university campuses, the retail Internet Service Providers (ISPs) that provide DSL, cable, and more exotic connections to homes and businesses, corporations, government departments, and so on and so forth.

All of these are useful networks in their own right, but they become enormously more useful when they're joined up. Joining up networks creates an internetwork. The first internetwork infrastructure came from the US government, and the first internetwork, ARPANET, joined a number of US universities in the 1970s.

Through the development of a series of other internetworks—both academic and commercial—and the establishment of international internetworks, we came to the situation we have today.

A small number of companies (about a dozen, though it's hard to know with absolute certainty) own and operate high-speed, transnational networks. These companies, called Tier 1 providers, pass traffic between one another freely, and between them they carry traffic on behalf of all the smaller networks that connect to them. This free exchange of traffic between providers is called peering.

They provide the thing that's closest to the Internet's "backbone" (though the term isn't really accurate: there's no single fragile spine, but rather a complex mesh of redundant, interconnected networks): from a Tier 1 provider, it's possible to send traffic to any public IP address.

Purchasing connectivity from the Tier 1 providers are the Tier 2 providers. Tier 2 providers buy Internet connectivity from Tier 1 providers, which is called transit. However, they also connect directly to other Tier 2 providers, with peering relationships. Tier 2 providers can be regional, but they can also be large transnational networks.

How customers connect to ISPs and ISPs connect between tiers. Credit: Wikimedia Commons

Large Tier 2 providers can peer with many, many other Tier 2 providers, with the result that Internet traffic from that provider only infrequently has to use the Tier 1 connectivity. The distinction between Tier 1 and Tier 2 is not size or scale as such; it's simply that Tier 1 networks only use peering. Tier 2 networks have to buy at least some transit.

Tier 1 providers generally sell only to Tier 2 providers. Tier 2 providers may sell directly to end users, or they may sell to Tier 3 providers: ISPs who only buy transit and don't have any peering.

Tier 2 and 3 providers fall into two further categories. They can be multi-homed, with multiple transit connections to different networks, or they can be single-homed, with just one transit link.

When two providers want to connect to one another, whether for peering or for transit, they obviously need a physical link of some kind. For providers with only a few connections, one-off point-to-point connections known as private network interconnects (PNIs) are used. But if you want to connect with lots of peers, you don't want to build lots of individual expensive optic fiber links. You want to consolidate: bring all the peers together in one place, and then stick a router or a network switch between them all to join them up.

As a result, a few hundred Internet Exchanges (IXs) are dotted around the globe. At each IX, there may be hundreds of providers from all three tiers coming together. The IXs generally use Ethernet infrastructure for their internal connectivity. Gigabit and 10 gigabit Ethernet predominate, with 100 gigabit Ethernet starting to gain ground, though its cost today prevents it from being the standard technology; longer links may be gigabit, 10 gigabit, 40 gigabit, or 100 gigabit. In principle, faster speeds still are possible by aggregating these 100 gigabit connections, but in practice, today's IXs are mainly 10 gigabit (or aggregated multiples thereof) networks.

IXs are important. Major service providers such as Google, Microsoft, and Facebook connect to IXs. If two Tier 2 operators can send traffic directly to each other, via peering at an IX, that's cheaper and more efficient than going via transit to a Tier 1.
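
To put a rough number on why peering at an IX beats buying transit, here's a minimal back-of-the-envelope sketch in Python. The prices, port cost, and traffic volumes are invented purely for illustration; real transit and IX port pricing varies enormously.

```python
# Back-of-the-envelope comparison: metered Tier 1 transit vs a flat-rate IX port.
# All prices and traffic volumes here are invented for illustration only.

TRANSIT_PRICE_PER_MBPS = 1.50   # hypothetical $/Mbps/month paid to a Tier 1
IX_PORT_COST = 1500.00          # hypothetical flat monthly cost of a 10GbE IX port
PEERING_PRICE_PER_MBPS = 0.0    # settlement-free peering: no per-bit charge


def monthly_cost_via_transit(mbps: float) -> float:
    """Traffic from one Tier 2 ISP to another, hauled through a paid Tier 1."""
    return mbps * TRANSIT_PRICE_PER_MBPS


def monthly_cost_via_peering(mbps: float) -> float:
    """The same traffic exchanged directly over a shared IX port."""
    return IX_PORT_COST + mbps * PEERING_PRICE_PER_MBPS


if __name__ == "__main__":
    for mbps in (500, 2_000, 8_000):
        transit = monthly_cost_via_transit(mbps)
        peering = monthly_cost_via_peering(mbps)
        print(f"{mbps:>5} Mbps: transit ${transit:>9,.2f} vs IX peering ${peering:>9,.2f}")
```

Past a fairly modest traffic level, the flat cost of an IX port undercuts metered transit, which is one reason large Tier 2 networks peer as widely as they can and touch Tier 1 transit only infrequently.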

Enter Spamhaus, STOPhaus, and CloudFlare

STOPhaus doesn't care much for Spamhaus. Credit: Twitter
Spamhaus provides useful services to e-mail administrators wishing to keep junk e-mail out of the servers they own and operate. STOPhaus is an informal group that doesn't like Spamhaus. STOPhaus members wanted to knock Spamhaus off the Internet using a distributed denial of service (DDoS) attack that flooded Spamhaus's systems and drowned out legitimate traffic. They did so by aiming a flood of DNS traffic at Spamhaus's servers.

In response, Spamhaus started using the services of CloudFlare, a company that specializes in robust hosting that's difficult to take offline with DDoS attacks. CloudFlare does this by replicating content around the globe and using a routing technique called anycast. Anycast allows servers with the same IP address to coexist simultaneously around the globe. Internet providers will generally route traffic to the geographically nearest instance of those anycasted IP addresses.

This does two things. First, by picking a site that's geographically close, it cuts the latency to access the site, making it react faster. Second, it dilutes the effect of DDoS attacks. Instead of a distributed attack using systems around the world being able to focus its flood on a single IP address in a single location, each attacking system can only focus on a nearby target.

Two attackers on opposite sides of the world may still be aiming at the same victim IP address, but their traffic will go to different computers that are relatively nearby.
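
To picture the dilution, imagine each attacking host being routed to whichever anycast site is "nearest" to it. The toy Python sketch below uses invented site names, regions, and traffic figures, and stands in simple "same region" matching for what is really BGP path selection; the point is just that no single site ever sees the whole flood.

```python
# Toy model of anycast dilution: each attacking host's traffic is delivered to
# the anycast site "nearest" to it. Sites, regions, and per-attacker traffic
# are invented; real networks pick the nearest instance via BGP path selection,
# not by matching region labels.
from collections import defaultdict

ANYCAST_SITES = {            # site -> region it serves (hypothetical)
    "London": "EU", "San Jose": "US", "Hong Kong": "APAC",
}

ATTACKERS = [                # (attacker region, attack traffic in Gbps) -- invented
    ("EU", 40), ("EU", 35), ("US", 60), ("US", 45), ("APAC", 70), ("APAC", 50),
]


def nearest_site(region: str) -> str:
    """Stand-in for 'routed to the topologically nearest anycast instance'."""
    return next(site for site, r in ANYCAST_SITES.items() if r == region)


per_site = defaultdict(int)
for region, gbps in ATTACKERS:
    per_site[nearest_site(region)] += gbps

print(f"Aggregate attack traffic: {sum(g for _, g in ATTACKERS)} Gbps")
for site, gbps in per_site.items():
    print(f"  {site} absorbs {gbps} Gbps")
```

In this made-up example, a 300Gbps aggregate lands as separate floods of 75, 105, and 120Gbps at three different sites, each far easier to absorb than the full flood would be.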

For CloudFlare's technology to work well, it needs a high level of distribution. The company currently reports that it has 23 data centers around the world and peers with nearly 70 different Tier 1 and Tier 2 providers, which it does with a mix of PNIs and IXs.

CloudFlare did its job, and Spamhaus remained accessible. Trying to flood the anycasted addresses wasn't working.

So the attackers changed their approach. Rather than attacking CloudFlare's distributed servers, they took aim at the network infrastructure used by CloudFlare's providers: the IXs. Attacks were made on IXs in Frankfurt, Amsterdam, London, and Hong Kong. It was the London IX, LINX, that suffered.

Optical patch panel at the AMS-IX Internet exchange point in Amsterdam, which was targeted by the attackers. Credit: Wikimedia Commons

Each provider peering at LINX has its own IP address, through which traffic to that provider is passed. The attackers noticed that LINX's IP addresses were accessible from anywhere in the world. This, in turn, meant that they could be the target of a DDoS attack.

On March 23rd, the attackers used this information to attack specific addresses within LINX. As is typical at IXs, these addresses sit on ports that are generally connected with 10 gigabit Ethernet, so throwing hundreds of gigabits per second at them swamped them. The result was that CloudFlare-protected services were, for some people (especially within the UK), slow or inaccessible. LINX also suffered an issue with its traffic monitoring, which showed traffic across its network approximately halving; that issue may have been related.

LINX subsequently changed its network configuration so that the IP addresses in question weren't reachable from outside LINX's own trusted network. This cut off the attacks, and normal operation was restored soon after.

The fault here was arguably in part LINX's: it should have been configured in a safer way from the outset (the Amsterdam IX, AMS-IX, for example, explicitly prohibits advertising routes to its internal IP addresses), but it wasn't, and that caused trouble as a result. That said, the IX community does not universally agree with this approach.
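
Conceptually, the fix is a routing policy: routes covering the IX's internal peering LAN must never make it into the global routing table, so the addresses simply aren't reachable (or floodable) from outside. The Python sketch below models that export filter; it uses the RFC 5737 documentation prefix 192.0.2.0/24 as a stand-in for a real peering-LAN prefix, and the candidate routes are invented. Real IXs and their members express the same policy as BGP import/export filters on their routers.

```python
# Sketch of the routing policy behind the fix: routes that cover the IX's
# internal peering LAN are rejected before export, so those addresses never
# appear in the global routing table and can't be flooded from outside.
# 192.0.2.0/24 (an RFC 5737 documentation prefix) stands in for a real
# peering-LAN prefix; the candidate routes below are invented.
from ipaddress import ip_network

IX_PEERING_LAN = ip_network("192.0.2.0/24")


def export_filter(candidate_routes):
    """Return only the routes that are safe to announce to the outside world."""
    exported = []
    for prefix in candidate_routes:
        route = ip_network(prefix)
        if route.overlaps(IX_PEERING_LAN):
            print(f"reject  {prefix}  (covers the IX peering LAN)")
        else:
            print(f"export  {prefix}")
            exported.append(prefix)
    return exported


# Hypothetical routes a member router might try to announce upstream.
export_filter(["198.51.100.0/24", "192.0.2.0/24", "203.0.113.0/24"])
```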

Breaking IXs breaks the Internet

IX infrastructure is core to the Internet. It is not the only Internet infrastructure, and there would still be an Internet if an IX blew up or burned down, but it wouldn't be the same Internet. LINX's infrastructure has, in aggregate, several terabits per second of capacity, and the Internet as a whole aggregates hundreds of terabits per second, but any one provider within LINX has only a fraction of that; big ISPs have 80-100 Gbps, and few (if any) have more. Having lots of bandwidth somewhere else in the world doesn't actually help very much.

Moreover, 300Gbps is well above the level at which it's easy to quickly add extra bandwidth in response. 100 gigabit Ethernet is expensive: IXs and ISPs don't have an abundance of 100 gigabit network ports lying around waiting for a rainy day, and they certainly don't give every customer peering at the IX an extra few hundred gigabits of capacity "just in case." At LINX, for example, 100 gigabit ports are installed on demand. They're too expensive to treat any other way.

Richard Steenbergen, currently CTO for GTT, a large network provider and upstream operator to, among other customers, CloudFlare, wrote in response to Gizmodo's article:

My company, most other large Internet carriers, and even the largest Internet exchange points, all deliver traffic at multi-terabits-per-second rates, so in the grand scheme of things 300 Gbps is certainly not going to destroy the Internet, wipe anybody off the map, or even show up as more than a blip on the charts of global traffic levels. That said, there is absolutely NO network on this planet who maintains 300 Gbps of active/lit but unused capacity to every point in their network. This would be incredibly expensive and wasteful, and most of us are trying to run for-profit commercial networks, so when 300 Gbps of NEW traffic suddenly shows up and all wants to go to ONE location, someone is going to have a bad day.

To make this more concrete: GTT has multiple terabits per second of connection around the world. But its IPv4 connectivity at LINX is reported to be 30Gbps. Send more than 30Gbps of traffic to its LINX IP address and anyone counting on using GTT for peering/transit through LINX is going to have a rough time. CloudFlare appears to have just 10Gbps of connectivity to LINX. The Internet is full of choke points such as this.
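
The choke-point arithmetic is simple enough to do on the back of an envelope. The Python sketch below uses the port sizes quoted above (roughly 30Gbps for GTT at LINX, 10Gbps for CloudFlare) and assumes, purely for illustration, that the whole 300Gbps flood converges on a single port.

```python
# Back-of-the-envelope choke-point arithmetic. The port capacities are the
# figures quoted above (GTT roughly 30 Gbps at LINX, CloudFlare roughly 10 Gbps);
# assuming the whole flood converges on one port is the illustrative worst case.
PORTS_GBPS = {
    "GTT at LINX": 30,
    "CloudFlare at LINX": 10,
}
ATTACK_GBPS = 300  # aggregate attack traffic aimed at addresses behind that port

for name, capacity in PORTS_GBPS.items():
    oversubscription = ATTACK_GBPS / capacity
    print(f"{name}: {capacity} Gbps port vs a {ATTACK_GBPS} Gbps flood "
          f"-> {oversubscription:.0f}x oversubscribed; legitimate traffic is drowned out")
```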

Paul Vixie, Internet engineer and co-founder of the Internet Systems Consortium, concurred, telling Ars via e-mail, "300 Gbps is fatal for some parts of the 'Net, but not all parts. It's when they started going after Internet exchange connections that third parties started losing."

Large providers—both on the demand side, such as ISPs, and the supply side, such as Facebook or Google or the BBC—peer at multiple IXs and have PNIs, so they're not so dependent on the health of any one IX. Small ones, however, do not. Flood the IX's infrastructure and they'll effectively drop off the Internet.

This is breaking the Internet. The "network of networks" reverts to being "disjoint networks," at least for some. For the rest, multihoming should mask any fatal errors. Things may be a little slower and, for ISPs that have to fall back from peering to transit, a little more expensive, but the disruption shouldn't be too visible.
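
One way to see the "disjoint networks" outcome is to model providers and exchanges as a graph and then knock out the attacked IX. The toy Python sketch below uses an invented topology: a single-homed ISP that connects only at the attacked exchange loses its path entirely, while a multihomed one routes around the damage.

```python
# Toy topology: does an ISP still have a path to a Tier 1 once the attacked
# IX is effectively gone? All nodes and links are invented for illustration.
from collections import deque

LINKS = {
    "single-homed ISP": ["IX-A"],
    "multihomed ISP":   ["IX-A", "IX-B"],
    "Tier 1":           ["IX-A", "IX-B"],
    "IX-A":             ["single-homed ISP", "multihomed ISP", "Tier 1"],
    "IX-B":             ["multihomed ISP", "Tier 1"],
}


def reachable(src, dst, dead=frozenset()):
    """Breadth-first search that ignores 'dead' (flooded) nodes."""
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for neighbour in LINKS.get(node, []):
            if neighbour not in seen and neighbour not in dead:
                seen.add(neighbour)
                queue.append(neighbour)
    return False


for isp in ("single-homed ISP", "multihomed ISP"):
    ok = reachable(isp, "Tier 1", dead={"IX-A"})
    print(f"{isp} can still reach the Tier 1 with IX-A down: {ok}")
```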

Similar behavior occurs in other Internet incidents. When undersea cables are cut, it's rare for a national network to be completely isolated, but cut enough cables and the Internet can become disjointed, as it reportedly did in East Africa after four cables were cut simultaneously in 2012. When faced with cable cuts, the global Internet is fine, and the national networks are also fine. They're just not joined up.

Similarly, when Pakistan published routes to the global Internet that effectively disabled YouTube, almost every network making up the Internet remained reachable, except one: YouTube's network.

STOPhaus even tried a similar attack of their own on Spamhaus, trying to hijack Spamhaus's IP address range and redirect it to CyberBunker.

The Internet is generally quite resilient to this kind of thing. But problems do happen.

Not that shoddy

If the Gizmodo and Guardian writers were perhaps expecting a broken Internet to mean that the entire thing simultaneously fell apart into a million different networks, then certainly, these attacks (and others, such as hijacking IP addresses or cutting cables) won't "break the Internet."

If that's what you're after, however, nothing really will. Not because the Internet was designed to survive a nuclear attack—it wasn't—but because it has grown to be widely distributed, with lots of redundant links, and few people really care about the entire Internet.

Gizmodo's questions about the attacks were:

  1. Why wasn't my internet slow?
  2. Why didn't anyone notice this over the course of the past week, when it began?
  3. Why isn't anyone without a financial stake in the attack saying the attack was this much of a disaster?
  4. Why haven't there been any reports of Netflix outages, as the New York Times and BBC reported?
  5. Why do firms that do nothing but monitor the health of the web, like Internet Traffic Report, show zero evidence of this Dutch conflict spilling over into our online backyards?

Four of those, at least, are easy enough to answer.

  1. Because you're an American, in America, primarily accessing American sites. The Internet, however, is a global network. Disruption in one area need not lead to disruption in other areas, particularly if the services you are interested in are geographically close. Network security company Arbor Networks noted that the DDoS attack was substantially larger than those that had come before, and its Asia Pacific analyst Roland Dobbins wrote that problems were indeed seen by providers in Europe, the Middle East, Africa, and Asia-Pacific.
  2. They did. Quoting Andree Toonk, a network engineer for OpenDNS, "Those who claim there was no impact probably don't run global networks. I've seen Tier1's struggle and had to route around it, EU and Asia! significant packet loss." This corroborates CloudFlare's claim that Tier 1 providers were congested.
  3. People who do not work for CloudFlare are saying that the attack was substantial, that it was disruptive, and that it caused service problems for some people. Indeed, they're annoyed by it, as it rendered other CloudFlare-hosted sites unusable from the UK. For example, Andy Gambles of UK-based SSL provider and CloudFlare customer ServerTastic complained to CloudFlare, "Our sites were dead slow/practically offline for the whole time."
  4. Who knows?
  5. Two reasons. First, because the Internet Traffic Report doesn't monitor Africa at all, has poor coverage of Asia, has European data that's sporadic at best (lots of the systems it tests simply aren't returning any traffic at all), and provides only aggregate graphs for periods longer than 24 hours, making it impossible to see local effects that occurred on the 23rd of March. It's a useful resource, but hardly the final arbiter of whether the Internet is working well or not. Second, because the Internet doesn't work that way. If a network that you don't care about has been cut off from the network of networks, you'll never notice or care.

CloudFlare's blog post, "The DDoS that almost broke the Internet," certainly had a rather hyperbolic title. It's probably not the first blog post to have a hyperbolic title. It almost certainly won't be the last. Shattering the Internet into a billion disconnected hosts will never happen, so in that sense, the Internet is safe. But breaking it into two, or three, or a handful of separate networks? With the right amount of traffic in the right place, that can happen.

Listing image: violinha
