The Internet Almost Died in 1997
A single misconfigured router at a small ISP in Florida sent a bad routing table that cascaded across the entire internet, taking down most of it for hours. ARPANET was designed to survive nuclear war — it couldn't survive human error.
Part 1: The Network That Was Supposed to Survive Everything
In the late 1960s, the Advanced Research Projects Agency — ARPA — funded the creation of a computer network with a radical design principle: no single point of failure. ARPANET, the precursor to the modern internet, was built so that if any node was destroyed (the Cold War context made "destroyed" mean "hit by a Soviet nuclear warhead"), traffic would simply route around the damage.
The key innovation was packet switching. Instead of establishing a dedicated circuit between two points (like a phone call), data was broken into small packets that could each take a different path through the network. If one path was destroyed, packets would find another. The network was, in theory, indestructible.
By the mid-1990s, ARPANET had evolved into the commercial internet — a sprawling, decentralised web of tens of thousands of independently operated networks. The packet-switching architecture remained. The resilience remained. But a new vulnerability had quietly emerged in the system that told packets where to go.
Part 2: BGP — The Protocol Nobody Thinks About
The internet is not one network. It's a federation of roughly 70,000 Autonomous Systems (ASes), as of the mid-2020s — independently operated networks run by ISPs, corporations, universities, and governments. Each AS controls its own internal routing. The question is: how do they coordinate with each other?
The answer is BGP — the Border Gateway Protocol, defined in RFC 1771 (1995) and updated in RFC 4271 (2006). BGP is an "exterior gateway protocol" — it handles routing between Autonomous Systems, not within them.
Here's how it works in simplified terms:
- Each AS has a unique number (ASN). MAI Network Services was AS7007.
- Each AS announces to its neighbours which IP address prefixes it can reach. "I can deliver traffic to 192.0.2.0/24."
- Neighbours record these announcements and propagate them to their own neighbours, prepending their own ASN to the path. "To reach 192.0.2.0/24, go through AS7007 via us (AS1234)."
- Routers across the internet build routing tables from these announcements, selecting the "best" path based on shortest AS path length and local policy preferences.
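The announcement-and-prepend mechanism above can be sketched in a few lines of Python. This is a toy model, not a BGP implementation — AS7007 appears as in the article, the other ASNs are invented, and 192.0.2.0/24 is a reserved documentation prefix:

```python
# Toy model of BGP announcement propagation with AS-path prepending.
# 192.0.2.0/24 is a documentation prefix; AS1234 and AS5678 are illustrative.

def announce(as_path, asn):
    """An AS re-advertises a route by prepending its own ASN to the path."""
    return [asn] + as_path

# The origin AS announces a prefix with a path containing only itself.
route = {"prefix": "192.0.2.0/24", "as_path": [7007]}

# Each neighbour that propagates the route prepends its own ASN.
route["as_path"] = announce(route["as_path"], 1234)  # AS1234 re-advertises
route["as_path"] = announce(route["as_path"], 5678)  # AS5678 re-advertises

# The rightmost ASN in the path is the claimed origin.
print(route["as_path"])  # [5678, 1234, 7007]
```

Reading the path right to left recovers the propagation history — which is exactly the information a route leak destroys when the accumulated path gets stripped.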
The entire system is built on an assumption that would be laughable in any other security context: that every participant is telling the truth.
There is no certificate authority for BGP. No central registry that verifies "AS7007 really does own 192.0.2.0/24." If an AS announces a route, its neighbours accept it. This design made sense in 1989, when the internet was a small community of academic and government networks where everyone knew everyone. By 1997, it was a ticking time bomb.
Part 3: April 25, 1997 — The Explosion
The technical details of what happened at MAI Network Services have been reconstructed from router logs, NANOG (North American Network Operators Group) mailing list archives, and subsequent analysis.
MAI was running a route optimisation process — a common procedure where an ISP adjusts its routing configuration to improve traffic flow. During this process, a technician made an error that caused MAI's router to import the entire global BGP routing table — approximately 45,000 routes at the time — into its own IGP (Interior Gateway Protocol).
The router then re-advertised these routes via BGP as if MAI was the origin for all of them. In BGP terms, MAI stripped the existing AS paths from tens of thousands of routes and announced them with itself as the sole origin AS.
MAI's router told the world it was the origin — the source — of approximately 45,000 network prefixes. In reality, it originated perhaps a few hundred.
Because these "originated" routes had an AS path length of 1 (just AS7007), they appeared to be the shortest possible path to those destinations. BGP's route selection algorithm prefers shorter paths. Routers across the internet began selecting MAI as the best path for traffic destined for thousands of different networks.
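The selection bias at work here can be shown directly. This sketch compares only AS-path length — real BGP evaluates other attributes first, such as local preference, and the paths below are illustrative reconstructions, not 1997 routing data:

```python
# Sketch: among otherwise-equal routes, BGP prefers the shortest AS path.
legitimate_path = [1234, 5678, 90]  # three hops to the true origin (invented ASNs)
hijacked_path = [7007]              # MAI claiming to be the origin itself

best = min([legitimate_path, hijacked_path], key=len)
print(best)  # [7007] — the bogus one-hop route wins
```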
The cascade was immediate:
Minutes 1-5: MAI's neighbours accepted the bogus routes and propagated them. Their neighbours did the same. The bad routing information spread outward like a shockwave.
Minutes 5-10: Traffic from across North America began converging on MAI's network. Their routers, designed for modest traffic volumes, were overwhelmed. Packet loss spiked. Latency climbed. Then the routers began crashing.
Minutes 10-30: With MAI's network collapsing, the routes pointing to it became unreachable. But BGP convergence is slow — routers don't instantly detect that a path is dead. Instead, they enter a state called "route flapping," rapidly switching between the broken MAI routes and any alternatives they can find. Each flap triggers a BGP update message that must be processed by every router in the path. The update storms consumed router CPU and memory, causing additional routers to fail.
Minutes 30-120: Network engineers around the world began identifying the source of the problem. The NANOG mailing list lit up. ISPs began manually filtering AS7007's announcements. Slowly, legitimate routes were restored as the bogus ones were filtered out.
The internet didn't have a kill switch for bad routes. There was no emergency broadcast system. The fix required thousands of independent operators to individually identify and filter the problem — a process that took hours, not seconds.
Part 4: The Anatomy of a Cascading Failure
The AS7007 incident is a textbook example of a cascading failure in a complex system. Several factors made it worse than it needed to be:
No route filtering. Most ISPs in 1997 did not filter the BGP announcements they accepted from peers and customers. If a customer AS announced routes it shouldn't have, the ISP would blindly propagate them. Basic prefix filtering — checking whether an AS is authorised to announce a given prefix — would have contained the damage. But almost nobody did it.
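A minimal sketch of such a prefix filter, assuming a hand-maintained table of which prefixes each customer AS may originate — the authorisation table here is invented for illustration:

```python
# Sketch of basic prefix filtering: an ISP keeps a list of prefixes each
# customer AS is authorised to originate, and rejects everything else.
import ipaddress

# Hypothetical authorisation table — not MAI's real 1997 allocations.
AUTHORISED = {
    7007: [ipaddress.ip_network("192.0.2.0/24")],
}

def accept_announcement(origin_asn, prefix):
    """Accept only prefixes covered by the origin AS's authorised list."""
    allowed = AUTHORISED.get(origin_asn, [])
    net = ipaddress.ip_network(prefix)
    return any(net.subnet_of(a) for a in allowed)

print(accept_announcement(7007, "192.0.2.0/24"))    # True  — legitimate
print(accept_announcement(7007, "203.0.113.0/24"))  # False — would be dropped
```

A filter like this at MAI's upstream would have dropped the tens of thousands of bogus announcements at the first hop.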
No route dampening. When a route flaps (alternates between available and unavailable), it generates update messages that consume router resources. Route dampening — temporarily suppressing a flapping route — was an emerging practice later standardised in RFC 2439 (1998), but it was not yet widely deployed in 1997. Without it, the flapping from AS7007's routes created a secondary crisis of CPU exhaustion on routers that were otherwise fine.
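The dampening idea can be sketched as an exponentially decaying penalty, roughly in the spirit of RFC 2439 — the parameter values below are common illustrative figures, not a claim about any vendor's defaults:

```python
# Sketch of route flap dampening: each flap adds a fixed penalty, the
# penalty decays exponentially with a half-life, and the route is
# suppressed while the penalty exceeds a threshold.
import math

PENALTY_PER_FLAP = 1000
SUPPRESS_LIMIT = 2000   # suppress the route above this penalty
REUSE_LIMIT = 750       # re-advertise once decay brings it below this
HALF_LIFE = 900.0       # seconds for the penalty to halve

def penalty_after(flap_times, now):
    """Total decayed penalty at time `now`, given past flap timestamps."""
    return sum(
        PENALTY_PER_FLAP * math.exp(-(now - t) * math.log(2) / HALF_LIFE)
        for t in flap_times
    )

p = penalty_after([0, 10, 20], now=20)   # three flaps in 20 seconds
print(p > SUPPRESS_LIMIT)                # True — the route gets suppressed
```

The design trade-off is visible in the half-life: suppression protects router CPUs during an update storm, but it also delays recovery once the underlying problem is fixed.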
Slow convergence. BGP was designed for stability, not speed. It uses timers and gradual propagation to avoid oscillation. This is normally a feature — you don't want the entire internet's routing tables rewriting themselves every millisecond. But during a crisis, it means bad information persists for a long time and recovery is slow.
No authentication. The fundamental issue. If BGP had included any mechanism for verifying that an AS was authorised to announce a given route, the entire incident would have been caught at the first hop. MAI's neighbours would have rejected the bogus announcements immediately.
Part 5: The Ghosts of AS7007
The 1997 outage should have been a one-time wake-up call. Instead, it became a recurring pattern.
2004 — TTNet (Turkey): Turkish ISP TTNet accidentally announced a full BGP routing table as if it originated the routes — an AS7007-style leak that attracted traffic destined for many major networks. The incident caused widespread disruption across parts of Europe and Asia.
2008 — Pakistan Telecom vs. YouTube: The Pakistani government ordered ISPs to block YouTube (over videos deemed blasphemous). Pakistan Telecom complied by announcing a more specific route for YouTube's IP prefix — a BGP trick that says "I have a more precise map to that destination." The announcement leaked to PCCW, Pakistan Telecom's upstream provider, which propagated it globally. For about two hours, YouTube was unreachable worldwide because all its traffic was being routed to Pakistan, where it was dropped.
In BGP, a more specific route always wins over a less specific one. If YouTube announces 208.65.152.0/22, and Pakistan announces 208.65.153.0/24 (a subset), every router on the internet will send traffic for that subset to Pakistan. This is by design.
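Longest-prefix matching can be demonstrated with Python's standard `ipaddress` module, using the prefixes from the 2008 incident — the lookup addresses are arbitrary examples chosen from within those ranges:

```python
# Sketch of longest-prefix matching: among routes covering an address,
# the most specific (longest) prefix wins.
import ipaddress

routes = {
    ipaddress.ip_network("208.65.152.0/22"): "YouTube (legitimate)",
    ipaddress.ip_network("208.65.153.0/24"): "Pakistan Telecom (hijack)",
}

def lookup(address):
    """Return the route whose prefix most specifically covers the address."""
    dest = ipaddress.ip_address(address)
    matches = [net for net in routes if dest in net]
    return routes[max(matches, key=lambda n: n.prefixlen)]

print(lookup("208.65.153.1"))  # inside the /24 → Pakistan Telecom (hijack)
print(lookup("208.65.152.1"))  # only inside the /22 → YouTube (legitimate)
```

Any address inside the hijacked /24 is captured; the rest of the /22 still reaches YouTube — which is why more-specific hijacks can be surgical.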
2017 — Rostelecom (Russia): The Russian state telecom operator announced BGP routes for prefixes belonging to Google, Apple, Facebook, Microsoft, and several financial institutions. Traffic for these services was briefly routed through Russia. Whether this was accidental or deliberate remains debated.
2018 — MainOne (Nigeria): A Nigerian ISP's misconfiguration caused Google traffic to be routed through China Telecom and then Rostelecom (Russia) for about 74 minutes. Google's services were degraded for users across multiple continents.
2021 — Facebook's self-inflicted outage: Facebook's engineers accidentally withdrew their own BGP announcements during a maintenance operation. The entire Facebook ecosystem — including Instagram, WhatsApp, and Messenger — vanished from the internet for over six hours. Because Facebook's internal tools also relied on the same DNS and routing infrastructure, engineers couldn't even access their own systems to fix the problem. Some reportedly had to physically travel to data centres and manually reset equipment.
Part 6: The Fix That Nobody Deploys Fast Enough
The solution to BGP hijacking has existed in some form since the early 2000s. It's called RPKI — Resource Public Key Infrastructure.
RPKI works by creating a cryptographic chain of trust. Regional Internet Registries (ARIN, RIPE, APNIC, etc.) issue digital certificates that say "AS7007 is authorised to announce these specific prefixes." Routers can then validate incoming BGP announcements against these certificates and reject any that don't match.
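Conceptually, a validator checks each announcement against a Route Origin Authorisation (ROA) binding a prefix to an origin AS. A minimal sketch — the ROA entry is invented, and real validators follow the RFC 6811 semantics in far more detail:

```python
# Sketch of RPKI route-origin validation. A ROA binds a prefix (with a
# maximum announced length) to an authorised origin AS. An announcement
# is "valid" if a covering ROA matches, "invalid" if covering ROAs exist
# but none match, and "unknown" if no ROA covers the prefix at all.
import ipaddress

ROAS = [
    # (prefix, max_length, authorised origin ASN) — invented example data
    (ipaddress.ip_network("192.0.2.0/24"), 24, 7007),
]

def validate(prefix, origin_asn):
    net = ipaddress.ip_network(prefix)
    covering = [(p, ml, asn) for (p, ml, asn) in ROAS if net.subnet_of(p)]
    if not covering:
        return "unknown"
    if any(asn == origin_asn and net.prefixlen <= ml
           for (_, ml, asn) in covering):
        return "valid"
    return "invalid"

print(validate("192.0.2.0/24", 7007))  # valid — authorised origin
print(validate("192.0.2.0/24", 666))   # invalid — origin not authorised
```

The "unknown" state is why partial deployment is so awkward: a router can only drop "invalid" routes, and everything without a ROA sails through unchecked.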
It's elegant. It works. And adoption has been painfully slow.
The reasons are depressingly human. RPKI requires every network operator to obtain certificates, configure their routers to perform validation, and maintain the infrastructure. It provides a collective benefit (a more secure internet) but imposes individual costs (engineering time, operational complexity). Classic tragedy of the commons.
The internet was designed to survive a nuclear war. It was never designed to survive the tragedy of the commons.
As of the mid-2020s, adoption of RPKI origin validation has grown significantly but still does not cover the entire internet. Even when RPKI is deployed, many networks operate in "monitor only" mode — they log invalid routes but don't actually reject them, out of fear that a misconfigured RPKI certificate could cause legitimate traffic to be dropped.
The internet's most critical infrastructure protocol — the one that decides where every packet goes — remains, after decades of warnings and repeated catastrophic failures, partially secured at best.
Part 7: What This Tells Us
The AS7007 incident and its descendants reveal something fundamental about complex systems: resilience against one type of failure can create vulnerability to another.
ARPANET was resilient against physical destruction. Destroy a node, and traffic routes around it. But this resilience was achieved through decentralisation and trust — properties that created a completely different attack surface. You can't destroy the internet by bombing a building. But you can disrupt it by lying to it.
The internet's architects were solving for the threats they understood — Cold War nuclear strikes. They couldn't anticipate that the real threat would be a technician in Florida having a bad day.
Every complex system has this property. The defences you build shape the vulnerabilities you create. The Maginot Line made France invulnerable to a frontal assault and perfectly vulnerable to an end-run through Belgium. The Titanic was "unsinkable" against the known failure modes of its era and catastrophically unprepared for the one that actually happened.
The internet almost died in 1997 not because it was poorly designed, but because it was brilliantly designed for a threat that wasn't the one that showed up.
ARPANET's original design
The Cold War origins of the internet — how a network designed to survive nuclear war was built on principles that would later become its greatest weakness.
The resilience was real. But the trust model was a time bomb.
BGP — the two-napkin protocol
BGP was famously sketched out on two napkins by engineers Yakov Rekhter and Kirk Lougheed in 1989. This talk explains how that napkin sketch became the routing backbone of the entire internet.
The napkin protocol worked fine until the internet grew past the point where everyone knew each other.
The NANOG mailing list — AS7007 in real time
The actual emails from network engineers as they realised what was happening on April 25, 1997. Confusion, alarm, and frantic coordination in real time.
The same thing happened again. And again. And again.
Facebook's six-hour disappearance
In 2021, Facebook accidentally withdrew its own BGP routes and vanished from the internet for over six hours. Engineers couldn't even access their own buildings because the door systems used Facebook's network.
A fix exists. It has existed for over twenty years. Here's why almost nobody uses it.
RPKI — the tragedy of the commons in infrastructure
The cryptographic solution to BGP hijacking is real, tested, and available. The problem is getting tens of thousands of independent networks to all adopt it. Classic coordination failure.
What you now know
- ARPANET was designed to survive nuclear war through decentralisation and trust — but that trust model created a new class of vulnerability when the network scaled to billions of users
- The AS7007 incident in 1997 demonstrated that a single misconfigured router could corrupt the entire internet's routing system because BGP has no built-in verification
- The same vulnerability has caused major outages repeatedly: Pakistan vs YouTube (2008), Russian traffic hijacking (2017), and Facebook's self-inflicted six-hour outage (2021)
- RPKI — a cryptographic fix — has existed since the early 2000s but suffers from a tragedy of the commons: it costs each network individually to deploy but benefits everyone collectively
- The internet's story is a lesson in how resilience against one type of failure (physical destruction) can create vulnerability to another (trust exploitation)