Myspace.com is one of the most popular sites on the internet (in the top ten, according to Media Metrix), and in June 2005 it began receiving massive numbers of reports from customers who were unable to reach the website: it seemed far too widespread to be an isolated incident. Many frustrating hours later, it was revealed that the problem was a DNS hijack performed by none other than SBC Internet.
This Tech Tip recounts the incident and how we got to the bottom of it.
On Friday, 3 June 2005 at just before 3PM, a customer IM'd me with reports that his users couldn't resolve his domain name (for this Tech Tip, I'm using example.com throughout rather than the real domain of myspace.com). He gave me a snippet that one of his customers passed him:
Server:  dns1.1sanca.sbcglobal.net
Address: 206.13.29.12

dns1.1sanca.sbcglobal.net can't find www.example.com: Non-existent domain
At first I thought it was simply a typo in the name of the nameserver (1sanca rather than lsanca), but even querying the properly-named nameserver gave no answer. I have been an SBC California customer (in Pacific Bell territory) for years and know their DNS infrastructure fairly well; everybody around here knows that 206.13.29.12 is the LA resolver.
When getting strange answers from a DNS server, one usually starts by querying the SOA (Start of Authority) record. It contains housekeeping data for the zone that indicates (among other things) where the master data for the zone is kept. Asking the right place (at UltraDNS), we see:
$ dig +short example.com soa
udns1.ultradns.net. hostmaster.example.com. 2005060303 10800 3600 2592000 86400
We also see the serial number -- 2005060303 -- which by convention often contains the date of the last manual update to the contents of the zone file.
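For reference, the seven fields in that one-line answer are:

udns1.ultradns.net.       ; MNAME  - primary master nameserver for the zone
hostmaster.example.com.   ; RNAME  - responsible party, i.e. hostmaster@example.com
2005060303                ; serial - conventionally YYYYMMDDnn
10800                     ; refresh interval for secondaries (seconds)
3600                      ; retry interval after a failed refresh
2592000                   ; expire - secondaries stop answering after this long
86400                     ; minimum / negative-caching TTL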
Asking the SBC nameserver for this should yield the same data, but it did not:
$ dig +short @dns1.lsanca.sbcglobal.net example.com soa
localhost. postmaster.localhost. 2004052400 3600 1800 604800 3600
Huh? Localhost? Year=2004? This makes no mention of example.com, and it just doesn't smell right even if you're not comparing it with the correct data.
Using a list of SBC California resolving nameservers gave an easy way to query all of them at once for this SOA data from a shell script:
#!/bin/sh
#
# checkdns.sh - query SBC nameservers for our SOA
#
while read server junk
do
    echo ===$server
    dig +norecur +short @$server example.com soa
done <<EOF
206.13.28.12    # dns1.snfcca.sbcglobal.net
206.13.29.12    # dns1.lsanca.sbcglobal.net
206.13.30.12    # dns1.sndgca.sbcglobal.net
206.13.31.12    # dns1.scrmca.sbcglobal.net
63.202.63.72    # dns1.frsnca.sbcglobal.net
64.160.192.70   # dns1.bkfdca.sbcglobal.net
64.169.10.7     # dns1.renonv.sbcglobal.net
64.169.140.6    # dns1.sktnca.sbcglobal.net
EOF
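The output, with each server reporting in turn, looked like this (first two shown):

$ sh checkdns.sh
===206.13.28.12
localhost. postmaster.localhost. 2004052400 3600 1800 604800 3600
===206.13.29.12
localhost. postmaster.localhost. 2004052400 3600 1800 604800 3600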
Each one gave identical "localhost" SOA responses. Something is very wrong here: perhaps it's some kind of DNS poisoning, perhaps this was a previous DNS configuration that was rolled back, or maybe SBC nameservers are getting bad data from UltraDNS.
Checking the root nameservers for example.com correctly shows two NS entries:
UDNS1.ULTRADNS.NET    204.69.234.1
UDNS2.ULTRADNS.NET    204.74.101.1
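For those following along, that check amounts to a non-recursive NS query sent to the servers for the .com zone (to which the roots delegate), something along these lines:

$ dig +norecur @a.gtld-servers.net example.com ns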
Both of them produced the right answers, but this doesn't go far enough. UltraDNS - like the root servers - uses a technique known as anycast, where multiple, geographically diverse machines share the same IP address, relying on BGP routing to get each user to the closest physical server. When I ping 204.69.234.1, I may reach a different machine than you do from elsewhere on the internet. It's a great system, optimized for both performance and robustness.
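As an aside, many (though not all) anycast operators will tell you which physical instance answered a query, via a CHAOS-class TXT lookup along the same lines as the version.bind query used later:

$ dig +short @204.69.234.1 hostname.bind txt chaos

If the server supports it, the reply names the specific machine that handled the query.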
But what if there were some kind of synchronization issue between the real nameservers? This seemed unlikely, mainly because the UltraDNS folks are very much on the ball, but also because I'm on the SBC network myself and should be taking roughly the same path to UltraDNS as the SBC nameservers do: if I see correct data from UltraDNS, so should SBC.
Getting on a conference call with my Example.com manager and the support guy from UltraDNS, we were able to query the individual physical nameservers (they each have a separate, unique IP address as well), and it was clear that they were all working correctly. UltraDNS was not the problem here.
So this brings us back squarely to SBC. Through my own networks and via colleagues, we were able to make this same SOA query to other ISP nameservers - Cox, Broadwing, Verio, Earthlink, etc. - and in no case got the bogus data. Asking Example.com support to check their reports from customers, it turns out that none of them was from outside the SBC network.
So what might this be? DNS cache poisoning is an obvious guess, where the bad guy sends an answer to a question not asked, and the nameserver caches it as genuine, but this didn't feel likely either:
$ dig +short @dns1.lsanca.sbcglobal.net version.bind txt chaos
"9.2.4"

The BIND nameserver software can be configured to report anything to this query, so it may not be genuine, but BIND 9.2.4 is not known to be susceptible to DNS poisoning.
This was completely puzzling, and none of us on the conference call had any real idea what was going on other than to know that it involved SBC. We ended the call with a promise that I'd send the UltraDNS fellow all my detailed notes.
After the call I told the customer how impressed I was with the UltraDNS guy and how nice it was to talk with a truly competent DNS support professional. It turned out that he was the company's CTO, Rodney Joffe. Smart, nice guys are a joy to work with.
At just after 5:30 PM we got some news, and the whole picture emerged: SBC had apparently identified a DoS attack on its customer nameservers, characterized by the same record being requested over and over.
I don't recall which resource record was being requested, but my best guess - and I am speculating here - is that SBC was seeing a DNS smurf attack on another party, with their own nameservers being the unwitting middleman.
Sidebar: DNS smurf/amplification attacks
DNS queries are generally made to a nameserver over port 53/udp, and the request packet includes both the name and type being requested. One can ask for type=A (IP address), type=MX (mail server), type=NS (nameserver), or even type=* (anything), and the answer is likewise returned in a UDP datagram.
The request is usually quite small -- 50 to 70 bytes -- but the reply can vary substantially depending on how much information is available to answer the question. When the request asks for "anything" for a given resource record, it's not uncommon to get hundreds of bytes of data in reply:
In the case of Example.com, the DNS request of 57 bytes was met with a response of 336 bytes, a 5.89 X increase in size. This is perfectly normal if it's part of a routine, one-time query: it's exactly what DNS is there for.
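You can measure this yourself: dig reports the size of the reply at the bottom of its output, so the 336-byte answer above would have shown up as something like (the exact count depends on the zone's contents at the time):

$ dig +norecur @udns1.ultradns.net example.com any

  [ answer, authority, and additional sections elided ]
;; MSG SIZE  rcvd: 336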
But when an attacker sends this request repeatedly and with a spoofed source IP address, the reply from the DNS server goes not to the attacker, but to the victim whose IP address is spoofed. This appears to the victim as a steady stream of un-asked-for DNS replies.
Because the reply is so much larger than the request, the middleman DNS server is providing a nearly six-fold amplification in the data rate being sent to the victim.
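To put the amplification in concrete terms with the figures above (purely illustrative arithmetic): an attacker able to send 10 megabits/sec of spoofed 57-byte queries induces roughly 10 x 5.89, or about 59 megabits/sec, of replies aimed at the victim, and the attacker's real address never appears in any of it.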
In this kind of attack, the domain name used is chosen strictly for its amplification factor, not out of any desire to target the domain itself. Indeed, example.com may well be entirely unaware that any of this is going on: any other domain with a beefy response to a type=any request would have served the purpose just as well.
But this is strictly speculation: we have some indications that SBC may have been seeing an attack that more directly targeted example.com, but this is still unclear.
If this was the case - SBC saw their own DNS servers under attack - they had an absolutely legitimate interest in protecting their infrastructure. These machines were providing DNS to many thousands of SBC customers, and the attack had to be mitigated somehow.
But their response was as shockingly incompetent as it was disastrous for Example.com:
It appears clear that SBC installed an empty, authoritative zone for example.com and deployed it on their resolving nameservers systemwide.
And they didn't tell anybody.
Installing an empty zone as authoritative means that requests for data from that zone are answered directly ("we have nothing for you") rather than by recursively tracking down the real data from the real nameservers elsewhere. No more www, no mail server, nothing else, for a property that normally passes hundreds of megabits/sec of traffic to SBC customers.
This had the effect of completely dropping Example.com from the internet for the great majority of SBC users, without notice or explanation. This is similar to a PC user installing an entry for example.com in his hosts file, but on an ISP-wide basis for all customers.
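We have no idea what software or configuration SBC actually used; purely as an illustration, a minimal "empty authoritative zone" of this sort under BIND would look something like the following, and note how closely its SOA matches the bogus answer we were seeing:

// named.conf fragment - illustrative only, not SBC's actual configuration
zone "example.com" {
        type master;
        file "empty.zone";
};

; empty.zone - authoritatively answers "no such record" for everything
$TTL 3600
@       IN SOA  localhost. postmaster.localhost. (
                2004052400      ; serial
                3600            ; refresh
                1800            ; retry
                604800          ; expire
                3600 )          ; negative-caching TTL
        IN NS   localhost.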
Important: It should be noted that SBC has no relationship of any kind with Myspace.com; they're not our ISP, and they don't provide DNS, mail, web, or any other services to us. Their actions were completely unilateral.
We'd never even heard of this kind of hijack before.
This was an atrocity even when viewed in a light most favorable to SBC. If the attack started late at night, it's entirely possible that the third shift was simply not equipped to respond properly. Faced with an attack affecting their customers, this very well may have been the only thing they knew to try. It certainly would have mitigated the attack, and it may not be so unreasonable on its face.
But when one drops a huge property off the internet in the middle of the night, even with good reason, one has a duty to attend to it properly once daylight rolls around. Perhaps there are no experts available at midnight, but there surely must be some at 9AM.
It's even worse than this. Once UltraDNS was able to track down the right people at SBC to discuss this, the fix still apparently waited for some DNS engineer to make it through town after being stuck in Northern California traffic. This accounted for almost an extra hour of hijack.
I have a customer with perhaps 200 employees, and he has at least two staffers who could stumble through an important DNS change of this nature. That SBC -- almost three orders of magnitude larger -- has only one engineer who can make DNS changes in an emergency seems very scary and/or lame to me.
I don't know in detail what technical mitigations they reasonably should have taken, but I can think of at least two candidates.
First, and most obviously, notify somebody at example.com that their domain has been taken down on the SBC network. Though SBC may not care all that much, Example.com certainly has an interest in helping SBC find a better mitigation than turning them into a black hole. Example.com wasn't complicit in any of these attacks: they merely had DNS data that amplified well.
Second would have been to provide a clue that they had done this, perhaps by changing the SOA record to something like:
ddos.mitigation. security.sbcglobal.net. 2004052400 3600 1800 604800 3600
It would at least have left some breadcrumbs for us to follow, and spared us the horrible game of twenty questions to find the responsible party.
It was only due to UltraDNS having excellent contacts at SBC (and/or some amazing persistence) that they were able to track this down on a Friday night. I'm reasonably sure that I would have gotten there eventually, but it would have certainly taken me much longer to find the right SBC people.
It's not out of the question that it wouldn't have been resolved until Monday morning: it's my opinion that UltraDNS saved Example.com's weekend.
The fix started phasing in at around 6:15 PM PDT Friday night, and the Example.com traffic graphs reflected the recovery almost immediately; things were fully back to normal before 7 PM.
In the main it seems clear that SBC handled this badly, but it would be unfair to omit the countervailing factors that undoubtedly weighed into their decisions.
All network organizations have the right and the duty to protect their own infrastructure, and they are generally answerable only to their own customers. Myspace is not a customer of SBC, so one can make a principled case that SBC owed them no duty of any kind.
Furthermore, and more specifically, DNS blackholing is common practice in a few circumstances: during an active attack, it may be the quickest way to protect the resolvers themselves.
Though SBC wasn't responsible for the unknown parties who started this mess with the DoS attack on their servers, it's hard to make the case that they weren't negligent, or at least incompetent, in their response. To silently take down a major internet property with no notice or warning -- even after the fact -- seems patently irresponsible.
This is particularly disappointing considering that Pac*Bell used to have a fantastic DNS department: even today, they still delegate inverse DNS for even a static /29 DSL network. This has been wonderfully helpful.
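For those who haven't seen it, delegating reverse DNS on a block smaller than a /24 uses the classless in-addr.arpa technique of RFC 2317; for a hypothetical 192.0.2.8/29, the ISP's reverse zone carries roughly:

; in the ISP's 2.0.192.in-addr.arpa zone (hypothetical addresses)
8/29    IN NS     ns1.customer.example.
9       IN CNAME  9.8/29.2.0.192.in-addr.arpa.
10      IN CNAME  10.8/29.2.0.192.in-addr.arpa.
;   ...and so on through 15; the customer then publishes his own PTR
;   records in the delegated "8/29.2.0.192.in-addr.arpa" zone.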
It's also not clear what Example.com could have done to mitigate this, either before the fact or after. They had a robust DNS architecture that worked flawlessly throughout the entire episode, and since they did not appear to be a DoS target themselves, they wouldn't even have known this was part of the matter at hand.
There is also no substitute for working with very strong technical people: the folks at UltraDNS could not have been more responsive, helpful, or resourceful. They asked the right questions, maintained a sense of urgency appropriate for the customer's concern, and never seemed out of ideas. Two big thumbs up for UltraDNS.
We all hope this doesn't happen again.
First published: 4 June 2005