[ale] Godaddy outage

Scott Plante splante at insightsys.com
Wed Sep 12 11:58:56 EDT 2012


I was at a then-Fortune 50 company when their sysadmin "consultant" from CapGemini deleted their running ERP database. Luckily he realized what he had done and asked for help. I was there and convinced them to just hard power off the machine. When it booted, we were able to get the database segments out of lost+found, inspect the binary header at the top of each file to determine the order, rename the files, and bring up the database with no data loss or corruption. The poor guy was let go, which I thought was too bad because they'd have been screwed if he'd kept his mouth shut. With a proper shutdown, or even a database stop, the files would have been lost or at least harder to recover. We were back up and running in under 20 minutes.

----- Original Message -----

From: "Jeff Lightner" <JLightner at water.com> 
To: "Atlanta Linux Enthusiasts" <ale at ale.org> 
Sent: Wednesday, September 12, 2012 10:29:50 AM 
Subject: Re: [ale] Godaddy outage 

I was at a Fortune 500 once where the main ERP DB went belly up. That caused us to: 
1) Declare a disaster and invoke our DR plan which included sending folks and tapes to Philadelphia. 
2) Spend time on trying to recover the original Production. 
3) Build in-house systems to try to figure out what caused the issue. 
4) Jump through dozens of other hoops. 

In the end we really couldn't pinpoint what had caused the issue though I suspected fat finger somewhere. The powers that be wouldn't accept this and they kept a team on it for over a year. That team eventually did put out an RCA but I always had my doubts. 

Later one of the DBAs did a fat finger causing another outage and immediately did a mea culpa. He asked if they wanted him to resign and they said no. We were all just so relieved that he had admitted the mistake and saved us all the extra headaches. (As if recovering a major product DB all by itself isn't headache enough - it takes time even with good backups and logs.) 





-----Original Message----- 
From: ale-bounces at ale.org [mailto:ale-bounces at ale.org] On Behalf Of JD 
Sent: Wednesday, September 12, 2012 9:56 AM 
To: Atlanta Linux Enthusiasts 
Subject: Re: [ale] Godaddy outage 

-- 
I saw the JSC mission control primary NFS server for every workstation in the building fail over during ascent, leaving all flight monitoring applications static for 45 seconds. This happened while the SRBs were firing just after the shuttle was launched. The console cabling from the primary server was so tight that when the main engineer pulled the console 3 inches closer to get a better view, the cable disconnected and caused a failover event. 
I didn't cause this, but was sitting a few yards away monitoring a different server during critical flight phases. 
-- 
Spaces are critical. A month of work was destroyed by a single space error. 
* "rm -rf directory*" vs "rm -rf directory *" 
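
To see why that one space matters, here is a minimal Python sketch. It only splits each command line into words the way a POSIX shell does before glob expansion; it never runs rm. The helper name rm_targets is made up for illustration.

```python
import shlex


def rm_targets(command):
    """Return the operand words a POSIX shell would hand to rm, before globbing."""
    words = shlex.split(command)
    # Skip the command name itself and any option words such as -rf.
    return [w for w in words[1:] if not w.startswith("-")]


# One word: a pattern matching only names that start with "directory".
print(rm_targets("rm -rf directory*"))   # ['directory*']
# Two words: "directory", plus a bare "*" that matches everything
# (except dotfiles) in the current working directory.
print(rm_targets("rm -rf directory *"))  # ['directory', '*']
```

In the second form the shell expands the lone "*" to every non-hidden entry in the current directory, so rm -rf is asked to remove all of them.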
-- 
Elsewhere, I have seen a few $2M/hr outages due to corrupted Oracle tables after a software vendor told a support guy to type a specific command on the running, production, dispatching system. That took about 7 hours to recover and months to convince upper management that it was a fluke and not the fault of the guy doing the typing. 
-- 
The problems that I've caused were usually due to bad scripting. 
* find+rm is dangerous. 
* rsync can destroy a system - especially if this is a backup just prior to an upgrade 
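
One common guard against both hazards is to list what would be touched before anything is removed (rsync has --dry-run, or -n, for the same purpose). The following is a rough Python sketch of that idea for the find+rm case, not a reconstruction of the scripts mentioned above. The function name delete_matching and its dry_run parameter are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def delete_matching(root, pattern, dry_run=True):
    """Find regular files under root that match pattern.

    Nothing is deleted unless dry_run is explicitly set to False.
    Symlinks are skipped so a stray link cannot redirect the delete.
    """
    root = Path(root).resolve()
    matches = sorted(
        p for p in root.rglob(pattern)
        if p.is_file() and not p.is_symlink()
    )
    if not dry_run:
        for p in matches:
            p.unlink()
    return matches


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as scratch:
    (Path(scratch) / "old.tmp").write_text("stale")
    (Path(scratch) / "keep.txt").write_text("important")
    # Default call only reports the candidates; review them first.
    for path in delete_matching(scratch, "*.tmp"):
        print("would delete:", path.name)
    # Delete only after the list has been checked.
    delete_matching(scratch, "*.tmp", dry_run=False)
```

Making the destructive path opt-in means a typo in the pattern or the root shows up in the preview list rather than in the backups.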
-- 

I suspect we all have seen some pretty interesting outages over the years. 


On 09/11/2012 08:19 PM, simontek at gmail.com wrote: 
> Worked at a data center in LA. Netzero looped their switch and took out
> the whole network. I assumed that, being an ISP, they knew better. They
> called to complain, and I had the fun of telling them I took them offline
> until they fixed their issues. Not fun. Also babysat them every time
> they came in to work on stuff.
> Sent via BlackBerry from T-Mobile
> 
> -----Original Message-----
> From: Jeff Hubbs <jhubbslist at att.net>
> Sender: ale-bounces at ale.org
> Date: Tue, 11 Sep 2012 20:11:24
> To: <stephen.r.blevins at gmail.com>; Atlanta Linux Enthusiasts <ale at ale.org>
> Reply-To: Atlanta Linux Enthusiasts <ale at ale.org>
> Subject: Re: [ale] Godaddy outage
> 
> I once worked at a place where there was a guy who, not meaning to 
> cause trouble, created subdirectory after subdirectory on a Mac until 
> the OS wouldn't function anymore. We named it the "Copeland Worm" in his honor. 
> 
> 
> On 9/11/12 7:59 PM, Stephen R. Blevins wrote: 
>> Early in my IT career (early 1980's), I learned that "No malevolent 
>> cracker, no matter how malicious, can even begin to do the damage an 
>> authorized and well-meaning but incompetent user can do." 
>> 
>> QED 
>> 
>> 
>> Stephen R. Blevins stephen.r.blevins at gmail.com 
>> 
>> On 09/11/2012 02:30 PM, Michael H. Warfield wrote: 
>>> On Tue, 2012-09-11 at 13:53 -0400, Matt Hessel wrote: 
>>>> Well anonymous is claiming they took it down, I don't know if 
>>>> anyone at godaddy broke it. :) 
>>> NO! 
>>> 
>>> First and foremost... "Anonymous" has not claimed any action. One 
>>> individual down in Brazil using a handle that has been associated 
>>> with Anonymous has claimed to have done this but stated they were 
>>> acting independently. The collective has not claimed this and it 
>>> remains unconfirmed. 
>>> 
>>> Second... GoDaddy itself now claims it was not hackers and not a 
>>> DoS attack but a royal screwup in their routers that resulted in 
>>> corrupted routing tables. I'm not totally sure how much credibility 
>>> I will lend to that idea but, if true, this is one of the grandest 
>>> screwups since Microsoft dicked up their DNS years and years ago 
>>> with all their public name servers on a single network segment and 
>>> then cut them off from the private master name server with a firewall update. 
>>> 
>>> I'm not sure which is worse. Being hammered by a collective of 
>>> malicious individuals out to get you or displaying a level of 
>>> technical incompetence and inability to follow RFCs and BCPs that 
>>> would put a technotard to shame! How did they manage to put all 
>>> their (DNS) eggs in one basket so that a single point of failure 
>>> could have such widespread consequences??? Well, I guess they are
>>> in good company. MS has done it. AT&T has done it. Others have done
>>> it. You would think they would know better but they obviously do not.
>>> 
>>> Regards, Mike 
>>> 
>>>> On Sep 11, 2012 1:41 PM, "Scott Plante" <splante at insightsys.com> 
>>>> wrote: 
>>>>> Yes, we use GoDaddy for registration but not DNS nor hosting and 
>>>>> we were unaffected. Our one client who was affected used them for 
>>>>> registration and DNS, but not hosting and they were affected. It 
>>>>> was just name resolution though, you could still access their 
>>>>> externally hosted site by IP of course. I don't know anyone who 
>>>>> was hosting with GoDaddy. You couldn't get to godaddy.com but I 
>>>>> didn't know their IP to try that. 
>>>>> 
>>>>> I imagine someone's in big trouble, if not fired, over that one. 
>>>>> 
>>>>> Scott 
>>>>> 
>>>>> ------------------------------
>>>>> From: "Brian Stanaland" <brian at stanaland.org>
>>>>> To: "Atlanta Linux Enthusiasts" <ale at ale.org>
>>>>> Sent: Monday, September 10, 2012 5:51:13 PM
>>>>> Subject: Re: [ale] Godaddy outage
>>>>> 
>>>>> I know one group with DNS by GoDaddy but hosting elsewhere has 
>>>>> been affected. All machines are still reachable via IP address, of 
>>>>> course. Speaking of which, anyone know if GoDaddy hosted sites can 
>>>>> be reached by IP? 
>>>>> 
>>>>> --Brian 
>>>>> 
>>>>> 
>>>>> 
>>>>> On Mon, Sep 10, 2012 at 5:18 PM, Michael H. Warfield 
>>>>> <mhw at wittsend.com>wrote: 
>>>>> 
>>>>>> On Mon, 2012-09-10 at 15:49 -0400, Scott Plante wrote: 
>>>>>>> You guys notice the Godaddy DNS outage? I have a customer's
>>>>>>> website down.
>>>>>> http://techcrunch.com/2012/09/10/godaddy-outage-takes-down-millions-of-sites/
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Been following this... Their DNS servers are impacted. Hosting servers
>>>>>> indeterminate. Claims are made that #Anonymous3 down in Brazil 
>>>>>> is behind this for one reason or another but no one else from 
>>>>>> Anonymous has stepped up to the plate and claimed responsibility. 
>>>>>> Looks to be a loose cannon with a wild hair at this point... 
>>>>>> 
>>>>>> If you are using them as a registrar but are managing your own 
>>>>>> DNS then you do not seem to be impacted at this time. 
>>>>>> 
>>>>>> If you are using their DNS servers then you are probably impacted 
>>>>>> whether you are hosting with them or not. 
>>>>>> 
>>>>>> If you are using their hosting services but managing your own 
>>>>>> DNS, please let us know. I have no data points on this curve. 
>>>>>> 
>>>>>>> Scott 
>>>>>> Regards,
>>>>>> Mike
>>>>>> --
>>>>>> Michael H. Warfield (AI4NB) | (770) 985-6132 | mhw at WittsEnd.com
>>>>>> /\/\|=mhw=|\/\/ | (678) 463-0932 | http://www.wittsend.com/mhw/
>>>>>> NIC whois: MHW9 | An optimist believes we live in the best of all
>>>>>> PGP Key: 0x674627FF | possible worlds. A pessimist is sure of it!
>>>>>> 
>>>>>> _______________________________________________
>>>>>> Ale mailing list
>>>>>> Ale at ale.org
>>>>>> http://mail.ale.org/mailman/listinfo/ale
>>>>>> See JOBS, ANNOUNCE and SCHOOLS lists at
>>>>>> http://mail.ale.org/mailman/listinfo
>>>>>> 
>>>>>> 
>>>>> 
>>>>> --
>>>>> Government's view of the economy could be summed up in a few short
>>>>> phrases: If it moves, tax it. If it keeps moving, regulate it. And
>>>>> if it stops moving, subsidize it. - Ronald Reagan (1986)
>>>>> 






