FYI Netflix is down
- Subject: FYI Netflix is down
- From: george.herbert at gmail.com (George Herbert)
- Date: Mon, 2 Jul 2012 14:04:08 -0700
- In-reply-to: <[email protected]>
- References: <CA+zb_vEbEd8QY1wW-uNyUb1CuBAD-b5h-g6ysmNV6BCpX7mTiQ@mail.gmail.com> <[email protected]> <CAPiURgUxLCrQ0OfprT4YknTd9c4xoi6uLnm4YwLmf=bc-TJnXQ@mail.gmail.com> <8078ED370ADA824281219A7B5BADC39B1D61037C@MBX023-W1-CA-5> <[email protected]> <CAFFgAjDDpqC8qzxDm9UHemMyvMVOwRdS74K-gCvRXR-aN8ZD9A@mail.gmail.com> <[email protected]> <CABL6YZQErJ2r3hktkfL4=xrkZNeJWnteAMQAvcXdEV8OPQ=U9g@mail.gmail.com> <[email protected]> <CAJEFqDeyaj9KZVNi0xA0+VHnh_BAzoJKZd_gs8b0Sxr9F9c=3g@mail.gmail.com> <[email protected]> <CAK__KzvdYmACrT4=in0y-q9R=6qcM-Swe9xuCc_bsLxqkmT3mg@mail.gmail.com> <[email protected]>
On Mon, Jul 2, 2012 at 12:43 PM, Greg D. Moore <mooregr at greenms.com> wrote:
> At 03:08 PM 7/2/2012, George Herbert wrote:
>
> If folks have not read it, I would suggest reading Normal Accidents by
> Charles Perrow.
>
> The "it can't happen" is almost guaranteed to happen. ;-) And when it does,
> it'll often interact in ways we can't predict or sometimes even understand.
Seconded.
There are also aerospace and nuclear and failure analysis books which
are good, but I often encourage people to start with that one.
> As for pulling the plug to test stuff: I recall a demo at NetApp in the
> early '00s. They were talking about their fault tolerance and how great it
> was. So I walked up to their demo array and said, "So, it shouldn't be a
> problem if I pulled this drive right here?" Before I could, the salesperson
> or tech guy (I can't remember which) told me to stop. He didn't want to risk it.
>
> That right there said loads about their confidence in their own system.
I worked for a Sun clone vendor (Axil) for a while and took some of
our systems and storage to Comdex one year in the 90s. We had a RAID
unit (Mylex controller) we had just introduced. Beforehand, I made
REALLY REALLY SURE that the pull-the-disk and pull-the-redundant-power
tricks worked. And showed them to people with the "Please keep in
mind that this voids the warranty, but here we *rip* go...". All of
the other server vendors were giving me dirty looks for that one.
Apparently I sold a few systems that way.
You have to watch for connector wear-out and things like that, but ...
On all the clusters I've built, I've insisted on a burn-in-time plug-pull
test of all the major components. We caught problems with those from
time to time. Especially with N+1 redundancy: if it is really N+0 because
of a bug or flaw, you need to know that...
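A minimal sketch of the N+1 vs. N+0 point, assuming a hypothetical group of power supplies that each have a rated output and a measured output from burn-in (the `Unit` class, the `single_pull_failures` helper, the names and the wattages are all illustrative, not from any real tool). It simulates pulling each unit in turn and reports which pulls would leave the load unserved. On nameplate numbers the group looks N+1; on measured numbers one latent flaw turns it into N+0.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """One redundant component, e.g. a power supply (hypothetical model)."""
    name: str
    rated_watts: float   # what the spec sheet claims
    actual_watts: float  # what the unit delivered when load-tested at burn-in


def single_pull_failures(units, load_watts, use_actual=True):
    """Simulate pulling each unit in turn; return the names of pulls that
    drop the remaining capacity below the load. An empty list means the
    group really tolerates any single failure (N+1)."""
    failures = []
    for pulled in units:
        remaining = sum(
            u.actual_watts if use_actual else u.rated_watts
            for u in units
            if u is not pulled
        )
        if remaining < load_watts:
            failures.append(pulled.name)
    return failures


# Three 500 W supplies carrying a 900 W load: any two should suffice on paper.
# psu-c has a hypothetical latent flaw (say, a bad cable) and delivers nothing.
psus = [
    Unit("psu-a", 500, 500),
    Unit("psu-b", 500, 500),
    Unit("psu-c", 500, 0),
]

print(single_pull_failures(psus, 900, use_actual=False))  # []
print(single_pull_failures(psus, 900))  # ['psu-a', 'psu-b']
```

The check deliberately uses measured rather than nameplate capacity: that gap is exactly what a physical plug-pull test exposes and a paper review of the design misses.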
--
-george william herbert
george.herbert at gmail.com