Ratdude's bad HDD day


    Ratdude's bad HDD day

    Today the four Western Digital WD740GD's (74GB 10K SATA) in my main workstation took a crap at the same time. ALL 4 OF THEM!

    What's been happening is that the RAID controller would occasionally hiccup: it would freeze (solid HDD light), I'd hard reboot, and the 3ware 9550SX would be missing from BIOS. Hard reboot again, it would show up, and I'd have to go into BIOS to re-add it to the boot list. It hadn't done this to me in months. So today, I go to reply to a thread on another forum, it does this, fine, rinse and repeat, all seems well. I go snap some pictures (soldering samples for another, err, project) and lo and behold, the system is running VERY slow, not getting me to a login screen (the screen is set to auto-lock). I put it to sleep, wake it up, get a login screen, it's running again... yay... wait... BSOD!

    Anyway, the card reports all 4 drives with a status of "drive error". Rescanning in the 3ware boot menu doesn't change it, nor did swapping the RAID cards (I have two other 9550SXs in other builds not in service). I reseated all my cabling too... all 4 seem to be 100% dead (either a SMART failure or not responding to controller commands, as per the 9550 manual).

    Very odd... I know Topcat used to run a very similar setup (Tyan S2895 Thunder K8WE with a 3ware 9550 and SLI'd GPUs) and AFAIK he never had anything like this happen...

    Anybody who's used 3ware cards seen this happen before? It's very strange that all 4 would fail the same way at the exact same time... unless it's some bizarre cascading failure?

    (Insert witty quote here)

    #2
    Re: Ratdude's bad HDD day

    errr its quite impossible and statistically improbable for 4 hard drives to fail at the exact same time. badcaps on mobo perhaps? maybe around the pci slot area or atx connector? those raid cards can be quite power hungry. the raid card disappearing and appearing on a hard reboot sounds like a badcap symptom related to unstable power delivery to the raid card to me.

    thus, i dont think the drives actually failed. connect the drives to the mobo's onboard sata ports instead and boot a live session of linux. also see if bios can detect the disks. use linux's disk utility and see if u can read the smart data from the disks to see if any smart parameters failed. also run a disk read only benchmark to ascertain the drives still work up to speed right.
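A rough sketch of the read-only speed check suggested above, assuming Python 3 on the live Linux session. The function names are made up for illustration, and the device path you pass in (something like /dev/sdX) is an assumption that depends on how the disk enumerates; reading a raw device normally needs root. The file is only ever opened for reading, so nothing on the disk is touched. Repeat runs can look faster than the disk really is because of the OS page cache.

```python
import sys
import time


def timed_sequential_read(path, total_bytes=256 * 1024 * 1024, block_size=1024 * 1024):
    """Read up to total_bytes sequentially from path (read-only).

    Returns (bytes_actually_read, elapsed_seconds).
    """
    bytes_read = 0
    start = time.perf_counter()
    # buffering=0 gives unbuffered reads at the Python level;
    # it does NOT bypass the operating system's page cache.
    with open(path, "rb", buffering=0) as dev:
        while bytes_read < total_bytes:
            chunk = dev.read(min(block_size, total_bytes - bytes_read))
            if not chunk:  # end of file or device
                break
            bytes_read += len(chunk)
    return bytes_read, time.perf_counter() - start


def mb_per_second(bytes_read, seconds):
    """Convert a byte count and a duration into MB/s (1 MB = 1024 * 1024 bytes)."""
    if seconds <= 0:
        return 0.0
    return bytes_read / (1024 * 1024) / seconds


if __name__ == "__main__" and len(sys.argv) > 1:
    n, secs = timed_sequential_read(sys.argv[1])
    print(f"{n} bytes in {secs:.2f} s = {mb_per_second(n, secs):.1f} MB/s")
```

Run it once per drive and compare the numbers; a drive that is far slower than its siblings, or that throws I/O errors partway through, is the one to look at first.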
    Last edited by ChaosLegionnaire; 01-03-2016, 11:13 PM.



      #3
      Re: Ratdude's bad HDD day

      Originally posted by ChaosLegionnaire
      errr its quite impossible and statistically improbable for 4 hard drives to fail at the exact same time. badcaps on mobo perhaps? maybe around the pci slot area or atx connector? those raid cards can be quite power hungry. the raid card disappearing and appearing on a hard reboot sounds like a badcap symptom related to unstable power delivery to the raid card to me.

      thus, i dont think the drives actually failed. connect the drives to the mobo's onboard sata ports instead and boot a live session of linux. also see if bios can detect the disks. use linux's disk utility and see if u can read the smart data from the disks to see if any smart parameters failed. also run a disk read only benchmark to ascertain the drives still work up to speed right.
      Perhaps. The board was recapped just over a year ago.

      I have a USB SATA adapter that I could test these on one by one... no SMART with that route, though.

      I doubt it's a power delivery issue; I have a 600W Seasonic PSU with the proper 6 pin power connector.


        #4
        Re: Ratdude's bad HDD day

        One drive tested, 63MB/s read, 8msec access time. Will go through all four drives.

        If it's the mobo... well, if I could do it again (with any motherboard for the platform), I'd go for the Supermicro version. It has an extra card slot and all (so I could go up to a 9650 and run both a QAM tuner AND a sound card). However, last I checked they were in the $400 range on eBay (as nobody is selling them except the new-mobo price-gouging hoarders).


          #5
          Re: Ratdude's bad HDD day

          Tested the other 3, same result. HDDs are either moaning about smart, or the card/mobo combo isn't happy.

          I'll see if I can find any lower cost leads on said SM board (edit- it's the H8DC8)... if not, well, I'm at a loss here. I do know my PSU came with Suscons which could be suspect?

          Edit2- The H8DC8 is indeed a rare bird. Damn. Unlike Topcat SM boards no longer grow on trees, so I'm stuck figuring out WTF went wrong with my system.
          Last edited by ratdude747; 01-04-2016, 12:04 AM.


            #6
            Re: Ratdude's bad HDD day

            so use a smart util like speedfan.
            i really doubt all 4 died at once.
            and suscons are replace on sight.



              #7
              Re: Ratdude's bad HDD day

              Originally posted by kc8adu
              so use a smart util like speedfan.
              i really doubt all 4 died at once.
              and suscons are replace on sight.
              Ding ding ding... The 5VSB is hosed. Pics coming... given that such a HW RAID card with BBU would probably load down the 5VSB a bit, no wonder it took a shit.

              I was going to recap it when I bought it... and never got around to it. Now it's kicking my ass. I don't care how pretty their ladies are, screw you SuScon!


                #8
                Re: Ratdude's bad HDD day

                well thats good news.
                so seasonic is using bottom of the barrel shit caps now?



                  #9
                  Re: Ratdude's bad HDD day

                  Originally posted by kc8adu
                  well thats good news.
                  so seasonic is using bottom of the barrel shit caps now?
                  This one at least. I and somebody else on the forum bought one of these new for $40 on ebay. After "re-configuring" the SATA strands a bit and tying up the Molex/berg strands, it was the perfect layout for the system, so it will get recapped. My system uses about 550W of power (between the twin Geforce GTS 250's, twin Opteron X2's, and other trimmings), so 600W is as low as I dare go. It put out a decent amount of heat, but not enough to scream "overloaded" so the 600W rating is probably accurate.

                  IIRC he recapped his already. I even had a list together... and I probably posted it in the V3.0 thread.

                  The only other "cheap out" I found was they omitted the mains in plug to the main PCB; they instead soldered the wires straight into the footprint. Oddly, the EMI board was hand soldered to the AC plug and I/O switch directly after they were installed in the case. Kinda soldered in crooked a bit, lots of flux and junk not fully washed off. Not exactly junk, but this was a tad entry level (although it does have the twin rails promised AFAIK and the build quality is otherwise fine).

                  Anyway, time for pictures:

                  5VSB: (photo attached)

                  The rest of the secondary looks to be OK but will get recapped anyway: (photo attached)

                  Bigtroll will probably be sad to see the suscons go... but this trash needs to GO!
                  Last edited by ratdude747; 01-04-2016, 01:39 AM. Reason: reduced JPEG quality


                    #10
                    Re: Ratdude's bad HDD day

                    SMART data are usually genuine. Could you post your SMART reports?

                    BTW, the overvoltage "protection" in WD HDDs is extremely BAD. I seriously recommend that you solder links in place of resistors R64 and R67, or flow blobs of solder over them. This will enable the TVS diodes to do their job and protect your data in the event of a PSU failure.

                    Catastrophic failures in Western Digital PCBs:
                    http://www.hddoracle.com/viewtopic.p...&t=1119&p=5033

                    BTW, smartmontools (Linux) or GSmartControl (Windows) can see the SMART data of individual drives behind 3Ware RAID controllers.
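As a sketch of what that looks like with smartmontools on Linux: smartctl takes a "-d 3ware,N" device type, where N is the port number on the card, and is pointed at the controller's device node rather than at an individual disk. The node name /dev/twa0 (first 9000-series controller) and the assumption that the four drives sit on ports 0 to 3 are guesses about this particular system, and the helper name is made up. The snippet only builds and prints the commands; it does not run them.

```python
def smartctl_commands(num_ports=4, controller="/dev/twa0"):
    """Build one 'smartctl -a' command per RAID port.

    controller and the 0..num_ports-1 port range are assumptions
    about how this particular system is wired up.
    """
    return [
        ["smartctl", "-a", "-d", f"3ware,{port}", controller]
        for port in range(num_ports)
    ]


for cmd in smartctl_commands():
    print(" ".join(cmd))
```

Each printed line can be pasted into a root shell; the output is the usual SMART attribute table for the drive on that port.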
                    Last edited by fzabkar; 01-04-2016, 02:13 AM.



                      #11
                      Re: Ratdude's bad HDD day

                      Ordered the caps... sadly a bunch of the values are ones Topcat doesn't have in stock, so I (begrudgingly) had to order from bleeping Digikey. That's OK, I have a couple of boards to recap (one was damaged in storage, the other is an HP board that may actually be worth a crap) that I'll BCN source the caps on.


                        #12
                        Re: Ratdude's bad HDD day

                        Originally posted by fzabkar
                        SMART data are usually genuine. Could you post your SMART reports?

                        BTW, the overvoltage "protection" in WD HDDs is extremely BAD. I seriously recommend that you solder links in place of resistors R64 and R67, or flow blobs of solder over them. This will enable the TVS diodes to do their job and protect your data in the event of a PSU failure.

                        Catastrophic failures in Western Digital PCBs:
                        http://www.hddoracle.com/viewtopic.p...&t=1119&p=5033
                        I don't have any as my USB 3.0 SATA adapter doesn't support SMART. If I keep having issues I'll look into hooking the drives to something that will give me SMART. For now I'll recap the PSU and go from there.


                          #13
                          Re: Ratdude's bad HDD day

                          i have a seasonic SS-301HT here:
                          bastard is full of OST and JPce-tur

                          here's a preliminary caplist if it helps anyone.

                          EC302 - 68/25 5mm ost RLS 250ma 0.30r
                          EC303 - 4.7/50 5mm ost RLS 238ma 0.34r

                          EC100 - 22/50 5mm ost RLS 238ma 0.34r
                          EC101 - 47/25 5mm ost RLS 250ma 0.30r
                          EC300 - 180/400 22x40mm hitachi HP3 1390ma

                          EC201 - 220/16 6.3mm JPce-tur 0501
                          EC402 - 220/16 6.3mm JPce-tur 0501
                          EC200 - 2200/10 10mm ost RLS 2150ma 0.022r

                          EC504 - 2200/10 10x20mm ost RLP 1220ma 0.042r
                          EC??? - 3300/16 12.5x30mm ost RLS 3290ma 0.016r
                          EC??? - 1000/16 10x16mm ost RLS 1430ma 0.038r

                          EC??? - 3300/10 12.5x25mm ost RLS 2770ma 0.018r
                          EC??? - 3300/? 12.5x20mm ost RLS
                          EC602 - 2200/6.3 10x20mm ost RLP 1220ma 0.041r

                          sub - 22/35 5mm ost
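For anyone turning a list like this into an order, here is a small sketch that parses the lines and counts how many of each part are needed. It assumes the usual shorthand where "68/25" means 68 uF at 25 V and the next field is the can size; that reading, the field names, and the function names are my assumptions, not something the list spells out. A "?" voltage is kept as unknown rather than guessed.

```python
import re
from collections import Counter

# ref - uF/volts size notes...   e.g. "EC302 - 68/25 5mm ost RLS 250ma 0.30r"
LINE_RE = re.compile(
    r"^(?P<ref>\S+)\s+-\s+(?P<uf>[\d.]+)/(?P<volts>[\d.]+|\?)"
    r"\s+(?P<size>\S+)\s*(?P<notes>.*)$"
)


def parse_cap_line(line):
    """Return a dict for one caplist line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    volts = m.group("volts")
    return {
        "ref": m.group("ref"),
        "uf": float(m.group("uf")),
        "volts": None if volts == "?" else float(volts),  # None = unknown
        "size": m.group("size"),
        "notes": m.group("notes"),
    }


def order_quantities(lines):
    """Count caps per (uF, volts, size) so duplicates collapse into one line item."""
    counts = Counter()
    for line in lines:
        cap = parse_cap_line(line)
        if cap is not None:
            counts[(cap["uf"], cap["volts"], cap["size"])] += 1
    return counts


SAMPLE = [
    "EC201 - 220/16 6.3mm JPce-tur 0501",
    "EC402 - 220/16 6.3mm JPce-tur 0501",
    "EC??? - 3300/? 12.5x20mm ost RLS",
]

for (uf, volts, size), qty in order_quantities(SAMPLE).items():
    print(qty, "x", uf, "uF", volts, "V", size)
```

Blank lines and anything that does not fit the pattern are skipped, so the whole post can be pasted in as-is.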



                            #14
                            Re: Ratdude's bad HDD day

                            That's why I don't use raid any more, I don't care if you have a bbu, stupid shit like this is going to happen and your raid controller is not going to be able to fix it. Which is why I have moved to ZFS, so even if my HBA sas cards go bad, I can simply swap them out and power up and have minimal to non-existent data loss. I don't lose X amount of data because the raid was not able to resilver a platter or fix a U.R.E or lose the whole array due to a hardware failure.
                            It doesn't stop data corruption due to faulty ram or bad power supply, but it will tell you when there is and what files/directories are affected.
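The "it will tell you which files are affected" part comes from checksumming. The toy sketch below shows only the general idea: record a checksum when data is written, recompute it on read, and flag anything that no longer matches. It is not how ZFS is implemented (ZFS checksums blocks inside the filesystem, not whole files in a Python dict), and all the names here are made up for illustration.

```python
import hashlib


def checksum(data: bytes) -> str:
    """SHA-256 hex digest of a blob of data."""
    return hashlib.sha256(data).hexdigest()


def record_checksums(files):
    """Map each name to the checksum of its current contents."""
    return {name: checksum(data) for name, data in files.items()}


def find_corrupted(files, recorded):
    """Names whose current contents no longer match the recorded checksum."""
    return sorted(
        name for name, data in files.items()
        if checksum(data) != recorded.get(name)
    )


files = {"a.bin": b"hello world", "b.bin": b"raid array"}
recorded = record_checksums(files)

# Simulate silent corruption: one flipped bit in b.bin.
files["b.bin"] = bytes([files["b.bin"][0] ^ 0x01]) + files["b.bin"][1:]

print(find_corrupted(files, recorded))
```

Detection alone does not repair anything; repairing needs a second good copy (mirror or parity) to pull the correct data from.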

                            https://en.wikipedia.org/wiki/ZFS

                            Seriously, start using it.



                              #15
                              Re: Ratdude's bad HDD day

                              Originally posted by kc8adu
                              so seasonic is using bottom of the barrel shit caps now?
                              No, this is an old group regulated design.
                              I do note that 2010 date on the 5vsb transformer though.
                              So maybe they use questionable caps for OEM units or something?
                              In any case, Seasonic is nowadays one of the only (the only?) PSU manufacturers that uses reliable capacitors.
                              Did you ever post any more photos of this unit, RD? I tried to search but could only find some posts where you mentioned it.
                              What model is it? Because honestly for 600w it looks a bit puny?

                              Originally posted by ratdude747
                              Bigtroll will probably be sad to see the suscons go... but this trash needs to GO!
                              No, but he will be glad to see that he trolled you so successfully
                              Last edited by Per Hansson; 01-04-2016, 04:07 AM.
                              "The one who says it cannot be done should never interrupt the one who is doing it."



                                #16
                                Re: Ratdude's bad HDD day

                                I'll find some links for you... I thought I took pics of it.

                                As for the RAID concerns, I'm not 100% sure my array is cooked. I'll see once I have the PSU recapped... even if it was, I didn't have anything critical on it that WASN'T backed up. Yeah I'd have to reinstall 7 and all that, but for the most part it should be a painless install.

                                edit- Found the thread and post with the original ebay link:

                                Originally posted by ratdude747
                                I may have found a PSU:

                                http://www.ebay.com/itm/Seasonic-SS-...item4853ef61be

                                From the review of the consumer version's 430W variant, it's actually a single rail PSU and a decently built one.

                                I'm tempted to get it...

                                Is $42 shipped a fair price for one of these... or should I keep looking?

                                Well, I guess it WAS single rail and used, not new. Go figure. My memory isn't the best apparently...

                                Pentium4 was the other buyer...
                                Last edited by ratdude747; 01-04-2016, 01:43 PM.


                                  #17
                                  Re: Ratdude's bad HDD day

                                  Found pictures:

                                  https://www.badcaps.net/forum/showpo...1&postcount=73

                                  Funny, I suspected the PSU as being flaky then... I later found out that the issue was a bad recap job (replaced a cap I shouldn't have, which in turn blew due to now being overvolted) and also the AMD HyperTransport to AGP chip having really bad driver support; it only works on Windows XP x32 and older, no 64 bit, Linux, or Vista+ support... on a groundbreaking x64 platform... WHY AMD, WHYY??? Long story short, this is why V3 became V3.5 (and it's honestly a lot more powerful as a result... twin SLI'd GPUs in a workstation, that's just awesome IMHO).
                                  Last edited by ratdude747; 01-04-2016, 01:54 PM.


                                    #18
                                    Re: Ratdude's bad HDD day

                                    Originally posted by ratdude747
                                    I do know my PSU came with Suscons which could be suspect?
                                    You should've known better than to have run it with those!
                                    Lucky for you (you say) there wasn't anything important on the drives, or kill them.

                                    So you were in there before and left those shit caps?! You like "Deer season" then? Those caps'll get ya even w/o a Deer...

                                    What do you mean "replaced a cap you shouldn't have?" If they're all junk, replace all.

                                    Chipset issues can be aggravated with lousy power too.

                                    Originally posted by stj
                                    i have a seasonic SS-301HT here:
                                    bastard is full of OST and JPce-tur
                                    SuperShit-301-HitTeam!

                                    JPce-turd... is this 2006?
                                    Last edited by kaboom; 01-04-2016, 06:30 PM.
                                    "pokemon go... to hell!"

                                    EOL it...
                                    Originally posted by shango066
                                    All style and no substance.
                                    Originally posted by smashstuff30
                                    guilty,guilty,guilty,guilty!
                                    guilty of being cheap-made!



                                      #19
                                      Re: Ratdude's bad HDD day

                                      Originally posted by kaboom
                                      You should've known better than to have run it with those!
                                      Lucky for you (you say) there wasn't anything important on the drives, or kill them.

                                      So you were in there before and left those shit caps?! You like "Deer season" then? Those caps'll get ya even w/o a Deer...
                                      They were caps I didn't have in stock... and since the system was high use I really didn't want to mess with it back then. I since forgot about the issue... which was obviously a blunder.

                                      Originally posted by kaboom
                                      What do you mean "replaced a cap you shouldn't have?" If they're all junk, replace all.
                                      The board had a mix of various UCC caps. Most were KZJs, which needed to go. However, some were of a good series in 25V. They looked identical to some 6.3V KZJs, and one of each sat next to each other between the AGP slot and one of the PCI-X slots. Hence I accidentally replaced a 25V non-KZJ cap with a 6.3V replacement... and I was too dumb to realize there was a reason why I was one cap short (and had to pull from my normal stock). Obviously the 25V was on a 12V rail... and the 6.3V cap that was in its place didn't like that too well. Whoops!
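The mistake above boils down to a voltage rating check that is easy to write down. A minimal sketch, assuming a 25% headroom rule of thumb over the rail voltage; the margin and the function names are my own illustration, not a spec:

```python
def min_rated_voltage(rail_volts, margin=1.25):
    """Lowest acceptable cap rating for a rail, with an assumed 25% headroom."""
    return rail_volts * margin


def rating_ok(rated_volts, rail_volts, margin=1.25):
    """True if a cap rated at rated_volts is acceptable on a rail_volts rail."""
    return rated_volts >= min_rated_voltage(rail_volts, margin)


for rated in (6.3, 16, 25):
    verdict = "OK" if rating_ok(rated, 12) else "NOT OK"
    print(f"{rated} V cap on a 12 V rail: {verdict}")
```

With that margin, a 12V rail needs at least a 15V part, so the original 25V cap was fine and the 6.3V replacement never stood a chance.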

                                      Originally posted by kaboom

                                      Chipset issues can be aggravated with lousy power too.
                                      The AMD AGP chip issue was a documented issue... AMD flat out said in a document out there that those drivers don't exist for such platforms. To this day I am baffled why they neutered such an otherwise awesome platform... Sure, servers wouldn't care about such an issue... but workstations back then NEEDED AGP in order to do what they do.

                                      Luckily Nvidia wasn't so stupid and made it so pretty much every OS (x64 and x86) could run their nForce Professional 2000 series chips, which is what the rig currently uses. Not to mention it was the first SLI solution to run twin x16 (none of the twin 8x BS that plagued their socket 939 solutions). Nowadays that's taken for granted... but back then this really was a groundbreaking platform (and why I love running it).


                                        #20
                                        Re: Ratdude's bad HDD day

                                        you know what my views are on building big workstations with used parts. Workstations aren't supposed to be cheap for a reason.
                                        Cap Datasheet Depot: http://www.paullinebarger.net/DS/
                                        ^If you have datasheets not listed PM me

