
RAID5 failure: 2 bad HDD's at the same time


    RAID5 failure: 2 bad HDD's at the same time

    Well, I guess I ran out of luck and shit hit the fan all right at home, ugh!
    I am running a WD PR4100 16TB NAS. All was good until I noticed occasional slow file transfers. I ran an HDD test and drive 2 failed. O.K., no problem: order a new drive, replace the drive and rebuild the RAID5. Easy, right? Well, not so fast. Wouldn't you know it, drive 3 failed at the halfway mark of rebuilding the RAID5 array onto drive 2.

    Question is: How do I recover all the files or recover the RAID?

    #2
    Re: RAID5 failure: 2 bad HDD's at the same time

    Originally posted by CapLeaker View Post
    Well, I guess I ran out of luck and shit hit the fan all right at home, ugh!
    I am running a WD PR4100 16TB NAS. All was good until I noticed occasional slow file transfers. I ran an HDD test and drive 2 failed. O.K., no problem: order a new drive, replace the drive and rebuild the RAID5. Easy, right? Well, not so fast. Wouldn't you know it, drive 3 failed at the halfway mark of rebuilding the RAID5 array onto drive 2.
    This is a known problem with RAID -- especially with larger drives! The time it takes to rebuild the array represents a sizeable window in which a second failure can eat your lunch...

    Of course, the "cost" (window of vulnerability) of rebuilding the failed drive will vary with the RAID level (e.g., a RAID5 rebuild has to read every surviving drive, so it is more expensive than a RAID1 rebuild, which only reads the mirror).
    Last edited by Per Hansson; 07-10-2019, 09:14 AM. Reason: fixed quote



      #3
      Re: RAID5 failure: 2 bad HDD's at the same time

      As always, RAID is not a backup.
      The question is, how bad are the drives? If you pull them up on their own on a PC (DON'T WRITE TO THEM!), can you at least read a few bytes? Can you get SMART information?

      If you have two drives completely dead, you're probably SOL. If just one is dead and one has a few bad sectors, depending on your NAS firmware you may be able to recover something...unfortunately I don't have any experience with WD's RAID, just Linux mdraid...



        #4
        Re: RAID5 failure: 2 bad HDD's at the same time

        that's why i don't buy nas boxes off the shelf. they typically ship with a homogeneous set of drives, which means the drives tend to all fail around the same time! talk about very convenient planned obsolescence there! i'm sure the companies that make these nas boxes couldn't care less either if it means more buying and more money!

        therefore, i prefer to diy my own nas and thus pick drives with different platter density technologies and different number of heads etc. so they would fail at different times instead.

        do what eccerror said. for me, i fire up linux, pull the smart data, see how many pending, uncorrectable and reallocated sectors there are, and run gnu ddrescue to pull as much data off the bad drive as possible, provided it's still accessible and not bricked. a bricked drive is totally inaccessible and is detected by neither the bios nor the os.
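
For anyone who wants to script the "see how many pending, uncorrectable and reallocated sectors there are" step, here is a minimal Python sketch. It assumes the text comes from `smartctl -A` in its usual table layout and that the raw value is a plain integer in the last column; some drives report raw values in other formats, so treat this as a starting point only. The sample text is made up for illustration and is not from the poster's drive.

```python
# Attributes mentioned in the post: reallocated, pending, and uncorrectable sectors.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} for the watched attributes."""
    found = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            found[fields[1]] = int(fields[9])
    return found


# Made-up sample in the usual `smartctl -A` table layout (illustrative only).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       12
  9 Power_On_Hours          0x0032   070   070   000    Old_age   Always       -       22104
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       3
"""

counts = parse_smart_attributes(SAMPLE)
for name, raw in sorted(counts.items()):
    print(f"{name}: {raw}")
```

Non-zero counts in any of the three are a reason to image the drive before doing anything else with it.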

        if the drive is bricked and the data is critical, send it to a data recovery company. the fee could cost thousands of dollars for the recovery.



          #5
          Re: RAID5 failure: 2 bad HDD's at the same time

          I've been able to successfully reassemble my Linux md-RAID5 arrays that were destroyed by two disk failures, but there's no guarantee that the data I pull off is accurate. However I was able to get a good portion of the data off after the failure.

          Which reminds me, I need to backup my array again soon...



            #6
            Re: RAID5 failure: 2 bad HDD's at the same time

            Well, drive 2 is FUBAR. It won't read at all, period. Not sure why. Drive 3 has some bad sectors, but I was able to get the important stuff off the RAID5, so that is good. However, I am not able to recover the whole RAID array. But that is o.k., I kept too much junk anyway.



              #7
              Re: RAID5 failure: 2 bad HDD's at the same time

              Originally posted by ChaosLegionnaire View Post
              therefore, i prefer to diy my own nas and thus pick drives with different platter density technologies and different number of heads etc. so they would fail at different times instead.
              They still have many things in common: the hardware/software that's implementing the array, power supply, thermal experience, software that is accessing the array, etc.

              I prefer to trade convenience for robustness -- I only spin up a drive when I'm accessing its contents. If that content is munged, then I have to consider how much of the other content may be at risk. Or, if the box that I'm using to access that drive may, instead, be the culprit.

              [Software/firmware/clients/apps/PEBKAC have been known to be buggy]

              As I don't expect to encounter problems, when/if I do, it gives me a moment to think about what's happening before I propagate a failure (to other copies of the data).



                #8
                Re: RAID5 failure: 2 bad HDD's at the same time

                this is why for large arrays, raid 6 is a better idea
                Cap Datasheet Depot: http://www.paullinebarger.net/DS/
                ^If you have datasheets not listed PM me



                  #9
                  Re: RAID5 failure: 2 bad HDD's at the same time

                  This is why RAID IS NOT BACKUP.



                    #10
                    Re: RAID5 failure: 2 bad HDD's at the same time

                    Originally posted by CapLeaker View Post
                    Well, I guess I ran out of luck and shit hit the fan all right at home, ugh!
                    I am running a WD PR4100 16TB NAS. All was good until I noticed occasional slow file transfers. I ran an HDD test and drive 2 failed. O.K., no problem: order a new drive, replace the drive and rebuild the RAID5. Easy, right? Well, not so fast. Wouldn't you know it, drive 3 failed at the halfway mark of rebuilding the RAID5 array onto drive 2.
                    You _NEVER EVER EVER_ do that!
                    If a drive in a RAID array fails, you build a new array and copy the content from the old one to the new one while the old one still works. Start with the most important things.

                    Also RAID is NOT a replacement for the BACKUP!

                    So all you can do right now is to clone the drives and hope you have everything you need, then rebuild the RAID with the new drives....



                      #11
                      Re: RAID5 failure: 2 bad HDD's at the same time

                      Originally posted by Stefan Payne View Post
                      You _NEVER EVER EVER_ do that!
                      If a drive in a RAID array fails, you build a new array and copy the content from the old one to the new one while the old one still works. Start with the most important things.

                      Also RAID is NOT a replacement for the BACKUP!

                      So all you can do right now is to clone the drives and hope you have everything you need, then rebuild the RAID with the new drives....
                      Interesting... So you are saying to clone the bad HDDs in the RAID 5 array with Clonezilla to a new drive and put the clone back into the array? I thought the array knows the HDD by serial number or something, so it would detect the clone as a "new" drive?

                      No, I've lost nothing important and that is a good thing. I do have a few offline HDDs. Some of the stuff on the RAID array was so old that this gives me a chance to clean up my file storage. Rather than copying everything and deleting the stuff I no longer want, I just reversed it and copied only the stuff I want. This gives me more space.



                        #12
                        Re: RAID5 failure: 2 bad HDD's at the same time

                        Originally posted by Uranium-235 View Post
                        this is why for large arrays, raid 6 is a better idea
                        that is what I am aiming for, something where 2 drives can fail. Anyone tried the SHR2 from Synology?



                          #13
                          Re: RAID5 failure: 2 bad HDD's at the same time

                          Originally posted by CapLeaker View Post
                          that is what I am aiming for, something where 2 drives can fail. Anyone tried the SHR2 from Synology?
                          Note that you don't need a second "disk failure" -- a single URE (unrecoverable read error) on a surviving drive during the rebuild will effectively render a RAID5 (w/ failed disk) "broken". Make sure your NAS is doing patrol reads of the entire array lest you discover that URE when you can least afford it!
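
To see why one unreadable sector is enough, here is a toy Python sketch of single-parity reconstruction. It assumes plain XOR parity over one stripe and ignores how a real controller rotates parity across disks; `xor_blocks` and `rebuild_missing` are illustrative names, not anything taken from a NAS firmware.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(stripe):
    """Rebuild the single missing block (None) in a stripe of data + parity blocks."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) != 1:
        raise ValueError(f"cannot rebuild: {len(missing)} blocks unreadable")
    return xor_blocks([block for block in stripe if block is not None])


data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# One failed disk: the lost block comes back from the survivors plus parity.
assert rebuild_missing([data[0], None, data[2], parity]) == b"BBBB"

# Failed disk plus a URE on a survivor in the same stripe: not enough left to XOR.
try:
    rebuild_missing([data[0], None, None, parity])
except ValueError as err:
    print(err)
```

With one block gone, the XOR of everything that remains gives it back. With two gone in the same stripe, there is one equation and two unknowns, which is the situation a URE creates during a RAID5 rebuild.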



                            #14
                            Re: RAID5 failure: 2 bad HDD's at the same time

                            Originally posted by CapLeaker View Post
                            Interesting... So you are saying to clone the bad HDDs in the RAID 5 array with Clonezilla to a new drive and put the clone back into the array?
                            It's worth a try.
                            You might want to clone the other HDDs as well or move them immediately over to a new RAID Array.

                            Originally posted by CapLeaker View Post
                            I thought the array knows the HDD by serial number or something, so it would detect it as a "new" drive?
                            No, that information should be written on the drive itself, in the MBR or wherever the NAS keeps it.



                            Anyway, rule of thumb:
                            If one drive in a RAID array dies, do not rebuild it; back up your data and move it over to another array!

                            Because when all the drives are the same make/model, other drives failing is highly likely.



                              #15
                              Re: RAID5 failure: 2 bad HDD's at the same time

                              Cloning the HDD with Clonezilla didn't work for me.



                                #16
                                Re: RAID5 failure: 2 bad HDD's at the same time

                                Originally posted by CapLeaker View Post
                                Cloning the HDD with Clonezilla didn't work for me.
                                Without knowing how (and WHERE!) the particular NAS stores the array configuration data on the drive, there's no way of knowing if CZ will even SEE it as "data". CZ cheats by only copying the portions of the drive that it KNOWS to contain data (i.e., by understanding file systems and other common disk structures). This lets it skip over the parts of the medium that it thinks are "empty" -- otherwise CZ would take as long as a bytewise copy operation.

                                (Watch CZ in action and you will see how the throughput changes over the course of the operation)

                                You may have to resort to a bytewise copy to be sure you are preserving all of the "stuff that matters" -- to your NAS!

                                And you're still stuck with a highly likely URE interfering with that operation (the "U" in URE stands for "unrecoverable"), without the benefit of the redundant drives to compensate for it.

                                16TB = 128,000,000,000,000 bits = 1.28 x 10^14 bits. Assume a URE rate of 1 in 10^14 bits read...
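
Putting a number on that: a rough Python sketch, assuming each bit read fails independently at the quoted rate of 1 in 10^14 (real drives do not fail that uniformly, so read the result as an order-of-magnitude figure). The function name is just for illustration.

```python
import math


def p_at_least_one_ure(bytes_read, ure_per_bit=1e-14):
    """Probability of hitting at least one URE while reading bytes_read bytes."""
    bits = bytes_read * 8
    # 1 - (1 - p)^bits, computed via log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(bits * math.log1p(-ure_per_bit))


TB = 10**12  # decimal terabyte, as drive vendors count it

print(f"16 TB read: {p_at_least_one_ure(16 * TB):.0%}")
```

Under those assumptions, reading 16 TB end to end comes out at roughly a 72% chance of at least one URE, which is why "highly likely" is not an exaggeration.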



                                  #17
                                  Re: RAID5 failure: 2 bad HDD's at the same time

                                  that's why i thought it's not possible. I have to wait for some drives. Prime day is coming and I need a shit load of HDD's and a new NAS.



                                    #18
                                    Re: RAID5 failure: 2 bad HDD's at the same time

                                    Originally posted by CapLeaker View Post
                                    that's why i thought it's not possible. I have to wait for some drives. Prime day is coming and I need a shit load of HDD's and a new NAS.
                                    dd(1) should clone the drive completely (there may be some issues with portions of the MBR under some OS's).

                                    Of course, now you're faced with the time it takes to read the entire medium.

                                    And, the real possibility that dd(1) will encounter a URE somewhere along the way (you'll have to sort out what "value" should be substituted for the "unknown" value, in that case).
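
As a sketch of what "substituting a value" can look like, here is a small Python loop that copies block by block and writes a fill pattern wherever a read fails, while remembering which offsets were filled. `read_block` is a hypothetical callable standing in for the actual device read, and the zero fill is only one possible choice; the 16-byte "drive" is simulated so the example runs anywhere.

```python
import io


def copy_with_fill(read_block, total_size, out, block_size=4096, fill=b"\x00"):
    """Copy total_size bytes; replace unreadable blocks with a fill pattern.

    read_block(offset, length) is a hypothetical callable that returns bytes
    or raises OSError when the medium cannot be read at that offset.
    Returns the list of offsets that had to be filled in.
    """
    bad_offsets = []
    for offset in range(0, total_size, block_size):
        length = min(block_size, total_size - offset)
        try:
            chunk = read_block(offset, length)
        except OSError:
            chunk = fill * length  # keep the clone the same size as the source
            bad_offsets.append(offset)
        out.write(chunk)
    return bad_offsets


# Simulated 16-byte "drive" with an unreadable block at offset 8.
image = bytes(range(16))


def flaky_read(offset, length):
    if offset == 8:
        raise OSError("simulated unrecoverable read error")
    return image[offset:offset + length]


clone = io.BytesIO()
bad = copy_with_fill(flaky_read, len(image), clone, block_size=4)
print(bad, clone.getvalue().hex())
```

Keeping the list of bad offsets matters as much as the fill itself: it tells you afterwards which parts of the clone are guesses rather than data.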

                                    ISTR CZ has an option to just fall into dd(1) mode (instead of trying to understand the filesystem's structure)...?



                                      #19
                                      Re: RAID5 failure: 2 bad HDD's at the same time

                                      I can clone it with dd or Clonezilla no problem, but my NAS sees it as a new HDD.



                                        #20
                                        Re: RAID5 failure: 2 bad HDD's at the same time

                                        Originally posted by CapLeaker View Post
                                        I can clone it with dd or Clonezilla no problem, but my NAS sees it as a new HDD.
                                        If it is truly cloning the entire media surface, then the NAS must have some NVRAM in which it stores data from drive inquiry commands. E.g., I track drives in my "disk sanitizer" by storing the serial number, model number, etc. from the drive inquiry in a large database. So, when I next encounter the drive (e.g., when I install an OS image), I know its history.

                                        Usually, the drive is used to store this stuff (in a special partition or in the "unused" area right after the MBR).

                                        Regardless, this is one of the ways RAID f*cks you; had that been a "regular" disk, you could have thrown it in another machine and accessed its contents like normal (losing whatever part of the disk that may be afflicted with UREs).

                                        If you've already written off the data (as lost), you could try to recover the contents using one of the Windows/Linux tools that claim to be able to do so. At the very least, it will be a learning experience (and COULD yield positive results).

                                        Google "raid recovery" (and, please, report on any results!)
                                        Last edited by Curious.George; 07-14-2019, 10:45 AM.
