Badcaps Forums

RAID5 failure: 2 bad HDD's at the same time (https://www.badcaps.net/forum/showthread.php?t=77455)

CapLeaker 07-03-2019 12:56 PM

RAID5 failure: 2 bad HDD's at the same time
 
Well, I guess I ran out of luck and shit hit the fan all right at home, ugh! :nutkick:
I am running a WD PR4100 16TB NAS. All was good until I noticed occasional slow file transfers. I ran an HDD test and drive 2 failed. O.K., no problem: order a new drive, replace the drive and rebuild the RAID5. Easy, right? :facepalm: Well, not so fast. Wouldn't you know it, drive 3 failed at the halfway mark of rebuilding the RAID5 array onto drive 2. :crying:

Question is: How do I recover all the files or recover the RAID?

Curious.George 07-03-2019 02:03 PM

Quote:

Originally Posted by CapLeaker (Post 903872)
Well, I guess I ran out of luck and shit hit the fan all right at home, ugh! :nutkick:
I am running a WD PR4100 16TB NAS. All was good until I noticed occasional slow file transfers. I ran an HDD test and drive 2 failed. O.K., no problem: order a new drive, replace the drive and rebuild the RAID5. Easy, right? :facepalm: Well, not so fast. Wouldn't you know it, drive 3 failed at the halfway mark of rebuilding the RAID5 array onto drive 2. :crying:

This is a known problem with RAID -- especially with larger drives! The time it takes to rebuild the array represents a sizeable window in which a second failure can eat your lunch...

Of course, the "cost" (window of vulnerability) of rebuilding the failed drive will vary with the RAID level (e.g., RAID5 is more expensive than RAID1, since a RAID5 rebuild has to read every surviving drive while a RAID1 rebuild only reads the mirror).

eccerr0r 07-03-2019 02:55 PM

As always, RAID is not a backup.
Question is, how bad are the drives? If you pull them up on their own on a PC (DON'T WRITE TO THEM!), can you at least read a few bytes? SMART information?

If you have two drives completely dead, you're probably SOL. If just one is dead and one has a few bad sectors, depending on your NAS firmware you may be able to recover something...unfortunately I don't have any experience with WD's RAID, just Linux mdraid...

ChaosLegionnaire 07-03-2019 04:04 PM

that's why i don't buy nas boxes off the shelf. they typically ship with a homogeneous set of drives, which means the drives have a tendency to all fail at around the same time! talk about very convenient planned obsolescence there! i'm sure the companies that make these nas boxes couldn't care less either if it means more buying and more money!

therefore, i prefer to diy my own nas and pick drives with different platter densities, different numbers of heads etc. so they fail at different times instead.

do what eccerr0r said. for me, i fire up linux, pull the smart data and see how many pending, uncorrectable and reallocated sectors there are. then i run gnu ddrescue to pull as much data off the bad drive as possible, if it's still accessible. if it's bricked, the drive is totally inaccessible and detected by neither the bios nor the os.

if the drive is bricked and the data is critical, send it to a data recovery company. the recovery fee could run to thousands of dollars.
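
A minimal sketch of the SMART check described above, assuming the drive reports the attribute names that `smartctl -A` commonly prints for ATA drives (Reallocated_Sector_Ct, Current_Pending_Sector, Offline_Uncorrectable). The sample text is made up for illustration and the helper name `parse_smart_attributes` is hypothetical; in practice you would feed it the real output captured from your own drive.

```python
# Attribute names as smartctl -A commonly prints them for ATA drives
# (assumption: your drive reports these names; vendors vary).
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} for the watched attributes."""
    found = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the raw value is the 10th column.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            found[fields[1]] = int(fields[9])
    return found


# Made-up sample in the layout of `smartctl -A` output (not real data).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       12
  9 Power_On_Hours          0x0032   075   075   000    Old_age   Always       -       18250
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       3
"""

counts = parse_smart_attributes(SAMPLE)
for name in WATCHED:
    print(f"{name}: {counts.get(name, 'not reported')}")
```

Non-zero raw values for any of the three are a reason to image the drive before doing anything else with it.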

eccerr0r 07-03-2019 04:59 PM

I've been able to successfully reassemble my Linux md-RAID5 arrays that were destroyed by two disk failures, but there's no guarantee that the data I pull off is accurate. However, I was able to get a good portion of the data off after the failure.

Which reminds me, I need to back up my array again soon...

CapLeaker 07-03-2019 05:41 PM

Well, drive 2 is FUBAR. It won't read at all, period. I'm not sure why drive 3 has some bad sectors, but I was able to get the important stuff off the RAID5. So that is good. However, I am not able to recover the whole RAID array. But that is O.K., I kept too much junk anyway.

Curious.George 07-03-2019 09:37 PM

Quote:

Originally Posted by ChaosLegionnaire (Post 903901)
therefore, i prefer to diy my own nas and pick drives with different platter densities, different numbers of heads etc. so they fail at different times instead.

They still have many things in common: the hardware/software that's implementing the array, power supply, thermal experience, software that is accessing the array, etc.

I prefer to trade robustness for convenience -- I only spin up a drive when I'm accessing its contents. If that content is munged, then I have to consider how much of the other content may be at risk. Or, if the box that I'm using to access that drive may, instead, be the culprit.

[Software/firmware/clients/apps/PEBKAC have been known to be buggy]

As I don't expect to encounter problems, when/if I do, it gives me a moment to think about what's happening before I propagate a failure (to other copies of the data).

Uranium-235 07-03-2019 11:03 PM

this is why for large arrays, raid 6 is a better idea

diif 07-03-2019 11:24 PM

This is why RAID IS NOT BACKUP.

Stefan Payne 07-04-2019 11:53 AM

Quote:

Originally Posted by CapLeaker (Post 903872)
Well, I guess I ran out of luck and shit hit the fan all right at home, ugh! :nutkick:
I am running a WD PR4100 16TB NAS. All was good until I noticed occasional slow file transfers. I ran an HDD test and drive 2 failed. O.K., no problem: order a new drive, replace the drive and rebuild the RAID5. Easy, right? :facepalm: Well, not so fast. Wouldn't you know it, drive 3 failed at the halfway mark of rebuilding the RAID5 array onto drive 2. :crying:

You _NEVER EVER EVER_ do that!
If a drive in a RAID array fails, you build a new array and copy the contents from the old one to the new one while the old one still works. Start with the most important things.

Also, RAID is NOT a replacement for a BACKUP!

So all you can do right now is clone the drives and hope you have everything you need, then rebuild the RAID with the new drives....

CapLeaker 07-05-2019 06:36 PM

Quote:

Originally Posted by Stefan Payne (Post 904040)
You _NEVER EVER EVER_ do that!
If a drive in a RAID array fails, you build a new array and copy the contents from the old one to the new one while the old one still works. Start with the most important things.

Also, RAID is NOT a replacement for a BACKUP!

So all you can do right now is clone the drives and hope you have everything you need, then rebuild the RAID with the new drives....

Interesting... So you are saying to clone the bad HDDs in the RAID 5 array with Clonezilla to new drives and put those back into the array? I thought the array knows each HDD by serial number or something, so it would detect the clone as a "new" drive?

No, I've lost nothing important and that is a good thing. I do have a few offline HDDs. Some of the stuff on the RAID array was so old that this gives me a chance to clean up my file storage. Rather than copying everything and deleting the stuff I no longer want, I just reversed it and copied only the stuff I want. This gives me more space.

CapLeaker 07-05-2019 06:40 PM

Quote:

Originally Posted by Uranium-235 (Post 903935)
this is why for large arrays, raid 6 is a better idea

That is what I am aiming for, something where 2 drives can fail. Has anyone tried SHR-2 from Synology?

Curious.George 07-05-2019 09:42 PM

Quote:

Originally Posted by CapLeaker (Post 904233)
That is what I am aiming for, something where 2 drives can fail. Has anyone tried SHR-2 from Synology?

Note that you don't need a second "disk failure" -- a URE (unrecoverable read error) during the rebuild will effectively render a RAID5 (w/ failed disk) "broken". Make sure your NAS is doing patrol reads of the entire array lest you discover that URE when you can least afford it!
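
Why a single URE is enough can be sketched with the XOR parity that RAID5 is built on. This is a toy model of one stripe on a hypothetical 4-drive array (three data blocks plus parity); real implementations rotate the parity block across drives and work on much larger chunks, and the helper name `xor_blocks` is made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks plus their parity block.
d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks([d0, d1, d2])

# The drive holding d1 has failed: rebuild it from EVERY surviving block.
rebuilt = xor_blocks([d0, d2, parity])
print(rebuilt == d1)  # True

# Now d2 cannot be read during the rebuild. XORing what is left
# gives d1 XOR d2, not d1, so this stripe cannot be reconstructed.
partial = xor_blocks([d0, parity])
print(partial == d1)  # False
```

With RAID 6 there is a second, independent parity block per stripe, which is what lets it ride out the extra unreadable block.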

Stefan Payne 07-07-2019 01:44 AM

Quote:

Originally Posted by CapLeaker (Post 904232)
Interesting... So you are saying to clone the bad HDDs in the RAID 5 array with Clonezilla to new drives and put those back into the array?

It's worth a try.
You might want to clone the other HDDs as well, or move their data over to a new RAID array immediately.

Quote:

Originally Posted by CapLeaker (Post 904232)
I thought the array knows the HDD by serial number or something, so it would detect it as a "new" drive?

No, that should be written on the drive itself, in the MBR or wherever the NAS keeps it.



Anyway, rule of thumb:
If one drive in a RAID array dies, do not rebuild it; back up your data and move it over to another array!

Because when all the drives are the same make/model, it is highly likely that other drives will fail too.

CapLeaker 07-08-2019 08:06 AM

Cloning the HDD with Clonezilla didn't work for me.

Curious.George 07-09-2019 06:03 AM

Quote:

Originally Posted by CapLeaker (Post 904582)
Cloning the HDD with Clonezilla didn't work for me.

Without knowing how (and WHERE!) the particular NAS stores the array configuration data on the drive, there's no way of knowing if Clonezilla (CZ) will even SEE it as "data". CZ cheats by only copying the portions of the drive that it KNOWS to contain data (i.e., by understanding file systems and other common disk structures). This lets it skip over the parts of the medium that it thinks are "empty" -- otherwise CZ would take as long as a bytewise copy operation.

(Watch CZ in action and you will see how the throughput changes over the course of the operation)

You may have to resort to a bytewise copy to be sure you are preserving all of the "stuff that matters" -- to your NAS!

And, you're still stuck with the highly likely URE interfering with that operation -- the U in URE stands for "unrecoverable" -- without the benefit of the redundant drives to compensate for it.

16TB = 128,000,000,000,000 bits = 1.28 x 10^14 bits. Assume a URE rate of 1 in 10^14 bits read and, on average, you'd expect about 1.28 UREs over a full read...
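
To put a number on that arithmetic, here is a rough sketch that turns the 1-in-10^14 figure into a probability. It assumes each bit fails independently at exactly the quoted rate, which is a simplification (spec-sheet URE rates are worst-case bounds and real errors tend to cluster). The function name is made up, and the 4 TB and 12 TB rows are arbitrary comparison points, not figures from this thread.

```python
import math


def p_at_least_one_ure(bytes_read, ure_rate_bits=1e14):
    """Chance of at least one URE, assuming independent errors at 1 per ure_rate_bits bits."""
    bits = bytes_read * 8
    # 1 - (1 - p)^bits, computed stably via log1p/expm1.
    return -math.expm1(bits * math.log1p(-1.0 / ure_rate_bits))


TB = 10**12  # decimal terabyte, as drive vendors count it
for size_tb in (4, 12, 16):
    print(f"{size_tb} TB read: {p_at_least_one_ure(size_tb * TB):.0%}")
```

Under those assumptions a full 16 TB read has roughly a 72% chance of hitting at least one URE, which is why a rebuild or clone of a large array so often trips over one.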

CapLeaker 07-12-2019 07:12 PM

That's why I thought it's not possible. I have to wait for some drives. Prime Day is coming and I need a shit load of HDDs and a new NAS. :D

Curious.George 07-13-2019 12:34 PM

Quote:

Originally Posted by CapLeaker (Post 905219)
That's why I thought it's not possible. I have to wait for some drives. Prime Day is coming and I need a shit load of HDDs and a new NAS. :D

dd(1) should clone the drive completely (there may be some issues with portions of the MBR under some OSes).

Of course, now you're faced with the time it takes to read the entire medium.

And, the real possibility that dd(1) will encounter a URE somewhere along the way (you'll have to sort out what "value" should be substituted for the "unknown" value, in that case).

I seem to recall that CZ has an option to just fall into dd(1) mode (instead of trying to understand the filesystem's structure)...?
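
The "substitute a value for the unknown" idea can be sketched as a block-by-block copy that pads unreadable blocks with a fill byte and remembers where they were. This is only an illustration of the principle, not a replacement for GNU ddrescue (mentioned earlier in the thread), which retries and maps bad areas far more carefully. `clone_with_fill` and `FlakySource` are hypothetical names; `FlakySource` merely simulates a failing drive so the sketch can run without hardware.

```python
import io


def clone_with_fill(src, dst, total_size, block_size=4096, fill=b"\x00"):
    """Copy total_size bytes from src to dst; unreadable blocks become fill bytes.

    Returns the list of byte offsets whose block could not be fully read.
    """
    bad_offsets = []
    offset = 0
    while offset < total_size:
        want = min(block_size, total_size - offset)
        try:
            src.seek(offset)
            data = src.read(want)
        except OSError:
            data = b""
        if len(data) < want:
            # Read error or short read: record it and pad with the fill value.
            bad_offsets.append(offset)
            data = data + fill * (want - len(data))
        dst.seek(offset)
        dst.write(data)
        offset += want
    return bad_offsets


class FlakySource(io.BytesIO):
    """Stand-in for a failing drive: reads starting at a bad offset raise OSError."""

    def __init__(self, data, bad_offsets):
        super().__init__(data)
        self.bad = set(bad_offsets)

    def read(self, size=-1):
        if self.tell() in self.bad:
            raise OSError("simulated URE")
        return super().read(size)


source = FlakySource(b"abcdefghijkl", bad_offsets={4})
image = io.BytesIO()
bad = clone_with_fill(source, image, total_size=12, block_size=4)
print(bad, image.getvalue())
```

Keeping the list of bad offsets matters: it tells you which parts of the image are filler, so you know which files (or which stripes of the array) not to trust.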

CapLeaker 07-14-2019 08:25 AM

I can clone it with dd or Clonezilla no problem, but my NAS sees it as a new HDD.

Curious.George 07-14-2019 10:43 AM

Quote:

Originally Posted by CapLeaker (Post 905395)
I can clone it with dd or Clonezilla no problem, but my NAS sees it as a new HDD.

If it is truly cloning the entire media surface, then the NAS must have some NVRAM in which it stores data from drive inquiry commands. E.g., I track drives in my "disk sanitizer" by storing the serial number, model number, etc. from the drive inquiry in a large database. So, when I next encounter the drive (e.g., when I install an OS image), I know its history.

Usually, the drive is used to store this stuff (in a special partition or in the "unused" area right after the MBR).

Regardless, this is one of the ways RAID f*cks you; had that been a "regular" disk, you could have thrown it in another machine and accessed its contents like normal (losing whatever part of the disk that may be afflicted with UREs).

If you've already written off the data (as lost), you could try to recover the contents using one of the Windows/Linux tools that claim to be able to do so. At the very least, it will be a learning experience (and COULD yield positive results).

Google "raid recovery" (and, please, report on any results!)


All times are GMT -6.