Badcaps.net Forum
Old 12-26-2018, 07:40 PM   #1
Curious.George
Badcaps Veteran
 
Join Date: Nov 2011
Posts: 1,123
"Solid State" media failures

Not wanting to limit this to actual "SSDs" but, rather, "non-rotating storage media" (including FLASH "soldered down")...

What's been your experience with "flash" failures -- typically in tablets, phones (or even SSDs)? And, with the exception of SSDs (which can typically be replaced as an FRU), what recourse have you had to address said failure(s)?
Old 12-26-2018, 09:11 PM   #2
Topcat
The Boss Stooge
 
 
Join Date: Oct 2003
City & State: Salem, MO
My Country: United States
Line Voltage: 120VAC 60Hz
I'm a: Professional Tech
Posts: 12,038

Had my first SSD die on me last week... it was an OCZ Trion 150, 240GB. I've never had a tablet SSD fail, but I'm not a tablet junkie... I still prefer a laptop.
Old 12-26-2018, 11:59 PM   #3
Curious.George

Quote:
Originally Posted by Topcat View Post
Had my first SSD die on me last week... it was an OCZ Trion 150, 240GB. I've never had a tablet SSD fail, but I'm not a tablet junkie... I still prefer a laptop.
By "die", did it simply stop working altogether? Or did its FTL prove incapable of coping with write wear-through? I.e., spinning rust should, theoretically, degrade gracefully as the grown defect table gets bigger (eventually impacting the capacity of the volume). The FTL should provide similar functionality for solid-state media, with the caveat that the entire medium will, eventually, fail.
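That graceful-degradation idea can be caricatured in a few lines. This is a toy sketch, not any real FTL (the class and its methods are made up for illustration): a remap layer retires failed physical blocks, capacity shrinks gradually, and only when the pool runs dry does the medium fail outright.

```python
# Toy model of FTL-style graceful degradation (illustrative only):
# logical blocks map onto a pool of physical blocks; failed blocks
# are retired into a grown-defect list, shrinking usable capacity.

class ToyFTL:
    def __init__(self, physical_blocks):
        self.free = set(range(physical_blocks))  # usable physical blocks
        self.map = {}                            # logical -> physical
        self.retired = set()                     # grown-defect list

    def write(self, logical):
        """Allocate a physical block for a logical write; fail when exhausted."""
        if logical in self.map:
            self.free.add(self.map.pop(logical))  # old copy becomes reusable
        if not self.free:
            raise IOError("medium exhausted: no usable blocks left")
        self.map[logical] = self.free.pop()

    def mark_bad(self, physical):
        """Retire a failed physical block; capacity degrades gracefully."""
        self.free.discard(physical)
        self.retired.add(physical)

    def capacity(self):
        """Blocks still usable: mapped data plus the remaining free pool."""
        return len(self.free) + len(self.map)
```

Each `mark_bad()` quietly shrinks `capacity()`; the catastrophic failure only surfaces on the write that finds no spare block left.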
Old 12-27-2018, 11:10 AM   #4
Topcat

I'm not sure what caused its death...I switched it from one interface to another (ICH7 to an ICH10), and it never worked again. BIOS sees it, but it can't be read from or written to. Tried several utilities to gain access to its data, no go. I shut it off and pronounced it dead when DBAN said the time remaining to wipe it was 640 hours....
Old 12-27-2018, 11:24 AM   #5
stj
Great Sage 齊天大聖
 
 
Join Date: Dec 2009
City & State: Europe
My Country: some shithole run by Israeli agents
I'm a: Professional Tech
Posts: 20,120

Obviously you tried it on the original machine?
Maybe a damaged SATA connector.
Old 12-27-2018, 11:43 AM   #6
Topcat

Quote:
Originally Posted by stj View Post
Obviously you tried it on the original machine?
Maybe a damaged SATA connector.
Yup, dead in that machine too. Connector/cabling is fine.
Old 12-27-2018, 11:48 AM   #7
stj

Fine on the drive PCB?

Reason I say this: SATA is a fast serial system - too damned fast.
Data is sent in small packets and corrupted ones are re-sent.
The theory is that if the errors aren't too high, the transfer rates are still impressive.

If you get a bad signal from a poor cable or connector, the data still makes it through, but much slower - potentially so slow that reading more than a handful of bytes is going to time out.
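As a toy illustration of that argument (illustrative numbers, not real SATA behavior): if every corrupted packet is re-sent, the expected number of transmissions per packet is 1/(1 - p), so goodput shrinks with the packet error rate. Real SATA adds command timeouts on top, which is why the collapse in practice is far sharper than this smooth curve.

```python
def effective_rate(raw_rate_mb_s, packet_error_rate):
    """Toy retransmission model: each packet needs an expected
    1/(1 - p) transmissions, so goodput = raw * (1 - p).
    Ignores timeouts, which make real-world collapse much worse."""
    if packet_error_rate >= 1.0:
        return 0.0  # nothing ever gets through intact
    return raw_rate_mb_s * (1.0 - packet_error_rate)
```

With a 1% error rate a 600 MB/s link barely notices; at 90% errors the goodput has collapsed by an order of magnitude even before timeouts enter the picture.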
Old 12-27-2018, 12:22 PM   #8
tom66
EVs Rule
 
 
Join Date: Apr 2011
City & State: Leeds
My Country: UK
Line Voltage: 230Vac 50Hz
I'm a: Professional Tech
Posts: 32,360

SSDs do fail; I've had friends whose Kingston drives went bad. None myself, though I've had the odd SD card and USB drive fail. Personally I do not trust any SSD not from a tier-one manufacturer (Samsung, SanDisk, Intel and *maybe* Toshiba); manufacturers like Kingston frequently swap the parts used on their SSDs, so performance is inconsistent and lifespan is never guaranteed. See, for instance, the Kingston V300 debacle [1].

Newer SSDs are moving to 8-level or 16-level flash, so the per-cell density is getting really high, but the trouble is, all flash memory is damaged by erase operations. And as the cells are written more often, their leakage increases, so they hold data less reliably over time.

This is one reason flash memory is terrible for archival purposes. If it is a high-density SSD, don't expect it to retain data without power for more than 10 years or so. A powered SSD is a happy SSD, because the controller can remap the drive periodically, when under little load.

One other significant factor is ambient temperature. When I was employed at a large set-top box manufacturer I tested a number of flash memory devices in STBs at temperature. The unit that was running at 40C had about half the cycle life of a unit running at 25C. So, if you can position your SSD so it runs cooler, that's better. One reason I really don't like M.2 drives is because they are located so close to the hot CPU, compared to a SATA drive. Higher temperatures at high cycle counts are the worst-case for flash memory.
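The roughly-halved cycle life from 25C to 40C is in the ballpark of an Arrhenius-style acceleration model. Here is a sketch under an assumed activation energy (real values are part- and mechanism-specific, so treat the numbers as illustrative):

```python
import math

K_B = 8.617e-5  # Boltzmann constant in eV/K

def acceleration_factor(t_use_c, t_stress_c, ea_ev=0.4):
    """Arrhenius acceleration factor between two operating temperatures.
    ea_ev is an assumed activation energy (illustrative, not a datasheet
    value); higher Ea means temperature hurts more."""
    t_use = t_use_c + 273.15      # convert to kelvin
    t_stress = t_stress_c + 273.15
    return math.exp(ea_ev / K_B * (1.0 / t_use - 1.0 / t_stress))
```

With Ea around 0.4 eV, 40C vs 25C gives roughly a 2x acceleration factor, consistent with the halved cycle life observed in the STB testing above.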

[1] https://www.anandtech.com/show/7763/...er-micron-nand

Old 12-27-2018, 01:35 PM   #9
stj

Quote:
Originally Posted by tom66 View Post
If it is a high-density SSD, don't expect it to retain data without power for more than 10 years or so.
More like 10 months for some of the stacked-cell chips - I've seen the datasheets!
Old 12-27-2018, 02:55 PM   #10
Curious.George

Quote:
Originally Posted by Topcat View Post
I'm not sure what caused its death...I switched it from one interface to another (ICH7 to an ICH10), and it never worked again. BIOS sees it, but it can't be read from or written to. Tried several utilities to gain access to its data, no go. I shut it off and pronounced it dead when DBAN said the time remaining to wipe it was 640 hours....
But the failure was coincident with your actions? ESD? Board flexing breaking some (brittle) RoHS solder joints?

[Coincidences always leave me suspicious...]
Old 12-27-2018, 03:02 PM   #11
Curious.George

Quote:
Originally Posted by tom66 View Post
SSDs do fail; I've had friends whose Kingston drives went bad. None myself, though I've had the odd SD card and USB drive fail. Personally I do not trust any SSD not from a tier-one manufacturer (Samsung, SanDisk, Intel and *maybe* Toshiba); manufacturers like Kingston frequently swap the parts used on their SSDs, so performance is inconsistent and lifespan is never guaranteed. See, for instance, the Kingston V300 debacle.
What I'm interested in is whether the failures are typically catastrophic (e.g., TC's, upthread) or a gradual degradation in performance (longer erase/program times, more bad blocks to recover/restore, etc.)

Quote:
Newer SSDs are moving to 8-level or 16-level flash, so the per-cell density is getting really high, but the trouble is, all flash memory is damaged by erase operations. And as the cells are written more often, their leakage increases, so they hold data less reliably over time.
And even read-disturb events become more of a problem -- which leads to more erase/program cycles (to salvage the block(s) in which data is degrading even though the block is not actually "failed"/unusable).

Quote:
This is one reason flash memory is terrible for archival purposes. If it is a high-density SSD, don't expect it to retain data without power for more than 10 years or so. A powered SSD is a happy SSD, because the controller can remap the drive periodically, when under little load.
It's worth noting that the 10 years figure has been applied to almost all new media. Yet, time has proven otherwise. E.g., I still have ancient 8" floppies that I can read, 9T tape, and hundreds of off-the-shelf writable CD/DVD media.

OTOH, I have colleagues who complain that they can't read a CD that they wrote a few months earlier (PEBKaC).

An SSD can typically be removed/replaced leaving you with a usable piece of kit (sans SSD). OTOH, tablets, phones and other appliances usually have their solid state memory "soldered down". So, a failure in the media OR a failure in the FTL can result in bricking the device with no hope of salvage.
Old 12-27-2018, 03:22 PM   #12
tom66

Quote:
Originally Posted by Curious.George View Post
What I'm interested in is whether the failures are typically catastrophic (e.g., TC's, upthread) or a gradual degradation in performance (longer erase/program times, more bad blocks to recover/restore, etc.)
In my experience, it's typically catastrophic because the failures begin at similar times across the drive.

Once a sector fails on an SSD, the drive will spend a long time attempting to recover it. This will lead to read latency climbing significantly and random sector failure will also likely cause issues with filesystems.

With our STBs, when a failure occurred in the onboard eMMC, the Linux kernel spent about 20 minutes spewing out messages on dmesg/serial terminal before I had a usable terminal. And it effectively became unusable, because each sector read would be rejected after a 10-second delay from the drive controller.

Maybe it's possible to configure the kernel to behave more gracefully when this goes bad but AFAIK there is no way for the kernel to know the drive is bad - it just takes forever to read from...
Old 12-27-2018, 04:32 PM   #13
TechGeek
Computer Geek
 
 
Join Date: Jan 2015
City & State: Texas
My Country: USA
Line Voltage: 122.5VAC@60hZ/200A
I'm a: Hardcore Geek
Posts: 1,412

More reasons to not trust SSDs with critical data.
Old 12-27-2018, 04:54 PM   #14
tom66
Quote:
Originally Posted by TechGeek View Post
More reasons to not trust SSDs with critical data.
Another reason not to have critical data on any single medium *at all*. Backups, folks! Backups! Three backups, two different locations and at least one different type of medium. But if you're too lazy to do that, then at least use an online service e.g. BackBlaze.
Old 12-28-2018, 01:51 AM   #15
Curious.George

Quote:
Originally Posted by tom66 View Post
In my experience, it's typically catastrophic because the failures begin at similar times across the drive.
So, you're effectively saying that the wear leveling is "practically ideal" and everything "wears out" at the same time...?

Quote:
Once a sector fails on an SSD, the drive will spend a long time attempting to recover it. This will lead to read latency climbing significantly and random sector failure will also likely cause issues with filesystems.
But, now you're conflating the drive's failure with the application's expectations of it. I.e., an application that doesn't hammer on the drive wouldn't suffer as horrendous a fate.

Environments that load apps "once" from persistent store could stumble along with the user only noticing a startup delay when the app is initially loaded.

Quote:
With our STBs, when a failure occurred in the onboard eMMC, the Linux kernel spent about 20 minutes spewing out messages on dmesg/serial terminal before I had a usable terminal. And it effectively became unusable, because each sector read would be rejected after a 10-second delay from the drive controller.
I'd assume you would tune the driver to not wait as long for a retry, knowing the nature of the drive that it was talking to (i.e., don't use a driver tuned for use with "traditional media").

Quote:
Maybe it's possible to configure the kernel to behave more gracefully when this goes bad but AFAIK there is no way for the kernel to know the drive is bad - it just takes forever to read from...
I'm assuming (?) tablets and other devices with soldered-down memory implement their own FTL and, as such, can (choose to) see more of what's happening inside the medium. By contrast, an SSD has a conventional interface that it tries to maintain, which deliberately hides lots of these "medium-specific" details.

Of course, the flip side of this (if it's indeed how these devices are designed) is that you're at the mercy of N different FTL implementations, each of which embodies considerable BFM. :<
Old 12-28-2018, 01:58 AM   #16
Curious.George

Quote:
Originally Posted by tom66 View Post
Backups, folks! Backups! Three backups, two different locations and at least one different type of medium. But if you're too lazy to do that, then at least use an online service e.g. BackBlaze.
A lot of that depends on how long you expect to consider your data as "valuable". My archive goes back more than 40 years. A good bit of that stuff I really wouldn't cry about if it disappeared. But, the effort to sort out what REMAINS important to me, today, far exceeds the cost/effort to preserve it!

On-line services require a high-speed connection to move copies of archives around. And/or support for more advanced protocols (e.g., rsync) to verify their integrity against local copies (and vice versa). For example, it takes a fair bit of time to copy a 1TB image over GbE; imagine doing that with many TB!
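The rsync-style integrity check mentioned here can be sketched by exchanging hash manifests instead of the data itself (function names hypothetical). Only the small digest table has to cross the slow link; this is the same idea rsync exploits at block granularity with rolling checksums.

```python
import hashlib
from pathlib import Path

def manifest(root):
    """Map each file's relative path under `root` to its SHA-256 digest.
    The manifest is tiny compared to the archive, so comparing copies
    over a slow link only costs the digest exchange, not a full transfer."""
    out = {}
    for p in sorted(Path(root).rglob("*")):
        if p.is_file():
            out[str(p.relative_to(root))] = hashlib.sha256(p.read_bytes()).hexdigest()
    return out

def differing(local_manifest, remote_manifest):
    """Paths missing on either side, or present on both with disagreeing digests."""
    keys = set(local_manifest) | set(remote_manifest)
    return sorted(k for k in keys
                  if local_manifest.get(k) != remote_manifest.get(k))
```

For multi-TB archives a real tool would hash incrementally (chunked reads) rather than `read_bytes()`, but the verification logic is the same.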

And, being a Cynic, I'm not sure I'd consider any of them "secure", given the number of break-ins/hacks and outright SALES of data that we hear about.
Old 12-28-2018, 12:36 PM   #17
tom66

Quote:
Originally Posted by Curious.George View Post
So, you're effectively saying that the wear leveling is "practically ideal" and everything "wears out" at the same time...?
No... some areas will wear out sooner, but the failures will be random in nature. So it is reasonably likely that several sectors will fail in a small span of time if the drive is used in a typical fashion (and the wear levelling works well).

Quote:
Originally Posted by Curious.George View Post
But, now you're conflating the drive's failure with the application's expectations of it. I.e., an application that doesn't hammer on the drive wouldn't suffer as horrendous a fate.

Environments that load apps "once" from persistent store could stumble along with the user only noticing a startup delay when the app is initially loaded.
True - to a point - but there is no current way for SATA SSDs to indicate that they are becoming a bit "latent" and that you need to wait to read some sections. So the kernel (Windows, Linux, whatever) will keep hitting sectors and if it gets held up somewhere, the result will be random performance degradation with the user not being aware of the trigger.

With our eMMC flash on our STBs the fault essentially was that the eMMC wouldn't mount correctly, but the application software didn't like this, so attempted to re-mount it frequently. Each mount attempt took far too long as it relied on a timeout, leading to the unit slowing down considerably.
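The re-mount storm described above is the classic unbounded-retry trap: each attempt burns a full timeout, and the retries arrive back-to-back. A common mitigation is bounded retries with exponential backoff; this is a hedged sketch (names and parameters hypothetical, not the STB's actual code):

```python
import time

def mount_with_backoff(try_mount, attempts=5, first_delay=0.5, max_delay=30.0):
    """Call try_mount() up to `attempts` times, doubling the pause between
    tries, instead of hammering a device that is only going to time out
    again. try_mount is a caller-supplied callable returning True on success."""
    delay = first_delay
    for i in range(attempts):
        if try_mount():
            return True
        if i < attempts - 1:      # no point sleeping after the final failure
            time.sleep(delay)
            delay = min(delay * 2, max_delay)
    return False                  # give up and let the caller degrade gracefully
```

The point is the bounded give-up: after the final failure the application can mark the volume dead and move on, rather than slowing the whole unit down indefinitely.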

Quote:
Originally Posted by Curious.George View Post
I'm assuming (?) tablets and other devices with soldered-down memory implement their own FTL and, as such, can (choose to) see more of what's happening inside the medium. By contrast, an SSD has a conventional interface that it tries to maintain, which deliberately hides lots of these "medium-specific" details.
You'd assume so, but it's not the case: many smartphones and smart devices still use eMMC flash with the same defect we experienced.
Old 12-28-2018, 12:41 PM   #18
tom66

Quote:
Originally Posted by Curious.George View Post
A lot of that depends on how long you expect to consider your data as "valuable". My archive goes back more than 40 years. A good bit of that stuff I really wouldn't cry about if it disappeared. But, the effort to sort out what REMAINS important to me, today, far exceeds the cost/effort to preserve it!

On-line services require a high-speed connection to move copies of archives around. And/or support for more advanced protocols (e.g., rsync) to verify their integrity against local copies (and vice versa). For example, it takes a fair bit of time to copy a 1TB image over GbE; imagine doing that with many TB!
So my solution was to back it all up using my 20Mbit (upstream) connection which took a while! About two months all in with it running in the background. But it did work.

I don't keep the stuff on the cloud. It's only used as a backup method, the data is only there in case a failure occurs.

Quote:
Originally Posted by Curious.George View Post
And, being a Cynic, I'm not sure I'd consider any of them "secure", given the number of break-ins/hacks and outright SALES of data that we hear about.
I use BackBlaze myself with an encryption key. The encryption key is partially written down on a piece of paper, which is stored in a fire safe in my detached garage and a second copy is stored in another secret location. The key is in two parts with the first part being a secret that I have remembered (just a memorable word or something like that), and the second part is written on that paper.

Without that key the data is useless; it is encrypted on my PC, and if I have a drive failure they will ship me a HDD with the encrypted data on it, which I can then recover using that key.

If you are so concerned about data security you can trust AES256, it *will not* be broken with current technology and is likely to remain secure for at least the next 20 years.
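The two-part key scheme described above amounts to deriving the actual encryption key from both halves, so neither the paper nor the memorized word alone is useful. A sketch using a stdlib key-derivation function (iteration count and names illustrative, not a recommendation of specific parameters):

```python
import hashlib

def derive_key(memorized_secret, written_part, iterations=200_000):
    """Stretch the memorized half with the written-down half as salt via
    PBKDF2-HMAC-SHA256. Neither piece alone yields the key; both halves
    plus the (slow) KDF are needed to reconstruct it."""
    return hashlib.pbkdf2_hmac(
        "sha256",
        memorized_secret.encode(),
        written_part.encode(),
        iterations,
        dklen=32,   # 256-bit output, sized for an AES-256 key
    )
```

The 32-byte result would then feed an authenticated cipher (e.g., AES-GCM via a crypto library) to encrypt the backup before it ever leaves the PC.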
Old 12-28-2018, 03:03 PM   #19
Curious.George

Quote:
Originally Posted by tom66 View Post
True - to a point - but there is no current way for SATA SSDs to indicate that they are becoming a bit "latent" and that you need to wait to read some sections. So the kernel (Windows, Linux, whatever) will keep hitting sectors and if it gets held up somewhere, the result will be random performance degradation with the user not being aware of the trigger.
Well, if the driver was smarter, it would be able to note the time required to service individual requests and compare these to historical norms. I do this in userland when I'm accessing the volumes in my archive. It pays off in spades for optical media where retries are costly (in terms of time) and where "SMART" data isn't really available.

Without hacking the driver, it gives me similar diagnostics to those I have available in other devices I've designed (that just use NAND/NOR directly). There, I watch the device's actual performance against its "specified" worst-case performance to detect potential failures (before they become "double failures" and, thus, less detectable -- the second failure masking the first).

While I don't rely on it as a predictor of drive failure, I use it to modify the schedule for "file verification" so that the other files on the physical volume are revisited sooner in case there IS a problem brewing.
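That compare-against-historical-norms check can be sketched as an exponentially weighted moving average (EWMA) baseline with a deviation threshold. A toy version (class name, smoothing factor and threshold all hypothetical, not the userland tool described above):

```python
import time

class LatencyWatch:
    """Track a smoothed baseline of per-request service time and flag
    requests that take far longer than the running norm -- the kind of
    creeping retry latency that precedes outright media failure."""

    def __init__(self, alpha=0.1, threshold=5.0):
        self.alpha = alpha          # EWMA smoothing factor
        self.threshold = threshold  # flag requests slower than threshold x baseline
        self.baseline = None

    def observe(self, seconds):
        """Record one request's service time; return True if it's suspicious."""
        if self.baseline is None:
            self.baseline = seconds
            return False
        suspicious = seconds > self.threshold * self.baseline
        if not suspicious:          # don't let outliers poison the baseline
            self.baseline += self.alpha * (seconds - self.baseline)
        return suspicious

    def timed_read(self, read_fn, *args):
        """Time a caller-supplied read and classify it in one step."""
        t0 = time.monotonic()
        data = read_fn(*args)
        return data, self.observe(time.monotonic() - t0)
```

A flagged request would then bump the verification schedule for the rest of that volume, exactly as described above, rather than being treated as a hard failure on its own.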

Quote:
With our eMMC flash on our STBs the fault essentially was that the eMMC wouldn't mount correctly, but the application software didn't like this, so attempted to re-mount it frequently. Each mount attempt took far too long as it relied on a timeout, leading to the unit slowing down considerably.
Sounds like a case of "generic" software applied to a very specific technology. An "impedance mismatch", of sorts.

Quote:
Many smartphones and smart devices still use eMMC flash wth the same defect we experienced.
I'd have assumed economies of scale made it cost effective to deal with their own FLASH management (instead of paying a vendor to do so). OTOH, at large scales, nearly everything becomes free so this may have been an easy bone to toss out.

In my case (10K's), I'd rather have the cost savings AND the enhanced insight into the components' operation, as I can't just swap out a drive -- nor do I have a network of retail establishments (phone vendors) that can provide replacement devices on my behalf.
Old 12-28-2018, 03:17 PM   #20
Curious.George

Quote:
Originally Posted by tom66 View Post
So my solution was to back it all up using my 20Mbit (upstream) connection which took a while! About two months all in with it running in the background. But it did work.
My archive is in excess of 100TB. I can access it at ~100MB/s drive rates (i.e., GbE) so I don't think twice about verifying a copy's integrity or pulling down a few GB of data in case I might want to use it (discarding it if I opt not to).

Quote:
I use BackBlaze myself with an encryption key. The encryption key is partially written down on a piece of paper, which is stored in a fire safe in my detached garage and a second copy is stored in another secret location. The key is in two parts with the first part being a secret that I have remembered (just a memorable word or something like that), and the second part is written on that paper.

Without that key the data is useless; it is encrypted on my PC, and if I have a drive failure they will ship me a HDD with the encrypted data on it, which I can then recover using that key.

If you are so concerned about data security you can trust AES256, it *will not* be broken with current technology and is likely to remain secure for at least the next 20 years.
So, in case of a disaster on your end, and you want to retrieve your backup, you do so through the same 20Mb pipe, over the course of another few months? Hoping, all the while, that the company maintaining it hasn't changed their terms of service (or gone belly up or been hacked offline)?

I'm old enough that if something MAJOR happened (house explosion), I'd only fret over the loss of RECENT financial and medical records. And, those are periodically updated on portable media to handle the MORE likely scenario of having to evacuate (fire, flood, terror incident, etc.).

Yeah, I'd miss my music archives, book library, technical library, software archive, project history logs, etc. But I'd also miss the various bits of equipment I'd lost -- many of which are irreplaceable and/or essential to make use of the data (apps, source code) that was "lost". Recovering the archive from an offsite store would take months anyway (getting a machine set up again that could access them -- and make use of them! -- and having the encryption key on my person when I abandoned the office!). Not likely to be the most pressing need I'd have! So it's just as easy to treat them as disposable at that point and start over.

[Finances and medical, however, have no convenient "reset" and their "need" can prove to be "immediate"!]
Badcaps.net Technical Forums 2003 - 2019