Why are EEPROMs sometimes corrupt?

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

    Why are EEPROMs sometimes corrupt?

    Hello, I'm quite new to this forum, and I see in different sections (computer, TV, computer display) that there are many dumps of EEPROM, NAND, SPI (etc.) memories.

    So I would like to know: is it really common for this type of memory to fail, or for the data it contains to become corrupted? And what can cause this data corruption in these memories?

    I'm asking more out of curiosity than need, but, well... I like to learn new things.

    Thank you

    #2
    Re: Why are EEPROMs sometimes corrupt?

    I just realized that I probably should have created this discussion in the "General Electronics Technical Discussion" section.
    If a moderator could move it, that would be great. Thank you.

      #3
      Re: Why are EEPROMs sometimes corrupt?

      Maybe they are too close to a magnetic field,
      or the set loses power while something is being written,
      or maybe the Israeli lobby offered cash?
      (like politicians)

        #4
        Re: Why are EEPROMs sometimes corrupt?

        Originally posted by SRO2
        So I would like to know: is it really common for this type of memory to fail, or for the data it contains to become corrupted? And what can cause this data corruption in these memories?
        EEPROMs suffer from a limit on the number of write cycles that can be successfully performed before the memory "physically" fails. The number of cycles that are "guaranteed" (by the EEPROM's manufacturer, under specific operating conditions) is called the device's "write endurance". It is often in the range of 100,000 operations but can fall to ~10,000 if the part is operated at higher temperatures, etc.

        So, a foolish implementer who updates the EEPROM contents (or any portion of them) too often runs the risk of "burning it out" (or some portion of it). In fact, you can deliberately exceed this limit in a matter of minutes if you're foolish enough!
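A rough calculation shows how fast a tight loop can use up the rated endurance, and how long one location lasts when firmware rewrites it on every change. This is a back-of-the-envelope sketch that assumes a 5 ms write cycle (typical of small serial EEPROMs, but check your part's datasheet) and the endurance figures quoted above; the function names are made up for illustration.

```python
def seconds_to_wear_out(endurance_cycles: int, write_time_ms: int) -> float:
    """Time to exhaust a cell's rated endurance if it is rewritten back-to-back."""
    return endurance_cycles * write_time_ms / 1000


def years_of_life(endurance_cycles: int, writes_per_day: float) -> float:
    """Rough life of one location, given how often the firmware rewrites it."""
    return endurance_cycles / writes_per_day / 365


# Assumed figures: 5 ms per write, 100,000 cycles nominal, 10,000 derated.
print(seconds_to_wear_out(100_000, 5))        # 500.0 s, a bit over 8 minutes
print(seconds_to_wear_out(10_000, 5))         # 50.0 s
# A setting saved every time it changes, say 100 times a day:
print(round(years_of_life(100_000, 100), 1))  # 2.7 (years)
```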

        Most folks aren't stupid enough to do this intentionally. But their algorithms may have defects that cause unnecessary updates at a higher frequency than they originally intended.

        But, more likely, a failure is the result of a write operation that wasn't "to spec": in particular, a write that is prematurely aborted (due to faulty software design or an inopportune loss of power during the write).

        The more frequent the writes, the greater the chance of such an incident occurring.

        For example, your TV likely "remembers" the current channel, volume, mute status, input selection, etc. between power cycles. If the software developer chose to update these settings DIRECTLY IN EEPROM each time they were changed, there is a greater chance that a power glitch or other "bug" could interfere with one of those updates.

        OTOH, if a copy of the settings is updated in (volatile) RAM each time the user makes a change, and that copy is only transferred to EEPROM when the user presses the "power off" button, then there are fewer opportunities for the write to be "interrupted" (because the software can defer actually turning the power OFF until after the write has completed).
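Here is a minimal sketch of that RAM "shadow copy" approach. It assumes a tiny settings image (the channel/volume/mute layout is hypothetical), uses a Python bytearray to stand in for the EEPROM, and counts EEPROM byte writes so the saving is visible; `ShadowedSettings` and `power_off` are illustrative names, not a real API.

```python
class ShadowedSettings:
    """Keep a RAM copy of the settings; write the EEPROM only on power-off."""

    def __init__(self, eeprom: bytearray):
        self.eeprom = eeprom            # stands in for the nonvolatile part
        self.ram = bytearray(eeprom)    # volatile working copy
        self.dirty = False
        self.eeprom_writes = 0

    def set(self, addr: int, value: int) -> None:
        if self.ram[addr] != value:     # ignore no-op changes entirely
            self.ram[addr] = value
            self.dirty = True

    def get(self, addr: int) -> int:
        return self.ram[addr]

    def power_off(self) -> None:
        """Called BEFORE power is actually removed, so the write can finish."""
        if not self.dirty:
            return
        for addr, value in enumerate(self.ram):
            if self.eeprom[addr] != value:   # only rewrite bytes that changed
                self.eeprom[addr] = value
                self.eeprom_writes += 1
        self.dirty = False


nv = bytearray([12, 30, 0])       # hypothetical layout: channel, volume, mute
tv = ShadowedSettings(nv)
for vol in range(31, 41):         # user holds "volume up" for ten steps
    tv.set(1, vol)
tv.power_off()
print(tv.eeprom_writes)           # 1
```

Ten volume changes cost one EEPROM write instead of ten, and that write happens at a moment the software chooses.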

        All of this is compounded by the fact that some EEPROM updates may involve multiple "bytes"/data. For example, if your TV supports > 99 channels and the channel number is stored in BCD ("0x99" being the largest value stored in a single byte), then storing the channel "100" would be done by storing "0x01" and "0x00" into the two-byte "channel number" memory.
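The two-byte BCD layout described above can be sketched like this (packed BCD, two digits per byte, most significant byte first; the helper names are hypothetical):

```python
from typing import Tuple


def to_bcd(channel: int) -> Tuple[int, int]:
    """Split 0..9999 into two packed-BCD bytes, most significant byte first."""
    if not 0 <= channel <= 9999:
        raise ValueError("channel does not fit in two BCD bytes")
    hi, lo = divmod(channel, 100)
    return ((hi // 10) << 4) | (hi % 10), ((lo // 10) << 4) | (lo % 10)


def from_bcd(msb: int, lsb: int) -> int:
    """Inverse of to_bcd (assumes both bytes hold valid BCD digits)."""
    def digits(b: int) -> int:
        return (b >> 4) * 10 + (b & 0x0F)
    return digits(msb) * 100 + digits(lsb)


print([hex(b) for b in to_bcd(99)])   # ['0x0', '0x99']
print([hex(b) for b in to_bcd(100)])  # ['0x1', '0x0']
```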

        But what if the stored channel number was "0x00", "0x99" and is now being updated to "0x01", "0x00"? Imagine the least significant digits are updated first, followed by the most significant digits. The contents of those two memory locations would be (over time):
        A: 0x00 0x99 (old value)
        B: 0x00 0x00 (update the least significant digits)
        C: 0x01 0x00 (update the most significant digits; now new value)

        Imagine the power fails just after B has happened. The "channel number" memory now indicates channel "000" instead of "099" (old) or "100" (new). The software does a check of the "sanity" of the settings when the TV next powers up and decides that "000" is a bogus value!

        Or, worse, it DOESN'T do a sanity check but actually tries to use that bogus value!
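The torn update can be simulated directly. This sketch assumes the two-byte BCD layout from the example, writes the least significant byte first, and "loses power" after the first byte; `load_channel` applies a hypothetical sanity check (valid BCD and a channel in 1..999, falling back to channel 1). Note that the check only catches this failure because "000" happens to be an impossible channel: a torn update between two plausible channels (say 123 to 245, torn at 145) would pass unnoticed.

```python
from typing import Optional, Tuple


class PowerFail(Exception):
    """Raised to simulate the supply disappearing mid-update."""


def write_channel(mem: bytearray, new: Tuple[int, int],
                  fail_after: Optional[int] = None) -> None:
    """Store two BCD bytes (MSB at address 0), least significant byte first.

    fail_after=1 simulates losing power right after the first byte is written.
    """
    msb, lsb = new
    for count, (addr, value) in enumerate(((1, lsb), (0, msb)), start=1):
        mem[addr] = value
        if fail_after == count:
            raise PowerFail()


def load_channel(mem: bytearray, default: int = 1) -> int:
    """Decode the stored channel; fall back to a default if it is nonsensical."""
    nibbles = [n for b in mem[:2] for n in (b >> 4, b & 0x0F)]
    if any(n > 9 for n in nibbles):
        return default                                   # not valid BCD at all
    channel = int("".join(str(n) for n in nibbles))
    return channel if 1 <= channel <= 999 else default   # "000" is bogus


mem = bytearray([0x00, 0x99])          # old value: channel 099
try:
    write_channel(mem, (0x01, 0x00), fail_after=1)
except PowerFail:
    pass
print(mem.hex())           # 0000: state B from the example
print(load_channel(mem))   # 1: the sanity check rejected "000"
```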

          #5
          Re: Why are EEPROMs sometimes corrupt?

          I think the OP was talking about BIOS-type EEPROMs, not flash-type.

          George gets it. But I think the temporary storage on a TV (like the current channel) is different from the main BIOS/software EEPROM chip. (Of course, on smart TVs, the BIOS and the TV software are themselves separate.)
          Cap Datasheet Depot: http://www.paullinebarger.net/DS/
          ^If you have datasheets not listed PM me

            #6
            Re: Why are EEPROMs sometimes corrupt?

            Originally posted by Uranium-235
            I think the OP was talking about BIOS-type EEPROMs, not flash-type.

            George gets it. But I think the temporary storage on a TV (like the current channel) is different from the main BIOS/software EEPROM chip. (Of course, on smart TVs, the BIOS and the TV software are themselves separate.)
            The terms "Flash" and "EEPROM" are typically used in different applications. The actual storage mechanism used in each is the same.

            EEPROM is typically byte-addressable, while Flash is usually handled in "pages" (and can only be erased in even larger blocks).
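The practical consequence of block erasure can be shown with a toy model. This is a simplification, assuming the usual flash behavior that programming can only change bits from 1 to 0 and that only erasing a whole block returns them to 1; the block size and the `FlashBlock` name are made up. Changing a byte in the "wrong" direction forces a read-modify-write of the entire block, so a power loss between `erase()` and the reprogramming loop loses every byte in that block, not just the one being changed.

```python
ERASED = 0xFF


class FlashBlock:
    """Toy erase block: programming can only clear bits (1 -> 0);
    only erasing the whole block sets them back to 1."""

    def __init__(self, size: int = 16):
        self.data = bytearray([ERASED] * size)
        self.erase_count = 0

    def erase(self) -> None:
        self.data[:] = bytes([ERASED] * len(self.data))
        self.erase_count += 1

    def program(self, addr: int, value: int) -> None:
        self.data[addr] &= value          # bits can go 1 -> 0, never 0 -> 1

    def update_byte(self, addr: int, value: int) -> None:
        """Change one byte, erasing the block only when a 0 -> 1 change is needed."""
        if (self.data[addr] & value) == value:
            self.program(addr, value)     # only 1 -> 0 transitions: program in place
            return
        saved = bytearray(self.data)      # read the whole block into RAM
        saved[addr] = value
        self.erase()                      # every byte in the block is now 0xFF...
        for i, b in enumerate(saved):     # ...until it has all been rewritten
            if b != ERASED:
                self.program(i, b)


blk = FlashBlock()
blk.update_byte(5, 0x42)                  # some other setting in the same block
blk.update_byte(0, 0x12)                  # 0xFF -> 0x12: no erase needed
blk.update_byte(0, 0x34)                  # 0x12 -> 0x34 needs 0 -> 1, so erase
print(hex(blk.data[0]), hex(blk.data[5]), blk.erase_count)  # 0x34 0x42 1
```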

            "Flash" is usually used to store "programs"/executables/binaries while EEPROM is used to store data.

            You can use "nonvolatile memory" (deliberately avoiding the use of either term!) to store your program/code in a variety of different ways. XIP (eXecute In Place) expects the memory device to be byte/word addressable so individual instructions can be "fetched" from it directly. This is how EPROM (one 'E') -- and PROM/ROM -- devices work(ed).

            But RAM has become much cheaper, and MLC (TLC, etc.) Flash has increased the density of nonvolatile storage. Coupled with the fact that "programs" are updated infrequently, this has led to implementations where the "flash" is treated like a disk drive and its contents are loaded into RAM when the device powers up. This approach has lots of advantages, since a device can often be treated as operating in one of many "modes": load the program for a mode when it is needed, then load the program for the NEXT mode when you're done with the first.

            (For example, actually watching TV vs. configuring the TV to be watched! Or, watching OTA TV vs. watching a video sourced from a local USB or network drive. Or, using the TV to surf the web. Or...)

              #7
              Re: Why are EEPROMs sometimes corrupt?

              Thank you for these answers. So, in short, it's complicated.

              As it happens, I recently repaired a car dashboard (Renault Scenic 2). These dashboards have a known "disease" caused by poor design.

              There is a switching power supply integrated into the dashboard's PCB, and its MOSFET tends to get extremely hot (to the point of desoldering the components around it).

              From what I read while researching on the web, this EEPROM gets corrupted because of power supply problems. I had also ordered a repair kit on eBay that contains the transistor, capacitors, resistors and, in particular, an EEPROM.

              I initially replaced everything except the EEPROM because I didn't think the failure could come from there, but in fact it did come from the EEPROM. I suppose the internal data (vehicle number, language, equipment options, fuel type, and even vehicle mileage!) must have been corrupted and prevented the dashboard from starting.

              If I were a designer and had to store data, I would use two memories, one for "production" and another as a backup, plus another component that could calculate a kind of CRC over the "production" memory and restore the data from the backup memory in case of error.
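That proposal can be sketched roughly like this. The sketch assumes CRC-32 (Python's `zlib.crc32`) as the "kind of CRC", models both memories as bytearrays, and stores each record as the payload plus a 4-byte little-endian CRC; the record format and function names are hypothetical. Writing the two copies one after the other means a power failure can tear at most one of them.

```python
import struct
import zlib
from typing import Optional


def seal(payload: bytes) -> bytes:
    """Append a little-endian CRC-32 so corruption of the record can be detected."""
    return payload + struct.pack("<I", zlib.crc32(payload))


def unseal(record: bytes) -> Optional[bytes]:
    """Return the payload if its CRC-32 matches, otherwise None."""
    if len(record) < 4:
        return None
    payload, (crc,) = record[:-4], struct.unpack("<I", record[-4:])
    return payload if crc == zlib.crc32(payload) else None


def save(primary: bytearray, backup: bytearray, payload: bytes) -> None:
    """Update the copies one after the other, so a power cut tears at most one."""
    record = seal(payload)
    primary[:] = record
    backup[:] = record


def load(primary: bytearray, backup: bytearray) -> Optional[bytes]:
    """Prefer the "production" copy; repair it from the backup if it is corrupt."""
    data = unseal(bytes(primary))
    if data is None:
        data = unseal(bytes(backup))
        if data is not None:
            primary[:] = seal(data)      # restore the production copy
    return data                          # None means both copies are bad


primary, backup = bytearray(), bytearray()
save(primary, backup, b"VIN123;FR;diesel")   # hypothetical contents
primary[3] ^= 0x01                            # one bit of the primary flips
print(load(primary, backup))                  # b'VIN123;FR;diesel'
```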

              I'm sure I'm not inventing anything; that kind of system probably already exists in many devices.

                #8
                Re: Why are EEPROMs sometimes corrupt?

                Multiple copies of the data are common; some German cars even keep a backup in the key fob.
                But the car industry is not interested in reliability; they want service/repair work.
                So just because they have the data does not mean they will restore it without a dealer service tool.

                  #9
                  Re: Why are EEPROMs sometimes corrupt?

                  Originally posted by SRO2
                  Thank you for these answers. So, in short, it's complicated.
                  As with many things in life, there are tradeoffs involved. Squeeze a balloon at one end and the other end bulges. Squeeze it at BOTH ends and you risk POPPING it!

                  From what I read while researching on the web, this EEPROM gets corrupted because of power supply problems. I had also ordered a repair kit on eBay that contains the transistor, capacitors, resistors and, in particular, an EEPROM.
                  In that case, you're paying for a copy of the EEPROM's contents because you couldn't predict the future well enough to PRESERVE a copy of them before you needed that copy! The vendor is doing you the "favor" of delivering the contents IN a compatible EEPROM.

                  If power is unreliable AT THE EEPROM, then expecting the EEPROM to perform as advertised in the (EEPROM) manufacturer's datasheet would be dubious.

                  Solid state memories have traditionally had a caveat to their operation: operate outside the published design constraints and all bets are off.

                  For example, violating the timing constraints on DRAM could result in an entire ROW of data in the device being corrupted (not just the datum you were accessing). And, there's no "alarm" output that the device can use to inform you of this problem... you just discover it when you try to use that data and find it is corrupt! (assuming you can actually recognize that it is corrupt and don't just assume it is valid -- and incorrect!)

                  I initially replaced everything except the EEPROM because I didn't think the failure could come from there, but in fact it did come from the EEPROM. I suppose the internal data (vehicle number, language, equipment options, fuel type, and even vehicle mileage!) must have been corrupted and prevented the dashboard from starting.

                  If I were a designer and had to store data, I would use two memories, one for "production" and another as a backup, plus another component that could calculate a kind of CRC over the "production" memory and restore the data from the backup memory in case of error.
                  Before you can design a recovery strategy, you have to understand the failure modes likely for your implementation. For example, if you design a checksum/hash algorithm that hopes to protect against single, isolated errors, then a failure that can affect an entire PAGE of data might not benefit from that mechanism.

                  Simple "parity" (recording whether the number of 1s is odd or even) can detect a single-bit error. But if you have a failure mode that causes two bits to toggle, parity will incorrectly indicate that the data is intact when, in fact, there are two errors present!
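A quick demonstration of that blind spot, using even parity over a single byte (the byte value and the flipped bit positions are arbitrary choices for illustration):

```python
def parity_bit(byte: int) -> int:
    """Even parity: 1 if the byte holds an odd number of 1 bits."""
    return bin(byte).count("1") & 1


stored = 0b1011_0010
stored_parity = parity_bit(stored)

one_flip = stored ^ 0b0000_0100    # a single bit toggles
two_flips = stored ^ 0b0100_0100   # two bits toggle

print(parity_bit(one_flip) != stored_parity)   # True: error detected
print(parity_bit(two_flips) != stored_parity)  # False: looks intact, but isn't
```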

                  When I design devices with onboard "parameters", I try to understand how those parameters can be corrupted (hardware and software issues). Then, I decide which parameters -- and groups of parameters -- are important and design a hash that detects and/or corrects (google "Hamming distance") some number of errors in each of those "parameter blocks".
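As one concrete example of an error-correcting code (the post doesn't name a specific one; Hamming(7,4) is a textbook choice used here purely for illustration), each 4-bit nibble is stored as 7 bits. Any two valid codewords differ in at least 3 bits, so a single flipped bit can be located and repaired; with two flipped bits the decoder will "repair" to the wrong value, which is why the code has to be chosen to match the expected failure modes.

```python
from typing import List, Tuple


def hamming74_encode(nibble: int) -> List[int]:
    """Encode 4 data bits as 7 bits; positions 1..7 hold p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = [(nibble >> shift) & 1 for shift in (3, 2, 1, 0)]
    p1 = d1 ^ d2 ^ d4          # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4          # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(code: List[int]) -> Tuple[int, int]:
    """Return (data nibble, 1-based position of the corrected bit, or 0 if none)."""
    c = list(code)                            # don't modify the caller's list
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 | (s2 << 1) | (s3 << 2)     # points at the bad position
    if syndrome:
        c[syndrome - 1] ^= 1
    return (c[2] << 3) | (c[4] << 2) | (c[5] << 1) | c[6], syndrome


word = hamming74_encode(0b1011)
word[4] ^= 1                   # flip the bit at position 5
print(hamming74_decode(word))  # (11, 5): data recovered, position 5 repaired
```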

                  For example, the "channel memory" in a TV is probably more precious than the "most recent channel, input selection, volume, mute status, etc.". The reasoning here being that it is costlier (more inconvenient) for a user to reprogram (rescan) the available channels than it is for him to adjust the volume, input selection, mute status, etc. FROM NOMINAL DEFAULT VALUES RESTORED IF CORRUPTED. So, make an effort to preserve the integrity of the channel memory and have a means of informing the user when your efforts have been insufficient ("Please rescan available channels").

                  The "most recent settings" can be restored from a set of (immutable) defaults when THEY are detected as being nonsensical, in some way.
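Treating each parameter block separately might look like the following sketch. It assumes one CRC-32 per block, a hypothetical four-byte "recent settings" layout with made-up defaults, and an empty channel map as the "please rescan" state; none of these names come from a real product.

```python
import zlib
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical layout and defaults for the "most recent settings" block:
# channel index, volume, mute flag, input selection.
DEFAULT_RECENT = bytes([1, 20, 0, 0])


@dataclass
class Block:
    payload: bytes
    crc: int

    @classmethod
    def seal(cls, payload: bytes) -> "Block":
        return cls(payload, zlib.crc32(payload))

    def is_valid(self) -> bool:
        return zlib.crc32(self.payload) == self.crc


def boot(channel_map: Block, recent: Block) -> Tuple[bytes, bytes, List[str]]:
    """Validate each parameter block on its own and recover according to its value."""
    messages: List[str] = []
    # Cheap to lose: silently fall back to the immutable defaults.
    settings = recent.payload if recent.is_valid() else DEFAULT_RECENT
    if channel_map.is_valid():
        channels = channel_map.payload
    else:
        # Costly to lose: don't guess, tell the user.
        channels = b""
        messages.append("Please rescan available channels")
    return channels, settings, messages


channels = Block.seal(bytes([2, 4, 5, 7, 9]))   # a scanned channel list
recent = Block.seal(bytes([3, 35, 0, 1]))
recent.payload = bytes([3, 35, 0, 9])           # corrupt only the recent settings
ch, st, msgs = boot(channels, recent)
print(list(st), msgs)   # [1, 20, 0, 0] []: defaults restored, channel map kept
```

Because each block carries its own check, corruption in the cheap block no longer drags the precious one down with it, unlike a single checksum over everything.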

                  I'm sure I'm not inventing anything; that kind of system probably already exists in many devices.
                  Surprisingly, many devices either ignore this aspect of operation:
                  "For that sort of thing to happen would mean the device was broken!"
                  "Um, no, it could just mean you haven't considered all of the possible ways that it can 'fail' without truly being broken!"
                  Or, they wrap all of the data in a single checksum (so if any of it becomes corrupt, it is all treated as corrupt). Or, they implement naive mechanisms to detect corruption.

                  [I worked on a project where some "genius" thought the way to protect the data was to keep three copies of it and use a majority voting algorithm to identify the "correct" values. Not only was this grossly inefficient (imagine keeping three copies of everything in case ONE copy is corrupted) but it was also grossly inadequate!

                  If you detect a discrepancy, how do you know which copy is correct in light of two or more possible errors? Say you're storing the (phone's) owner name and phone number. You encounter these three "copies" of that information:

                  Tom Jones 555-1212
                  Tom Jones 555-1212
                  Tom Jones 555-1213

                  You'd ASSUME the third was in error.

                  But, what if you encountered:

                  Tom Joner 555-1212
                  Tom Jones 555-1212
                  Tom Jones 555-1213

                  What's the "right" answer? (note that each error is just a single bit off!)]
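The voting problem can be reproduced in a few lines. This sketch votes on whole records (an assumption; the post doesn't say how that project compared copies) and also confirms that each corrupted copy is only one bit away from "Tom Jones 555-1212". Voting character by character would produce an answer here, but nothing in the three copies proves that answer is the original.

```python
from collections import Counter
from typing import List, Optional


def vote(copies: List[str]) -> Optional[str]:
    """Whole-record majority vote: the value at least two copies agree on, else None."""
    value, count = Counter(copies).most_common(1)[0]
    return value if count >= 2 else None


def differing_bits(a: str, b: str) -> int:
    """Count the bits that differ between two equal-length ASCII strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a.encode(), b.encode()))


print(vote(["Tom Jones 555-1212", "Tom Jones 555-1212", "Tom Jones 555-1213"]))
# Tom Jones 555-1212
print(vote(["Tom Joner 555-1212", "Tom Jones 555-1212", "Tom Jones 555-1213"]))
# None: no two copies agree
print(differing_bits("Tom Joner 555-1212", "Tom Jones 555-1212"))  # 1
```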

                    #10
                    Re: Why are EEPROMs sometimes corrupt?

                    You seem really well informed about everything related to software design and data storage (embedded electronics?)!

                    If it's not too personal, is that from your job?

                    It's really cool to have people on this forum who are so well informed and passionate.

                    (Aside from that, is what I write "easily" understandable? I'm French, and I may sometimes use expressions that sound strange or don't exist in English...)

                      #11
                      Re: Why are EEPROMs sometimes corrupt?

                      Originally posted by SRO2
                      You seem really well informed about everything related to software design and data storage (embedded electronics?)!
                      I design what are now called "embedded systems" -- computers that don't look like computers.

                      If it's not too personal, is that from your job?
                      I've been at this for a LONG time (40+ years?) and have made a point of getting involved in lots of different technologies. So, I've seen a lot of possibilities and problems.

                      E.g., in the late '70s, "WAROM" technology gave us roughly the same capabilities that NOR Flash does today.

                      BUT... imagine having just a few thousand BITS to play with. And, needing a full microsecond (1000ns) to read one and a millisecond to write one (1000us) -- after spending 10ms erasing it!

                      And, needing three power supplies (5, -12, -30!) to operate them.

                      And, having to think REALLY hard about how often you wrote them, because they were limited to 1,000 write-erase cycles. As well as how often you READ them (they were limited to a billion read cycles).

                      You had no choice but to concoct algorithms that made the devices "more usable" -- like mirroring their contents in RAM so you could read and write those "shadow copies" at will without fear of "burning out" the actual WAROM device. Having that experience behind you means these sorts of limitations on more modern devices (Flash) are no big surprise.

                      (Aside from that, is what I write "easily" understandable? I'm French, and I may sometimes use expressions that sound strange or don't exist in English...)
                      Certainement. (Certainly.)
