Re: Computer troubles...
You really have to consider how the errors are likely to be introduced.
Do you want to detect if the memory device has been removed? Failed completely? Do bits "age"? If so, do they age-to-0 or age-to-1? Are there failure mechanisms inside the device that cause groups of bits to fail (e.g., a shared sense amplifier)?
Optical media, for example, assumes there will be scratches that will take out MANY "bits" -- but that they will be related IN SPACE (cuz scratches are "contiguous events" across the surface).
The other problem is that someone may just inherit a checksum algorithm in a body of code and not consider how the hardware may have changed/evolved in the time since the algorithm was first designed/selected. So, the failure modes that it was designed to protect against might no longer be valid and, in fact, the algorithm may be ill-advised for the current hardware implementation!
I was tasked with making some changes to a product that relied on nonvolatile memory (BBSRAM -- battery-backed SRAM) to hold accounting data (i.e., money). The twit who had designed the system assumed that storing the data in triplicate would buy him reliability. In theory, it would let him detect and correct any single bit error in a replicated datum!
E.g., if the three copies of bit #29 of a datum appear as (1,1,1), then you can probably assume it represents a '1'. Likewise, (0,0,0) to represent a '0'. The sets (0,0,1), (0,1,0) and (1,0,0) all suggest a '0' datum that has degraded and should be corrected to (0,0,0). Likewise, (1,1,0), (1,0,1) and (0,1,1) suggest a '1' datum that has been degraded and should be corrected to (1,1,1).
Fine: with three copies of each bit, the Hamming distance between the two valid patterns (000 and 111) is 3, which only allows a single bit error per position to be detected and corrected.
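Done right, that per-bit vote is cheap. A minimal sketch in C (the name vote3 and the word width are my own choices, not from the actual product):

    #include <stdint.h>

    /* Bitwise 2-of-3 majority vote across three copies of a word.
     * Each result bit is 1 iff at least two of the three copies
     * have a 1 in that position. */
    static uint32_t vote3(uint32_t a, uint32_t b, uint32_t c)
    {
        return (a & b) | (a & c) | (b & c);
    }

Applied per word, this decides every bit position independently, so it corrects a single flipped bit in EACH position -- no matter how many positions were hit.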
But when you treat larger data (e.g., long words) as composite entities in your check algorithm INSTEAD OF as groups of bits, you throw away most of the benefit of this redundancy.
E.g., if you have the nybbles '9', '9' and '1' (1001, 1001, 0001), you can note that 1 != 9 but 9 == 9 so the '1' should be corrected to a '9' -- a single bit was in error.
OTOH, if you have '9', '8' and '1', you might say 9 != 8, 8 != 1, 9 != 1 and, therefore, you have no way to recover (no two nybbles are the same!). If, instead, you treat this as groups of 4 bits and notice that you really have two single bit errors (1001, 1000, 0001), then you can correct both of them. But, not if you treat the nybbles as the raw data! I.e., the twit had naively treated the nybbles as the data and could thus only correct a small subset of errors.
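To make the contrast concrete, here is a toy sketch (my own code, assuming the nybbles sit in uint8_t; vote_words and vote_bits are hypothetical names, not the product's routines) showing the word-level vote failing on '9'/'8'/'1' while the bitwise vote recovers the '9':

    #include <stdint.h>
    #include <stdio.h>

    /* Word-level vote: succeeds only if at least two copies agree. */
    static int vote_words(uint8_t a, uint8_t b, uint8_t c, uint8_t *out)
    {
        if (a == b || a == c) { *out = a; return 0; }
        if (b == c)           { *out = b; return 0; }
        return -1;  /* no two copies agree: unrecoverable at this level */
    }

    /* Bitwise vote: each bit position is decided independently. */
    static uint8_t vote_bits(uint8_t a, uint8_t b, uint8_t c)
    {
        return (a & b) | (a & c) | (b & c);
    }

    int main(void)
    {
        uint8_t a = 0x9, b = 0x8, c = 0x1;  /* 1001, 1000, 0001 */
        uint8_t w;

        if (vote_words(a, b, c, &w) != 0)
            printf("word-level vote: unrecoverable\n");

        printf("bitwise vote: 0x%X\n", vote_bits(a, b, c));  /* prints 0x9 */
        return 0;
    }

Same stored bits, same two single-bit errors; the only difference is the granularity at which the vote is taken.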