Computer troubles...

  • tom66
    replied
    Originally posted by RJARRRPCGP View Post
    IIRC, the even smarter one is, SHA!
    No, you wouldn't use SHA for error correction. It's a hashing algorithm, and a BIOS config doesn't need that level of cryptographic security.

    Originally posted by stj View Post
    crc32 would be enough, we aren't going for perfection.
    Yes, a good 32-bit CRC will detect the vast majority of errors. It will not allow for correction, but detection alone is often sufficient for something like a CMOS data set.

    Leave a comment:


  • stj
    replied
    Re: Computer troubles...

    crc32 would be enough, we aren't going for perfection.

    Leave a comment:


  • Curious.George
    replied
    Re: Computer troubles...

    Originally posted by RJARRRPCGP View Post
    IIRC, the even smarter one is, SHA!
    Most of the larger hashes are only good for verifying that the data is intact. So, good for verifying the integrity of a BIOS program image (before or after installation). But, lousy for letting you "fix" variable data. I.e., they can detect large numbers of errors (without collisions) but can't correct ANY!

    So, protecting "parameters" with overly complex hashes doesn't buy you anything (they also tend to yield large/wide "checksums" that take up more space than they can justify; not an issue when they are stored IN the program image that they are authenticating!).

    First line of defense is always to avoid the "unintended alteration" to start with! I.e., if your power sequencing hardware is faulty and allows spurious/aborted writes to the memory, then fix that before throwing effort into detecting and correcting the errors that IT introduces. Likewise, if your code can create bogus values, then fix the code so that it can't!

    When you're sure you have valid data going into your store, THEN figure out how to detect (and optionally correct) alterations.

    [What happens when your battery ages? When the memory device ages??]

    Design your recovery strategy based on the types of errors that you will have to handle and how they are likely to be handled. E.g., an unattended device probably can't emit a prompt requesting the user to set the current date, so it should have a way of setting the date on its own, even if it is a deliberately bogus date (like 13/32/1901) that would be recognizable to someone examining its presence in any logs, etc.

    [And, the next time you find yourself needing to ask for the current date, scan the log to see what a likely / bogus date COULD be!]

    Leave a comment:


  • eccerr0r
    replied
    Re: Computer troubles...

    SHA is better for detection of errors; R-S ECC is for detection of a far smaller number of errors and correction of even fewer.

    The check algorithm has a lot of design considerations that need to be accommodated. Do keep in mind that a simple checksum will still flag many errors despite its problems, and it has one real advantage: simplicity.

    The same could be said for parity, which is even worse for detection than a checksum, yet it's still used because it's simple. However, with large data sets and malicious modification now a threat, these simple methods may no longer be as helpful as they were in the past.

    Leave a comment:


  • RJARRRPCGP
    replied
    Re: Computer troubles...

    Originally posted by tom66 View Post
    An even smarter system would use something like Reed-Solomon, allowing errors to be identified and corrected.
    IIRC, the even smarter one is, SHA!
    Last edited by RJARRRPCGP; 02-11-2018, 02:47 PM.

    Leave a comment:


  • Curious.George
    replied
    Re: Computer troubles...

    Originally posted by tom66 View Post
    Sometimes, the lazy approach is taken, which can be to sum values together, which is vulnerable to many types of error and really only detects single bit errors reliably. A correctly designed CRC will pick up most types of bit substitution and flip errors. An even smarter system would use something like Reed-Solomon, allowing errors to be identified and corrected.
    You really have to consider how the errors will, likely, be introduced.

    Do you want to detect if the memory device has been removed? Failed completely? Do bits "age"? If so, do they age-to-0 or age-to-1? Are there failure mechanisms inside the device that cause groups of bits to fail (e.g., a shared sense amplifier)?

    Optical media, for example, assumes there will be scratches that will take out MANY "bits" -- but, that they will be related IN SPACE (cuz scratches are "contiguous events" across the surface).

    The other problem is that someone may just inherit a checksum algorithm in a body of code and not consider how the hardware may have changed/evolved in the time since the algorithm was first designed/selected. So, the failure modes that it was designed to protect against might no longer be valid and, in fact, the algorithm may be ill-advised for the current hardware implementation!

    I was tasked with making some changes to a product that relied on nonvolatile memory (BBSRAM) to hold accounting data (i.e., money). The twit who had designed the system assumed that storing the data in triplicate would buy him reliability. In theory, it would let him detect and correct any single bit error in a replicated datum!

    E.g., if the three copies of bit #29 of a datum appear as (1,1,1), then you can probably assume it represents a '1'. Likewise, (0,0,0) to represent a '0'. The sets (0,0,1), (0,1,0) and (1,0,0) all suggest a '0' datum that has degraded and should be corrected to (0,0,0). Likewise, (1,1,0), (1,0,1) and (0,1,1) suggest a '1' datum that has been degraded and should be corrected to (1,1,1).

    Fine, with 3 bits, the Hamming distance will only allow for a single bit detect/correct.

    But, when you treat larger data (e.g., long words) as composite entities in your check algorithm INSTEAD OF GROUPS OF BITS, you lose all the benefits of this redundancy.

    E.g., if you have the nybbles '9', '9' and '1' (1001, 1001, 0001), you can note that 1 != 9 but 9 == 9 so the '1' should be corrected to a '9' -- a single bit was in error.

    OTOH, if you have '9', '8' and '1', you might say 9 != 8, 8 != 1, 9 != 1 and, therefore, you have no way to recover (no two nybbles are the same!). If, instead, you treat this as groups of 4 bits and notice that you really have two single bit errors (1001, 1000, 0001), then you can correct both of them. But, not if you treat the nybbles as the raw data! I.e., the twit had naively treated the nybbles as the data and could thus only correct a small subset of errors.
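
    The difference between voting on whole values and voting on individual bits can be sketched in a few lines of Python. This is only an illustration of the two cases above, not the code from the product in question; the function names `vote_whole` and `vote_bitwise` are made up for the example.

```python
def vote_whole(a, b, c):
    """Value-level vote: return the value that at least two copies agree on, else None."""
    if a == b or a == c:
        return a
    if b == c:
        return b
    return None


def vote_bitwise(a, b, c):
    """Bit-level majority vote: an output bit is 1 if at least two copies have it set."""
    return (a & b) | (a & c) | (b & c)


# Case 1: copies '9', '9', '1' (one copy has a single bit error).
print(vote_whole(0x9, 0x9, 0x1), vote_bitwise(0x9, 0x9, 0x1))

# Case 2: copies '9', '8', '1' (two copies each have one bit error, in different positions).
print(vote_whole(0x9, 0x8, 0x1), vote_bitwise(0x9, 0x8, 0x1))
```

    In the second case the value-level vote gives up (`None`) while the bit-level vote still recovers 9. The bit-level vote is not magic either: if two copies are wrong in the SAME bit position, it will confidently return the wrong bit.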
    Last edited by Curious.George; 02-11-2018, 12:41 PM.

    Leave a comment:


  • tom66
    replied
    Re: Computer troubles...

    Originally posted by Curious.George View Post
    You choose/design the polynomial to detect the types of errors you expect to encounter.
    Indeed, if you are a competent engineer!

    Sometimes, the lazy approach is taken, which can be to sum values together, which is vulnerable to many types of error and really only detects single bit errors reliably. A correctly designed CRC will pick up most types of bit substitution and flip errors. An even smarter system would use something like Reed-Solomon, allowing errors to be identified and corrected.

    Who knows what the manufacturer of the BIOS or motherboard did in this instance?

    Leave a comment:


  • Th3_uN1Qu3
    replied
    Re: Computer troubles...

    Glad it was this easy and you are back up and running. Be aware that on some motherboards, setting the clear CMOS jumper to clear with power on will corrupt the BIOS, requiring a reflash.

    Another trick to keep in mind for the dual-BIOS boards is that if you short any two data lines of the main BIOS chip while powering on the system, the motherboard should automatically read from the backup BIOS.

    Leave a comment:


  • Curious.George
    replied
    Re: Computer troubles...

    Originally posted by tom66 View Post
    The CMOS checksum fail also depends on how such a checksum is calculated. Simple checksums like sum-of-all-values can fail to detect many types of error, for instance substituting 0x00 with 0xff will fool such a checksum.
    You choose/design the polynomial to detect the types of errors you expect to encounter.

    All checksums/hashes reduce information -- you're trying to "summarize" some (large?) number of bytes of data with just a few bytes (e.g., two, for a 16 bit checksum). That opens the door for any number of "collisions" in the dataspace -- different combinations of data bytes that yield the same checksum/hash.

    E.g., a simple "check digit calculation" will yield the same result for the sets of data {1,2,3,4,5}, {2,3,4,5,1}, {3,4,5,1,2}, etc.

    So, if you expect your valid "1,2,3,4,5" to be corrupted to "4,3,5,2,1", then that sort of approach doesn't buy you much!
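
    A quick way to see this, assuming the "check digit" is a plain sum of the bytes (the helper `sum_checksum` is hypothetical) and using the standard CRC-32 from Python's `zlib` module for comparison:

```python
import zlib


def sum_checksum(data: bytes) -> int:
    """Additive checksum: sum of all bytes, truncated to 8 bits."""
    return sum(data) & 0xFF


original = bytes([1, 2, 3, 4, 5])
scrambled = bytes([4, 3, 5, 2, 1])

# The additive checksum is identical for any reordering of the same bytes.
print(sum_checksum(original) == sum_checksum(scrambled))

# A CRC depends on byte positions, so this particular reordering changes it.
print(zlib.crc32(original) == zlib.crc32(scrambled))
```

    The first comparison prints `True` (the sum cannot tell the two apart) and the second prints `False`.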

    Note that naively storing multiple copies of the data also buys you very little (esp given the overhead involved).

    And, a single checksum/hash for data having varying degrees of importance is also foolish. You probably care differently about the "asset identifier" stored in the BIOS NVRAM than you do about the "boot order" or IP address of the LoM system! So, why fold "less important" parameters into the calculation of a hash for MORE important parameters, thereby increasing the odds that the hash will fail (on POST) rendering those more important parameters as "suspect"?
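
    A rough sketch of that idea: give each group of parameters its own check value, so damage to one group doesn't cast doubt on the others. The group names, contents and helper functions below are invented for illustration; a real BIOS lays out its NVRAM however the vendor chose.

```python
import zlib


def protect(groups):
    """Store each parameter group alongside its own CRC-32."""
    return {name: (data, zlib.crc32(data)) for name, data in groups.items()}


def suspect_groups(store):
    """Return the names of groups whose data no longer matches the stored CRC."""
    return [name for name, (data, crc) in store.items() if zlib.crc32(data) != crc]


# Hypothetical layout: one important group, one less important group.
store = protect({
    "asset_id": b"ASSET-0001",
    "boot_order": b"\x02\x00\x01",
})

# Simulate a single corrupted byte in the less important group.
_, boot_crc = store["boot_order"]
store["boot_order"] = (b"\x02\x00\x03", boot_crc)

print(suspect_groups(store))
```

    Only `boot_order` is reported as suspect; the asset identifier is still covered by a check value that passes.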

    Leave a comment:


  • tom66
    replied
    Re: Computer troubles...

    The CMOS checksum fail also depends on how such a checksum is calculated. Simple checksums like sum-of-all-values can fail to detect many types of error, for instance substituting 0x00 with 0xff will fool such a checksum.
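
    Whether that particular substitution slips through depends on which flavour of "sum" is used, and nothing here says which one this BIOS uses. As a sketch under that caveat: an 8-bit ones'-complement sum (carry added back in) treats 0x00 and 0xFF alike once the running total is non-zero, while a plain modulo-256 sum catches that substitution but is blind to reordering. Both helper functions are made up for the example.

```python
def sum_mod256(data):
    """Plain additive checksum, truncated to 8 bits."""
    return sum(data) & 0xFF


def sum_ones_complement(data):
    """8-bit ones'-complement sum: any carry out of bit 7 is added back in."""
    total = 0
    for byte in data:
        total += byte
        total = (total & 0xFF) + (total >> 8)
    return total


good = bytes([0x12, 0x00, 0x34])
bad = bytes([0x12, 0xFF, 0x34])      # 0x00 replaced by 0xFF
swapped = bytes([0x34, 0x00, 0x12])  # same bytes, different order

print(sum_ones_complement(good) == sum_ones_complement(bad))  # substitution goes unnoticed
print(sum_mod256(good) == sum_mod256(bad))                    # substitution is caught
print(sum_mod256(good) == sum_mod256(swapped))                # reordering goes unnoticed
```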

    Leave a comment:


  • tom66
    replied
    Re: Computer troubles...

    Originally posted by hasefroch View Post
    You may want to double check for a weak CMOS battery; sometimes low voltage may change some settings on the fly and produce unexpected or strange problems.
    The most notable that I remember was one that "decided" to boot without sending video to any output on any graphics card, but without errors (in fact, it was possible to use the PC via TeamViewer). The other suddenly lost its mouse, which then refused to be detected in at least two Windows versions and a couple of Linux distros; I think it wasn't detected even in DOS.
    Both returned to normal just by unplugging the power and disconnecting the battery, so as a rule of thumb, with computers that have random problems, one test that I do is take out the CMOS battery to see if it changes anything.

    Hope this helps.
    Oddly, the system kept the time post-reset (so I don't think the battery was low), but I'll keep an eye on it, thanks.

    Leave a comment:


  • hasefroch
    replied
    Re: Computer troubles...

    You may want to double check for a weak CMOS battery; sometimes low voltage may change some settings on the fly and produce unexpected or strange problems.
    The most notable that I remember was one that "decided" to boot without sending video to any output on any graphics card, but without errors (in fact, it was possible to use the PC via TeamViewer). The other suddenly lost its mouse, which then refused to be detected in at least two Windows versions and a couple of Linux distros; I think it wasn't detected even in DOS.
    Both returned to normal just by unplugging the power and disconnecting the battery, so as a rule of thumb, with computers that have random problems, one test that I do is take out the CMOS battery to see if it changes anything.

    Hope this helps.

    Leave a comment:


  • RJARRRPCGP
    replied
    Re: Computer troubles...

    Originally posted by stj View Post
    that's shitty coding, the bios should have checksum'd the cmos before trusting it!
    It wasn't the BIOS, it was the CMOS...

    Leave a comment:


  • Curious.George
    replied
    Re: Computer troubles...

    Originally posted by stj View Post
    that's shitty coding, the bios should have checksum'd the cmos before trusting it!
    A checksum isn't the cure. All that (theoretically) does is ensure the data hasn't been "corrupted" outside of the control of the BIOS.

    If, OTOH, the BIOS crams a bogus value somewhere and then factors this into its checksum calculation, there is nothing "wrong" with the resulting checksum... it will continue to vouch for the "bogus value". Every time the BIOS checks that checksum!

    You need, instead, sanity checks on individual values (and groups of values) that can affect the operation of the machine in ways that could lead to a crash, failure to boot/POST, etc.

    And, ideally, to choose data representations that make "bogus" values harder to create (or easier to recognize). For example, an "elapsed time" datum need never represent a negative value. So, choosing a representational form that supports negative values just means you've now got a whole range of bogus values that you have to explicitly check for ("Is the value less than zero? If so, it's bogus...")
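
    To make the point concrete, here is a small sketch in which a checksum computed over already-bogus settings passes, while a per-value range check flags them. The setting names, the limits and the helper functions are all hypothetical; this is not how any particular BIOS stores its configuration.

```python
import zlib

# Hypothetical valid ranges for a few settings (inclusive).
LIMITS = {"month": (1, 12), "day": (1, 31), "boot_device": (0, 3)}


def seal(settings):
    """Serialize the settings and compute a CRC-32 over them, as firmware might on save."""
    blob = repr(sorted(settings.items())).encode()
    return zlib.crc32(blob)


def checksum_ok(settings, crc):
    """True if the stored CRC still matches the settings."""
    return seal(settings) == crc


def out_of_range(settings):
    """Return the names of settings that fall outside their permitted range."""
    return [name for name, value in settings.items()
            if not LIMITS[name][0] <= value <= LIMITS[name][1]]


settings = {"month": 13, "day": 32, "boot_device": 1}  # bogus date written by buggy code
crc = seal(settings)                                   # checksum computed over the bogus data

print(checksum_ok(settings, crc))  # True: the checksum vouches for the bogus values
print(out_of_range(settings))      # the range check names 'month' and 'day'
```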

    Leave a comment:


  • stj
    replied
    Re: Computer troubles...

    that's shitty coding, the bios should have checksum'd the cmos before trusting it!

    Leave a comment:


  • tom66
    replied
    Re: Computer troubles...

    Fixed it. Was very simple.

    I simply jumpered the CLR_CMOS jumper on the motherboard while the system was powered. This erases the configuration of the system. The system immediately rebooted and worked; I just had to reconfigure my SATA drives to get Windows to boot.

    I tried the backup flash method (which involves holding the power button while power cycling using the switch on the PSU) but that did not help so it was not a corrupted BIOS flash, just corrupted configuration.

    Really odd though that the configuration would go bad after some time, but happy in the end that it was an easy fix. Also seems silly that the system had no way of recognising it was in a boot loop (e.g. five failed boot attempts in a row, load default config).
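
    The kind of boot-loop guard I have in mind could look something like the sketch below. It is purely hypothetical: a dict stands in for non-volatile storage, and the function names, setting names and default values are invented. The only number taken from above is the threshold of five.

```python
MAX_FAILED_BOOTS = 5  # "five failed boot attempts in a row"

# Hypothetical factory defaults.
DEFAULTS = {"sata_mode": "ahci", "boot_device": 0}


def on_power_up(nvram):
    """Run early in boot: fall back to defaults if too many attempts failed in a row."""
    if nvram.get("failed_boots", 0) >= MAX_FAILED_BOOTS:
        nvram["config"] = dict(DEFAULTS)
        nvram["failed_boots"] = 0
        used = "defaults"
    else:
        used = "stored"
    # Count this attempt as failed until on_boot_success() says otherwise.
    nvram["failed_boots"] = nvram.get("failed_boots", 0) + 1
    return used


def on_boot_success(nvram):
    """Run once boot has completed: clear the failure counter."""
    nvram["failed_boots"] = 0


# Simulate a corrupted configuration that never gets as far as on_boot_success().
nvram = {"config": {"sata_mode": "garbage", "boot_device": 99}}
attempts = [on_power_up(nvram) for _ in range(6)]
print(attempts)
```

    The first five attempts use the stored (corrupt) configuration; the sixth loads the defaults. The counter itself lives in the same storage that got corrupted, so in practice it would need its own sanity check too.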

    Leave a comment:


  • RJARRRPCGP
    replied
    Re: Computer troubles...

    Originally posted by tom66 View Post
    No, it only powers up for about 3 seconds and during that time there's no video. There's barely enough time for the CPU fan to start spinning.
    Also, check the socket and the traces to the socket!

    Leave a comment:


  • ChaosLegionnaire
    replied
    Re: Computer troubles...

    Originally posted by tom66 View Post
    Have done the obvious, like running with only one stick of RAM.
    Try running without any RAM at all. The board should beep and complain that no RAM is inserted. It could be that the RAM has gone bad in this case. What RAM is it anyway? If it's Corsair RAM, it may quite likely have gone bad. I recently had one of my P4 PCs go bad due to the RAM; the cause was found to be a bad PSU killing the Crucial/Micron RAM.

    Leave a comment:


  • gabik111
    replied
    Re: Computer troubles...

    Check the VGA cable (if there is one). I had a similar problem on mine. It turned out to be a bad video cable (why it power cycled I can't explain; I have no idea, and I know it is a weird issue).

    Leave a comment:


  • chanavs
    replied
    Re: Computer troubles...

    Try cleaning the RAM slot thoroughly with isopropyl alcohol and reseating the RAM.

    Leave a comment:
