10 hours lost because stupid laptop wouldn't boot! GRRR!

  • lookimback
    Badcaps Legend
    • Aug 2013
    • 1489
    • USA

    #1

    10 hours lost because stupid laptop wouldn't boot! GRRR!

    Woke up this morning ready to get some work done, and the laptop was frozen. Tried everything, and ultimately had to hard reset. And then I got the dreaded blank screen with the little blinking dash at the top. I was able to boot into recovery and tried various things, which led to the computer telling me the disk was full. I thought, well, that's just impossible. I just installed Kali a month ago; there's no way I filled up 300 GB in a month. du and df both also said the disk usage was at 100%, but I still didn't believe it. Next step was to boot the live CD and run fsck to check the disk for errors. It took about 5 hours, and then another 3 to fix the bad blocks, and still I had disk usage at 100%. Ultimately, I ran sudo du -a / | sort -n -r | head -n 20 and found that /var/log was using 250 GB. It was mainly 3 files, syslog, messages, and user.log, at about 80 GB each. I truncated them and I'm back up and running again. I obviously didn't fix the underlying problem, but I'll figure that out over the weekend. If I had just believed what it was telling me, I probably would have had it fixed in minutes.
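
The du pipeline in this post walks the whole root filesystem, directories included. A minimal sketch of a narrower variant, assuming POSIX find and du: it lists only regular files under one directory and stays on that directory's filesystem. The function name largest_files and its defaults are made up for illustration and are not from the post.

```shell
# largest_files [DIR] [COUNT]
# Print the COUNT largest regular files under DIR, biggest first,
# with sizes in 1 KiB units. Hypothetical helper; the defaults
# (/var/log, 20) are assumptions chosen to match the post.
largest_files() {
    dir=${1:-/var/log}
    count=${2:-20}
    # -xdev keeps find on DIR's own filesystem; du -k reports KiB.
    find "$dir" -xdev -type f -exec du -k {} + 2>/dev/null |
        sort -n -r |
        head -n "$count"
}
```

Run as root, something like largest_files /var/log 10 would probably have pointed at syslog, messages, and user.log without waiting on a scan of the entire disk.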
    Last edited by lookimback; 10-25-2018, 02:22 AM.

  • ChaosLegionnaire
    HC Overclocker
    • Jul 2012
    • 3264
    • Singapore

    #2
    Re: 10 hours lost because stupid laptop wouldn't boot! GRRR!

    Sigh, log files hogging all the disk space again! I have that issue on Windows too, with those Dr. Watson application error log files. So I just replaced the log file with an empty zero-byte file and set it to read-only, and no more damn log files hogging all the disk space!

    • lookimback
      Badcaps Legend
      • Aug 2013
      • 1489
      • USA

      #3
      Re: 10 hours lost because stupid laptop wouldn't boot! GRRR!

      Originally posted by ChaosLegionnaire
      Sigh, log files hogging all the disk space again! I have that issue on Windows too, with those Dr. Watson application error log files. So I just replaced the log file with an empty zero-byte file and set it to read-only, and no more damn log files hogging all the disk space!
      Not a bad idea.

      • stj
        Great Sage 齊天大聖
        • Dec 2009
        • 31015
        • Albion

        #4
        Re: 10 hours lost because stupid laptop wouldn't boot! GRRR!

        Logs are useful.
        Maybe direct them to a separate partition, or make a script to trim or delete them when you shut down.
        I use a script to wipe the browser cache folders like that.

        • RJARRRPCGP
          Badcaps Legend
          • Jul 2004
          • 6304
          • USA

          #5
          Re: 10 hours lost because stupid laptop wouldn't boot! GRRR!

          The dreaded blinking cursor of doom usually means a corrupted boot sector...

          • goontron
            5000!
            • Dec 2011
            • 4108
            • US

            #6
            Re: 10 hours lost because stupid laptop wouldn't boot! GRRR!

            Kali must handle running out of disk space oddly, then. I've driven my Suse install out of disk space multiple times. All that happens is that syslog refuses to start and X11 refuses to start, dropping me to a TTY.

            • Curious.George
              Badcaps Legend
              • Nov 2011
              • 2305
              • Unknown

              #7
              Re: 10 hours lost because stupid laptop wouldn't boot! GRRR!

              Originally posted by stj
              Logs are useful.
              +42

              Maybe direct them to a separate partition, or make a script to trim or delete them when you shut down.
              +1

              I have a minimalist /var as part of the root partition. Then, mount a separate partition OVER this when booting to multiuser. So, when /var fills up, I get messages about THAT filesystem being full but the root filesystem is still "workable".

              I can then either transition to single user (and unmount /var) to figure out where the culprit lies, or sort out the problem with /var still mounted.

              E.g., I've been building lots of "packages" from source. The scripts that do all of the work really beat on the accounting logs. So, /var/account fills up pretty easily (I only have 500MB set aside for /var). I get a message on the console complaining that /var is full, remind myself to turn off accounting (it is normally turned on in rc.d), trim those log files (sa(8)) and things keep moving along.

              Also have newsyslog(8) set up to aggressively roll over log files and compress them. (I log at a very fine level of detail and keep log files for a long time)
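
Whether a given box already has this layout can be checked from the mount table. A minimal sketch, assuming POSIX df -P output and mount points without spaces in their names; the helper names mount_point_of and on_separate_fs are invented here and are not from the post.

```shell
# mount_point_of PATH: print the mount point of the filesystem holding PATH.
# Reads the "Mounted on" column (field 6) of POSIX-format df output.
mount_point_of() {
    df -P "$1" | awk 'NR == 2 { print $6 }'
}

# on_separate_fs PATH: succeed when PATH is NOT on the root filesystem,
# i.e. when filling it up cannot directly fill /.
on_separate_fs() {
    [ "$(mount_point_of "$1")" != "/" ]
}
```

For example, on_separate_fs /var || echo "/var shares the root filesystem" flags a layout like the OP's, where runaway logs eat the same space the system needs to boot.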

              • lookimback
                Badcaps Legend
                • Aug 2013
                • 1489
                • USA

                #8
                Re: 10 hours lost because stupid laptop wouldn't boot! GRRR!

                Originally posted by stj
                Logs are useful.
                Maybe direct them to a separate partition, or make a script to trim or delete them when you shut down.
                I use a script to wipe the browser cache folders like that.
                I agree they can be useful. I'll probably make a script to delete entries more than x days old or something.
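
One way such a script could look, as a rough sketch only: it reads "entries" as rotated log files (names ending in .gz or in a single digit) and removes those older than a given number of days. The function name prune_old_logs and the name patterns are assumptions, not anything from the thread, and the live log files are deliberately left alone.

```shell
# prune_old_logs DIR DAYS
# Remove rotated logs (*.gz, or names ending in a digit such as syslog.1)
# under DIR whose modification time is more than DAYS days in the past.
# find counts age in whole 24-hour periods, so "+7" means 8 days or older.
prune_old_logs() {
    if [ "$#" -ne 2 ]; then
        echo "usage: prune_old_logs DIR DAYS" >&2
        return 2
    fi
    find "$1" -xdev -type f \( -name '*.gz' -o -name '*.[0-9]' \) \
        -mtime +"$2" -exec rm -f {} +
}
```

Swapping -exec rm -f {} + for -print turns it into a dry run that only lists what would be removed.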

                Originally posted by goontron
                Kali must handle running out of disk space oddly, then. I've driven my Suse install out of disk space multiple times.
                I'm wondering if it would have eventually started. I waited about 15 minutes, then decided it wasn't working. But, even after clearing temp files, it still just hung there.

                Originally posted by RJARRRPCGP
                The dreaded blinking cursor of doom, usually means a corrupted boot sector...
                Possibly. After finding the huge log files and truncating them, I also decided to remove the old Kali kernel and update grub, so maybe that would have fixed any boot problems. I also ran dpkg --configure -a, apt-get --fix-broken install, apt-get update, apt-get upgrade, and apt-get dist-upgrade before even trying to boot again, because I wasn't sure what errors the bad blocks would have left me with. It's way faster now than it had been recently. I'm guessing it was using a lot of resources to write to those huge log files.
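
On the truncating step: emptying a log in place is usually safer than deleting it, because a daemon that still holds the file open keeps a deleted file's blocks allocated until it closes the file, so the space does not come back right away. A small sketch using plain shell redirection; the name empty_log is hypothetical.

```shell
# empty_log FILE...
# Truncate each FILE to zero bytes in place. The file keeps its inode,
# owner, and permissions, so the logging daemon can carry on writing to it.
empty_log() {
    for f in "$@"; do
        : > "$f" || return 1
    done
}
```

For the files named in the first post, that would be something like empty_log /var/log/syslog /var/log/messages /var/log/user.log, run as root.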

                • eccerr0r
                  Solder Sloth
                  • Nov 2012
                  • 8701
                  • USA

                  #9
                  Re: 10 hours lost because stupid laptop wouldn't boot! GRRR!

                  logrotate; or if using journald, there are tons of options to limit log growth...
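
For the logrotate route, a hedged sketch of what a size-capped stanza might look like. The helper logrotate_stanza only prints text; the particular mix of directives and the example values are illustrative choices, not a recommended policy.

```shell
# logrotate_stanza PATTERN MAXSIZE KEEP
# Print a logrotate stanza for PATTERN that rotates daily, or sooner
# once the file exceeds MAXSIZE, and keeps KEEP old copies.
logrotate_stanza() {
    cat <<EOF
$1 {
    daily
    maxsize $2
    rotate $3
    compress
    delaycompress
    missingok
    notifempty
}
EOF
}
```

logrotate_stanza /var/log/syslog 100M 7 prints a block that could go under /etc/logrotate.d/. If the distribution already ships a stanza for the same file, it is better to edit that one, since logrotate complains about duplicate entries. Note that maxsize is only checked when logrotate actually runs, so a process logging hundreds of lines per second can still fill the disk between runs. On the journald side, SystemMaxUse= in journald.conf and journalctl --vacuum-size= are the usual knobs.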

                  Me, I just watch my disk space. If I know I didn't download something big, I'd better not see my disk usage percentage go up, even over time.
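
Watching by eye can be backed up with a small check run from cron or a login script. A minimal sketch, assuming POSIX df -P output; the names usage_percent and disk_alert and the idea of a percentage limit are illustrative, not from the post.

```shell
# usage_percent PATH: print the df "Capacity" figure for PATH without the %.
usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# disk_alert PATH LIMIT: warn on stderr and return 1 when the filesystem
# holding PATH is at or above LIMIT percent used; return 0 otherwise.
disk_alert() {
    used=$(usage_percent "$1")
    if [ "$used" -ge "$2" ]; then
        echo "WARNING: $1 is ${used}% full (limit $2%)" >&2
        return 1
    fi
    return 0
}
```

A line such as disk_alert / 90 in a daily cron job would have mailed a warning well before a disk like the OP's reached 100%.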

                  • Curious.George
                    Badcaps Legend
                    • Nov 2011
                    • 2305
                    • Unknown

                    #10
                    Re: 10 hours lost because stupid laptop wouldn't boot! GRRR!

                    Originally posted by eccerr0r
                    Me, I just watch my disk space. If I know I didn't download something big, I'd better not see my disk usage percentage go up, even over time.
                    That depends on the number and types of services that you have running on the machine (and, of course, the logging detail).

                    Most of the "appliances" on my network use syslogd(8) to record "significant events" -- that aren't actually happening ON the machine that is logging them. Just turning a box on and off again can generate several KB of logs (effectively, dmesg(8) output from the appliance).

                    Mail to root and operator autogenerated by daily(5)/weekly(5)/monthly(5) regularly swell those mbx's (presently, root's is ~40M, operator's is ~10M).

                    "Raw" accounting files can quickly grow to hundreds of MB before sa(8) runs (mine had grown to ~300MB last night after just a few hours of heavy make(1) activity).

                    cron(8) keeps a steady trickle of logging activity.

                    Each of my ~dozen UPSs uploads a log of current power/battery/load conditions to ~UPS/logs/<hostname> every 10 minutes.

                    The DHCPd, LPd, NTPd, TFTPd, HTTPd and FTPd services record actions, there -- even if those actions don't really "add content" to the box.

                    As a rule, it's wise to put portions of the hierarchy that can be "unattendedly" consumed on separate partitions so the core system is always operable (and bootable). Usually, just /home and /var will handle most of your "exposure".

                    (And quotas if you really need to lock down specific UIDs).

                    • eccerr0r
                      Solder Sloth
                      • Nov 2012
                      • 8701
                      • USA

                      #11
                      Re: 10 hours lost because stupid laptop wouldn't boot! GRRR!

                      Well, for a server that you don't monitor, yes, you can use logrotate or configure journald. But for machines meant as a desktop that you work on frequently, you do monitor its usage, and it's not hard to say "something's not quite right" and go investigate.

                      • Curious.George
                        Badcaps Legend
                        • Nov 2011
                        • 2305
                        • Unknown

                        #12
                        Re: 10 hours lost because stupid laptop wouldn't boot! GRRR!

                        Originally posted by eccerr0r
                        Well, for a server that you don't monitor, yes, you can use logrotate or configure journald. But for machines meant as a desktop that you work on frequently, you do monitor its usage, and it's not hard to say "something's not quite right" and go investigate.
                        But that assumes you are sitting watching it to SEE when it coughs.

                        E.g., I have been building packages (pkgsrc) on one of my desktop machines. The first step is:
                        Code:
                        # make fetch-list > foo
                        which creates a script that will, eventually, fetch the source code tarballs for the various packages.

                        This takes a fair bit of time (a few thousand packages). But it's worth the effort, as I don't want to bother fetching stuff that I've already got on hand.

                        So, I just let the machine chug away at it -- for a day or three.

                        But, "make fetch-list" executes a gazillion commands to do its work (because it's a recursive make script). And, as I had accounting turned on, that quickly swelled the accounting log file to a few hundred megabytes. And, overfilled the /var partition (/var/account). As the accounting files rarely grow that big, they aren't pruned but once per day!

                        Had I been sitting there watching the BLANK screen (output having been redirected to "foo"), I would have noticed the error logged to console.

                        But, why would I sit around watching something that will take hours/days? Instead, I'll work on something else.

                        Because /var was a separate partition, when it was overfull, it affected logging and other things that rely on /var -- but, not the "make fetch-list". Had /var been on the root partition, the box would probably have panicked/crashed.

                        This same sort of "problem" will manifest when I eventually run that "foo" script to pull down the various, missing tarballs. (but, I will have made note of its impact on accounting and taken steps to ensure I wasn't bitten, again!)

                        • lookimback
                          Badcaps Legend
                          • Aug 2013
                          • 1489
                          • USA

                          #13
                          Re: 10 hours lost because stupid laptop wouldn't boot! GRRR!

                          I think I might have figured it out. I was trying to figure out why an eMMC wasn't being detected and ran sudo tail -f /var/log/syslog. What I saw was a steady stream of error messages, like hundreds per second, from guvcview saying Unable to dequeue buffer: No such device. Well, I use guvcview with my USB microscope and had unplugged it without closing the app first. I had also used it the night before this issue came up, and probably left it open all night.
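
A flood like this also shows up without tail -f if the log is tallied by program name. A rough sketch, assuming the traditional syslog line layout (month, day, time, host, then a tag such as guvcview[1234]:), so the tag is field 5; logs written with ISO timestamps would need a different field number. The name top_loggers is made up for illustration.

```shell
# top_loggers [FILE] [COUNT]
# Count lines per program tag in a traditional-format syslog file and
# print the COUNT noisiest programs, busiest first.
top_loggers() {
    file=${1:-/var/log/syslog}
    count=${2:-5}
    # Take field 5 and strip a trailing "[pid]:" or bare ":" from the tag.
    awk '{ tag = $5; sub(/\[[0-9]*\]:?$|:$/, "", tag); print tag }' "$file" |
        sort | uniq -c | sort -n -r | head -n "$count"
}
```

On a log dominated by one misbehaving program, the first line of output names it along with its line count.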

                          • Curious.George
                            Badcaps Legend
                            • Nov 2011
                            • 2305
                            • Unknown

                            #14
                            Re: 10 hours lost because stupid laptop wouldn't boot! GRRR!

                            Originally posted by lookimback
                            i think i might have figured it out...
                            PEBCaK

                            • lookimback
                              Badcaps Legend
                              • Aug 2013
                              • 1489
                              • USA

                              #15
                              Re: 10 hours lost because stupid laptop wouldn't boot! GRRR!

                              Originally posted by Curious.George
                              PEBCaK
                              Lol, perhaps, but I'd say this is a situation which should be expected to occur.

                              • Curious.George
                                Badcaps Legend
                                • Nov 2011
                                • 2305
                                • Unknown

                                #16
                                Re: 10 hours lost because stupid laptop wouldn't boot! GRRR!

                                Originally posted by lookimback
                                Lol, perhaps, but I'd say this is a situation which should be expected to occur.
                                Agreed! But, that's ALWAYS the reason behind software bugs -- the software developer makes an ASSUMPTION which is arbitrary and, often, incorrect! And, when confronted with HIS failure, replies with something like "But you're not supposed to DO that!" (Then, why did you LET me?!)

                                • eccerr0r
                                  Solder Sloth
                                  • Nov 2012
                                  • 8701
                                  • USA

                                  #17
                                  Re: 10 hours lost because stupid laptop wouldn't boot! GRRR!

                                  Originally posted by Curious.George
                                  But that assumes you are sitting watching it to SEE when it coughs.
                                  You just turned your "workstation" into an unattended "server" so your use model changed...

                                  • Curious.George
                                    Badcaps Legend
                                    • Nov 2011
                                    • 2305
                                    • Unknown

                                    #18
                                    Re: 10 hours lost because stupid laptop wouldn't boot! GRRR!

                                    Originally posted by eccerr0r
                                    You just turned your "workstation" into an unattended "server" so your use model changed...
                                    So, if I am willing to sit and watch it compile hundreds of pieces of code, it's a workstation. But, if I get up and walk away, it magically becomes a server?

                                    And, prior to walking away, I should reinstall and reconfigure the software so that it operates AS a server?

                                    Then, when I need to write some NEW code, change everything BACK??

                                    If I sit down in front of machine A, grab the motion controller in my left hand and digitizing pen in my right and begin to design some 3D mechanical parts, it's a workstation, right? And, once I "assemble" those parts into a 3D model set to motion, it becomes a SERVER if I don't care to sit and watch while the machine does its elaborate ray-tracing and rendering?

                                    If I move over to machine B while that's happening and put the finishing touches on a schematic, I'm back at a workstation (?). But, once I place those components on a virtual piece of FR4 and click "autoroute", it transforms itself into a SERVER? Unless I patiently sit and watch it route foils and rip up existing foils as it encounters problems?

                                    Is there a time limit for how long a workstation can be left unattended before it transforms into a server? Should I buy a large stuffed animal to set in my chair, in my absence, to "trick" the machine into retaining its workstation identity?

                                    Don't be silly.

                                    OTOH, the box sitting under my dresser with neither a keyboard nor a monitor and providing services (gee, that word sounds a lot like servers!) to the rest of my network IS a server. And, ADDING a monitor or a keyboard to it won't change that fact/role.

                                    OToOH, the same boxes, running the same OS, sitting atop my workbench are workstations (even without keyboards).
                                    Last edited by Curious.George; 10-26-2018, 11:12 AM.

                                    • eccerr0r
                                      Solder Sloth
                                      • Nov 2012
                                      • 8701
                                      • USA

                                      #19
                                      Re: 10 hours lost because stupid laptop wouldn't boot! GRRR!

                                      The problem is that you're doing something unattended. That's not the usage model of a typical machine, where you do something and get a response back right away. Waiting for hours, unattended, is typical of a "server" type application.

                                      What's really silly here is dumping stuff to a file whose size you don't know and that may overflow the disk. That's what's silly. Be more responsible about what you do to your machine. There's no excuse here.

                                      • Curious.George
                                        Badcaps Legend
                                        • Nov 2011
                                        • 2305
                                        • Unknown

                                        #20
                                        Re: 10 hours lost because stupid laptop wouldn't boot! GRRR!

                                        Originally posted by eccerr0r
                                        The problem is that you're doing something unattended. That's not the usage model of a typical machine, where you do something and get a response back right away. Waiting for hours, unattended, is typical of a "server" type application.
                                        The amount of time you wait for a result has nothing to do with the type of application/machine. Ever download something large? Do you sit and WATCH the download progress? By your definition, if you get up (or look away) your machine now assumes the characteristics of a SERVER -- because the action wasn't quick enough (or you weren't patient enough) to keep you glued to the chair.

                                        What's really silly here is dumping stuff to a file whose size you don't know and that may overflow the disk. That's what's silly. Be more responsible about what you do to your machine. There's no excuse here.
                                        If you reread my post, you will see that redirecting the output of the make(1) to "foo" was not the cause of the problem. Rather, the built-in accounting facilities were not expecting hundreds of thousands of commands to be executed, in rapid succession, between cleanings of the accounting files.

                                        Also note that my installation AVOIDED any "problem" -- as the OP's didn't -- by ensuring the accounting log file was allowed to grow on a separate partition where it couldn't interfere with the continued operation (nor "bootability") of the machine.

                                        Similarly, the output of the make(1) was redirected to a file that exists on still another partition -- the one that holds all of the sources for the system, X, and all "packages".

                                        That's how I get uptimes of 400+ days without a crash/reboot. Forty years of running/administering UN*X systems teaches one a little bit about how to keep availability at six nines! When there are "other users" who rely on the machine being "up", you learn how to do lots of things without taking the system down -- lest you want to be doing maintenance in the wee hours of Sunday mornings!

                                        But, hey, I can understand you've probably never done anything that taxed the ability of your machine. Playing solitaire and shooting bad guys is all some folks CAN do with theirs!
