
Asus GTX980Ti and MSI GTX980 with the same odd problem


    Asus GTX980Ti and MSI GTX980 with the same odd problem

    Hi,

    I have two high-end graphics cards, an Asus GTX 980Ti Matrix Platinum 6GB and an MSI GTX 980 Gaming 4GB, that have the exact same problem.

    The computer starts, doesn't boot, and then beeps, and shortly after the screen turns on but remains black (nothing is displayed but it's getting a signal). Then after a minute it beeps again, and so on. So the graphics cards don't seem completely dead.

    I've tried with several motherboards, same results.

    I've tried a reflow with flux, with no change at all.

    The Asus board has convenient voltage measuring points. I'm getting only 0.45V at the VGPU point, 1.05V at VMEM and 1.0V at VPLL.

    I'm not sure if the voltage is supposed to be lower when the card is not in actual 3D use, but that 0.45V at the GPU seems too low, which makes me think there is a problem with the voltage regulation somewhere.
    I can't find a short on the mosfets, and otherwise it wouldn't start at all in my experience.

    I've checked a lot of components: fuses, resistors, capacitors, chokes. They all appear to be "fine". Keep in mind, my equipment is the bare minimum, and outside of checking continuity and resistance values, I can't do much... yet. Those current regulation chips would be hard to analyze without the proper equipment. I'm ready to invest a bit if I can repair those cards. They're still worth quite a bit, especially the Asus.

    Does anybody have experience with these graphics cards and this particular issue? Or any insight into where I could find the source of the problem?

    #2
    Re: Asus GTX980Ti and MSI GTX980 with the same odd problem

    Ok, after some fiddling with both cards, I got some results:
    - the MSI GTX980 works...when it's warm !
    I let the computer run for a while when I was testing the voltage all over the board, with a passive heatsink on top. I turned it off and on again and the card worked fine, no artefacts (didn't go in an OS though).
    I let it sit to cool down and it wouldn't work again.
    So I heated up the heatsink with a heat gun so the GPU would be at normal usage temps, and it booted up fine.
    There has to be a metal expanding issue somewhere. I did do a reflow which didn't help. I'll do another one.

    - on the Asus 980Ti Matrix, the memory voltage is very low, around 0.40V, whereas I measure around 1V on the 980. I've measured the voltage at all the mosfets around the memory power unit, and four of them only output 0.40V.
    I've checked the memory VRM controller, and it appears to work fine when I compare it to the datasheet, but my cheap DMM isn't up to the task of measuring if it really works properly. It would be a bit hard to replace with my tools because of the size and the QFN package.
    Last edited by SuperDuty; 01-26-2018, 07:39 AM.



      #3
      Re: Asus GTX980Ti and MSI GTX980 with the same odd problem

      You should probably recap both cards first. Then you can continue repairing if any problem remains on either card. What kind of capacitors are used on the two graphics cards?



        #4
        Re: Asus GTX980Ti and MSI GTX980 with the same odd problem

        Originally posted by SuperDuty View Post
        Ok, after some fiddling with both cards, I got some results:
        - the MSI GTX980 works...when it's warm !
        Yup, typical BGA/bumpgate issue. Expect it to only get worse with time.

        Originally posted by SuperDuty View Post
        I let the computer run for a while when I was testing the voltage all over the board, with a passive heatsink on top. I turned it off and on again and the card worked fine, no artefacts (didn't go in an OS though).
        I let it sit to cool down and it wouldn't work again.
        So I heated up the heatsink with a heat gun so the GPU would be at normal usage temps, and it booted up fine.
        There has to be a metal expanding issue somewhere. I did do a reflow which didn't help. I'll do another one.
        Probably a bumpgate issue then - i.e. the solder balls between the GPU die and the GPU substrate are starting to go bad. A reflow cannot fix that - not permanently anyways. However, you might be able to prolong your repair if you undervolt + underclock your card or otherwise somehow limit the maximum TDP under load. If you keep the TDP under 100 Watts (which may or may not be possible), your repair could last much much longer... but that will be at the expense of giving away lots of performance.
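To get a feel for how much undervolting and underclocking might be needed to stay under that 100 W mark, here is a rough Python sketch. It assumes dynamic power scales roughly with voltage squared times clock frequency, ignores static leakage, and treats the whole baseline as GPU dynamic power. The 165 W baseline is the reference GTX 980 TDP (a factory-overclocked card may differ), and the voltages and clocks are hypothetical values chosen for illustration, not settings known to be stable on this card.

```python
def scaled_power_watts(base_watts, base_volts, base_mhz, new_volts, new_mhz):
    """Estimate power after a voltage/clock change using P ~ V^2 * f.

    Rough approximation: ignores leakage and non-GPU board power.
    """
    if base_volts <= 0 or base_mhz <= 0:
        raise ValueError("baseline voltage and clock must be positive")
    voltage_factor = (new_volts / base_volts) ** 2
    clock_factor = new_mhz / base_mhz
    return base_watts * voltage_factor * clock_factor

# Assumed baseline: 165 W at a hypothetical 1.2 V and 1200 MHz.
# Hypothetical target: 1.0 V and 1000 MHz.
estimate = scaled_power_watts(165, 1.2, 1200, 1.0, 1000)
print(f"Estimated power: {estimate:.0f} W")  # roughly 95 W
```

Under these assumptions, dropping both voltage and clock by about a sixth already lands near the 100 W target, which is why undervolting tends to matter more than underclocking alone.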

        Originally posted by SuperDuty View Post
        I'm not sure if the voltage is supposed to be lower when the card is not in actual 3D use, but that 0.45V at the GPU seems too low, which makes me think there is a problem with the voltage regulation somewhere.
        The voltage going to the GPU is indeed usually lower when the card is not in 3D mode - however that usually happens only when the card has booted and the drivers for it have loaded. During PC power-up, the GPU V_core should be at whatever is the maximum (non-boost speed) voltage the chip takes.

        That said, you'll never really see less than 0.8V on any PC silicon chip, as most diode junctions usually have a typical forward voltage of 0.7V. Getting too close to that threshold will make most chips not work.
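As a quick way to apply that rule of thumb to a set of test-point readings, here is a minimal Python sketch. The 0.8 V floor is the figure from the post above, not a datasheet value, and the function and variable names are made up for illustration. The readings are the ones reported for the Asus card in the first post.

```python
from typing import Dict, List

# Rule-of-thumb floor taken from the post above; not a datasheet value.
MIN_PLAUSIBLE_VOLTS = 0.8

def low_rails(readings: Dict[str, float], floor: float = MIN_PLAUSIBLE_VOLTS) -> List[str]:
    """Return the names of rails whose measured voltage is below the floor."""
    return [name for name, volts in readings.items() if volts < floor]

# Readings reported in the first post for the Asus 980 Ti Matrix.
asus_readings = {"VGPU": 0.45, "VMEM": 1.05, "VPLL": 1.0}

print(low_rails(asus_readings))  # ['VGPU']
```

The same check flags the 0.40 V memory reading reported later in post #2, which is why both rails look suspicious by this rule.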

        As for what causes you to see these low voltages on the memory of the ASUS card... - probably BGA/bumpgate issue again, just like the MSI. Give it another reflow. No need to go full temperature though, as bumpgate issues will be "fixed" just the same either way - that is, only a temporary fix.

        Originally posted by caspian View Post
        You should probably recap both cards first. Then you can continue repairing if any problem remains on either card. What kind of capacitors are used on the two graphics cards?
        You're not going to find bad caps on these high-end cards. They use mostly ceramics, Tantalums, and polymer. Ceramics and Tantalum fail short-circuit, so you will definitely see it if one went bad. Polymers, it's a bit more random - may or may not get a short-circuit or high ESR. You can pull out one, and if it measures proper low ESR, you can be sure the other polymers are probably okay too.

        All in all, don't expect much from modern high-end video cards nowadays - they are built like shit and simply don't last. If you get your cards fixed or at least semi-working, sell them and put your money to better things.
        Last edited by momaka; 01-28-2018, 10:37 PM.



          #5
          Re: Asus GTX980Ti and MSI GTX980 with the same odd problem

          If the cards have the bumpgate issue, they are not worth repairing. How old are the graphics cards?



            #6
            Re: Asus GTX980Ti and MSI GTX980 with the same odd problem

            Originally posted by caspian View Post
            If the cards have the bumpgate issue, they are not worth repairing. How old are the graphics cards?
            We're not talking about video cards from facking 2007 (or near there) so I can't confirm bumpgate issues at this time...

            (The GeForce 8600s and 8400s were built like shit, back in the Core 2 era!)
            Last edited by RJARRRPCGP; 01-29-2018, 07:59 AM.
            ASRock B550 PG Velocita

            Ryzen 9 "Vermeer" 5900X

            16 GB AData XPG Spectrix D41

            Sapphire Nitro+ Radeon RX 6750 XT

            eVGA Supernova G3 750W

            Western Digital Black SN850 1TB NVMe SSD

            Alienware AW3423DWF OLED




            "¡Me encanta "Me Encanta o Enlistarlo con Hilary Farr!" -Mí mismo

            "There's nothing more unattractive than a chick smoking a cigarette" -Topcat

            "Today's lesson in pissivity comes in the form of a ziplock baggie full of GPU extension brackets & hardware that for the last ~3 years have been on my bench, always in my way, getting moved around constantly....and yesterday I found myself in need of them....and the bastards are now nowhere to be found! Motherfracker!!" -Topcat

            "did I see a chair fly? I think I did! Time for popcorn!" -ratdude747



              #7
              Re: Asus GTX980Ti and MSI GTX980 with the same odd problem

              Originally posted by caspian View Post
              If the cards have the bumpgate issue, they are not worth repairing. How old are the graphics cards?
              I think the GTX980 is just a few years old now (2-3 tops?)

              Technically, modern video cards don't have the same "bumpgate" issue that the GeForce 7000 and 8000 series are known for (as well as the late 6000 series).

              But the failure mode has always been the same: GPU silicon die separating from the GPU substrate. BGA issues between the GPU substrate and the video card's PCB are much rarer than people think, and rarely the actual problem. That's why reflowing (and sometimes even re-balling) is often only a temporary fix on an original chip.

              I don't want to sound pessimistic here, but pretty much all modern high-end video cards made in the last 5 years are doomed. At least that's what I make of it when I see well-cooled cards still fail. The mid and low-range cards might have a better chance at surviving longer... but who knows. At least the older generations did. (But then again, who wants to game on a Radeon HD5450/6450 or GeForce GT210/220/230 equivalent video card anyways?)

              At this point, electromigration could also be part of the issue, seeing how manufacturing tech is getting smaller and smaller, but we still see 1.xxx Volts on the core of many modern GPUs, as manufacturers try to push the clocks high.
              Last edited by momaka; 01-29-2018, 09:44 AM.



                #8
                Re: Asus GTX980Ti and MSI GTX980 with the same odd problem

                That is right. The technique of increasing clock frequency has failed on modern graphics cards. Instead, manufacturers should use several low-frequency GPUs in parallel. Although they have made dual-GPU cards so far, they should increase the number of GPUs on the graphics card. However, I do not know how far this can be scaled up. It seems that running low-frequency GPUs in parallel would lead to a more stable high-end graphics card.
                Last edited by caspian; 01-29-2018, 01:56 PM.



                  #9
                  Re: Asus GTX980Ti and MSI GTX980 with the same odd problem

                  Originally posted by caspian View Post
                  The technique of increasing clock frequency has failed on modern graphics cards.
                  This is because GPUs are designed with the intent to do a large number of simple (matrix) calculations (short, wide architecture) and not complex calculations like CPUs are (longer and narrower architecture with much more looping and branching possible). Thus, GPUs have a huge number of simple compute units (shaders) that do all of these computations in parallel. But because all of the units are running in parallel, you can't increase the frequency too much. This stems from the fact that no two transistors are exactly the same - even on the same silicon die. Some will simply not be able to run as fast as others. So when you have a lot of transistors running in parallel, you will always be limited by the slowest transistor in the group.
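The "limited by the slowest one" argument above can be sketched in a couple of lines of Python. This assumes all units share a single clock, and the per-unit limits are invented numbers for illustration only, not measurements from any real chip.

```python
def max_shared_clock_mhz(unit_limits_mhz):
    """With one clock shared by all units, the slowest unit sets the ceiling."""
    if not unit_limits_mhz:
        raise ValueError("need at least one unit")
    return min(unit_limits_mhz)

# Hypothetical per-unit stable clock limits in MHz (invented for illustration).
limits = [1450, 1390, 1510, 1320, 1475]

print(max_shared_clock_mhz(limits))  # 1320
```

The more units that share the clock, the more likely one of them sits at the low end of the spread, which is the point being made about wide parallel designs.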

                  Originally posted by caspian View Post
                  Instead, manufacturers should use several low-frequency GPUs in parallel. Although they have made dual-GPU cards so far, they should increase the number of GPUs on the graphics card. However, I do not know how far this can be scaled up. It seems that running low-frequency GPUs in parallel would lead to a more stable high-end graphics card.
                  This is not a very efficient method, unfortunately. While the power used for each GPU chip can be lowered, the total power consumed by the graphics card will be greater than having a single high-power GPU that combines all of those GPUs together in one die. Balancing also becomes more complex with multi-GPU graphics cards (and the balancing hardware needed also itself adds to the inefficiency). So that's why we don't see this done.

                  But you are right - it could possibly lead to more reliable hardware.

                  Still, I think the problem is mainly a physical one: there is too much power consumption (and dissipation) for the surface area of modern graphics chips. The smaller manufacturing technology and lower core voltages are not helping this, as modern GPUs draw a lot more current for the same TDP (note, we are not comparing GPU performance or efficiency here). Take for example two 30W TDP GPUs, one made on older larger nm technology (using 1.5V for its core) and one made on newer smaller nm technology (only 1V for its core). Given the same TDP, the first GPU will only require 30W / 1.5V = 20 Amps for its core. But the newer card will require 30W / 1.0V = 30 Amps for its core. Now scale that up 4 times for a high-end video card: the first will draw 80 Amps, but the second 120 Amps - a whopping 40 Amps extra! Then factor in that the older GPU will have a much larger surface area (due to being made on a large nm fab), so it can have a lot more (or bigger) solder balls to receive that 80 Amps from the GPU VRM. In contrast, the newer chip will have a smaller surface area (hence possibly fewer or smaller solder balls between die and substrate), but will have more current flowing through those smaller solder balls. Does anyone see what's going wrong here now?
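The arithmetic in the post above can be checked with a short Python sketch. It uses I = P / V and, like the post, treats the whole TDP as core power, which is a simplification. The function name is made up for illustration.

```python
def core_current_amps(tdp_watts, core_volts):
    """Approximate core current from I = P / V (all of TDP treated as core power)."""
    if core_volts <= 0:
        raise ValueError("core voltage must be positive")
    return tdp_watts / core_volts

# Figures from the post: two 30 W GPUs, one at 1.5 V and one at 1.0 V.
older = core_current_amps(30, 1.5)
newer = core_current_amps(30, 1.0)

# Scaled up 4x for a high-end card, as in the post.
older_high_end = core_current_amps(4 * 30, 1.5)
newer_high_end = core_current_amps(4 * 30, 1.0)
extra = newer_high_end - older_high_end

print(f"30 W: {older:.0f} A vs {newer:.0f} A")
print(f"120 W: {older_high_end:.0f} A vs {newer_high_end:.0f} A (+{extra:.0f} A)")
```

This reproduces the 20 A vs 30 A and 80 A vs 120 A figures, with the 40 A difference having to pass through the die-to-substrate bumps of the lower-voltage chip.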



                    #10
                    Re: Asus GTX980Ti and MSI GTX980 with the same odd problem

                    Originally posted by momaka View Post

                    I don't want to sound pessimistic here, but pretty much all modern high-end video cards made in the last 5 years are doomed.
                    Well, I heard that Fermi is prone to failing... Probably the worst of the later gen cards!
                    Last edited by RJARRRPCGP; 01-31-2018, 01:01 PM.


                      #11
                      Re: Asus GTX980Ti and MSI GTX980 with the same odd problem

                      Originally posted by momaka View Post
                      But because all of the units are running in parallel, you can't increase the frequency too much. This stems from the fact that no two transistors are exactly the same - even on the same silicon die. Some will simply not be able to run as fast as others. So when you have a lot of transistors running in parallel, you will always be limited by the slowest transistor in the group.
                      The same kind of thing can occur with CPUs as well... And some of this reminds me of socket 775, where quad cores are an afterthought, and thus, quad cores are more difficult to run! They, (especially 65nm quads) have issues with the CPU-to-FSB termination! (LVDS?) And that's why you can't get as high of FSB frequencies! And that's not to do with the core frequencies per-se, but it has to do with how many cores are in a single unit (let's say, the socket) and the signal between the CPU and FSB. It looks a lot like cramming more cores into socket 775 caused CPU communication problems with the FSB!
                      Last edited by RJARRRPCGP; 01-31-2018, 12:57 PM.


                        #12
                        Re: Asus GTX980Ti and MSI GTX980 with the same odd problem

                        I like your electronics point of view. Computer technicians (such as me) consider each electronics component as a logical component that does some logical function on power or data line. But they do not consider all electrical details which an electronics component may have.



                          #13
                          Re: Asus GTX980Ti and MSI GTX980 with the same odd problem

                          The GTX980 is from early 2015, the 980Ti is from early 2016. They are still very good cards.

                          The 980 will boot into Windows with artefacts if I keep heating the card with a heat gun, but it crashes quickly and the screen goes black again. I guess I need to replace the GPU with a new one.

                          Still haven't found the exact issue with the 980Ti, but that low memory voltage shouldn't be related to the GPU. I did reflow and reheat, doesn't make a difference.

                          As for Fermi, my GTX480 has had three reflows since I got it in late 2013. It runs strong, just finished RE7 and started playing GTA5 this week. Both in 1080p. Reflows last about 18 months on this one, which is pretty good for a card that runs nearly 24/7.



                            #14
                            Re: Asus GTX980Ti and MSI GTX980 with the same odd problem

                            To me, your MSI GTX980 seems to have a capacitor problem. Now my question is: when does a particular polymer capacitor go bad on a graphics card?



                              #15
                              Re: Asus GTX980Ti and MSI GTX980 with the same odd problem

                              I have a dead 970 here that a friend of mine gave me to take a look at - so it's a GPU chip issue confirmed. Interesting. nVidia had gotten their act together after the big bumpgate issue, their chips used to fail a lot less than AMD's over quite a few years now. But i have already seen several reflow and reballing videos with GTX 980s, maybe they have an issue too and "bumpgate rev. 2" is on the way... time will tell.

                              Then i'll go look and see whether reballing stencils have been released for these yet... if it'll even work and the chip isn't toast.
                              Originally posted by PeteS in CA
                              Remember that by the time consequences of a short-sighted decision are experienced, the idiot who made the bad decision may have already been promoted or moved on to a better job at another company.
                              A working TV? How boring!



                                #16
                                Re: Asus GTX980Ti and MSI GTX980 with the same odd problem

                                If you treat these cards with respect and keep them clean and cooled, they last forever. High heat followed by cooling is the issue, as with all electronic devices. I have several that have been running 24/7 and get regular maintenance cleaning every 12 months, and they still work after 20 years!



                                  #17
                                  Re: Asus GTX980Ti and MSI GTX980 with the same odd problem

                                  Originally posted by Th3_uN1Qu3 View Post
                                  Interesting. nVidia had gotten their act together after the big bumpgate issue, their chips used to fail a lot less than AMD's over quite a few years now. But i have already seen several reflow and reballing videos with GTX 980s, maybe they have an issue too and "bumpgate rev. 2" is on the way... time will tell.
                                  Like I said, I think we are getting into a physical problem here, and I suspect it has to do with too much current flow through too few/small GPU chip<->substrate solder balls. Either that, or the manufacturer(s) are deliberately cutting back on the quality of materials used, and thus only making it good enough to last past the warranty (lol, what warranty? I haven't seen more than 3-9 months on most crap nowadays). After all, it seems that many people who buy high-end GPUs upgrade them every few years anyways, so maybe that's why manufacturers don't care to make a long-lasting product?

                                  But I don't know for sure either. Like you said, time will tell.

                                  Originally posted by brethin View Post
                                  If you treat these cards with respect and keep them clean and cooled, they last forever. High heat followed by cooling is the issue, as with all electronic devices. I have several that have been running 24/7 and get regular maintenance cleaning every 12 months, and they still work after 20 years!
                                  You have a GTX 980 that is 20 years old?!?? Did you use your DeLorean again, doc?

                                  In all seriousness, I too can see how a high-end GPU from 20 years ago (that'd be 1998, if I am not mistaken) would still work today. Heck, most of the hardware from that period still works, crap caps aside.

                                  I think the GeForce 3 and ATI Radeon 8500/9000 were the last robust high-end video cards. After that, things started taking a dive. The GeForce 4 (TI 4200/4400/4600) are actually not bad either, but they can still fail with the stock nVidia reference coolers - not sure if it's the BGA or something else, as these are non-flip-chip GPUs. With ATI, the Radeon 9700 (their first flip-chip design) is where things started going downhill. Like nVidia, ATI too used an undersized reference cooler. And in general, they would still fail even when cooled well. I think ATI had a "bumpgate" issue of their own as well, just not as bad as nVidia's. Seems like they fixed it around the HD2k/HD3k era. For nVidia, I think it was the GeForce 9x00 series that they finally addressed the bumpgate issue. So I'd say their GF 9x00 line was probably the most robust. After that, TDP really started to sky-rocket on both ATI and nVidia cards, so things just got ugly again.
                                  Last edited by momaka; 02-09-2018, 02:23 PM.



                                    #18
                                    Re: Asus GTX980Ti and MSI GTX980 with the same odd problem

                                    ATI/AMD has issues well spread among its chips… I mean, they had indeed the issue on the 9000 series (R300), the X1000 series, the RS780M/RS880M, the HD4800 series, the HD5000/HD6000 series, the R9 270X+ series, and I don't know about later ones but they may not be reliable either… Some of them are related to heat, some are not (or will eventually fail even though they are kept in a somewhat low temperature).
                                    They really need to work on the reliability of their chips.

                                    NVidia had the well-known issue between 2006-2008 (was solved on chips made after 30th of 2008 if I recall correctly). I don't have a record of any other issue like this, but I saw some cards fail without reason, like a GT210, a well-cooled GTX 460 and a GTX 650. Also, MCP7x sometimes failed even though they shouldn't have been affected by the bumpgate issue.
                                    OpenBoardView — https://github.com/OpenBoardView/OpenBoardView



                                      #19
                                      Re: Asus GTX980Ti and MSI GTX980 with the same odd problem

                                      Originally posted by piernov View Post
                                      but I saw some card fail without reason well-cooled GTX 460
                                      I got word that it's a well known problem with Fermi... Another shit generation, and the GeForce 9800 possibly counts, but the one that I discovered to have failed, was in an OEM case, thus, a good chance of it getting toasty!
                                      Last edited by RJARRRPCGP; 02-10-2018, 12:23 PM.


                                        #20
                                        Re: Asus GTX980Ti and MSI GTX980 with the same odd problem

                                        Fermi had issues because it used the same underfill as Bumpgate-era chips, just with a different type of solder inside. They had to do this because the die was so large that a stiffer underfill would cause the package to delaminate within a few heat cycles.