Hi there.
I have searched various forums but haven't come across the exact issue I am having.
I bought a 1080 Ti with a known fault (random shutdowns, black screen, fans at full speed, etc.). The previous owner had replaced the thermal pads, but on inspection many of them were too thick.
I tested the card in Windows and everything loaded up OK. When I put some load on the card it crashed; the GPU was reporting 92 °C at the time.
I swapped the stock cooler for an AIO water cooler from my 1080 FE and booted the card back up. Temps were much better and the card could take some load. It ran for a short while, then the card shut down again (along with the PC). This time it wouldn't restart because over-current protection kicked in.
I pulled the card and found that one of the DrMOS chips had failed: 12 V shorted to GND on the third phase from the bottom. (I isolated each VRM phase in turn by lifting one side of the choke connected to it.)
I removed the failed DrMOS from the board to get rid of any shorts across its pins.
All resistance measurements on the rails seem fine (PEX is at 200 ohms, which seems a little high), and all voltages read fine (PEX, memory, GPU, 1.8 V, 3.3 V, 5 V, 12 V). I can't find any shorts on the primary voltages now. I checked all the PCIe data lines and capacitors and they check out fine.
The card now won't start correctly: the monitor's backlight comes on after 30 seconds or so, but there is no POST screen (the PC continues booting into Windows, however).
With a second card connected as primary I can load Windows. The 1080 Ti is recognised in Device Manager (but won't start fully), and GPU-Z finds the card but with missing clocks etc.
I checked the BIOS flash with nvflash and can read and write BIOS files fine.
I tried booting MODS/MATS. The Linux environment detects the card fine with lspci.
However, when running MODS I get the output below. Is the core partially dead? I can't really find anyone else reporting these kinds of errors. I tried various MODS versions with similar results; versions 400.xx and up are more detailed in their output.
Thanks in advance for any help!
Code:
MODS start: Fri Feb 4 06:30:36 2022
Command Line : gputest.js -skip_rm_state_init -mfg
CPU
  Foundry : GenuineIntel
  Name : 12th Gen Intel(R) Core(TM) i7-12700KF
  Family : 6
  Model : 7
  Stepping : 2
Version
  MODS : 367.56
  OperatingSystem: Linux (x86_64)
  Kernel : 4.17.4-gentoo
  KernelDriver : 3.87
HostName : tinylinux
Smbios version [0x304] is not supported
ERROR: Fuse read error
gpu 0 dev.sub 0.0
---------------------------
  PCI Location : 0x00, 0x05, 0x00, 0x00
  DID : 0x1b06
  Raw ECID : 0x0000000000e0224000000045b5880d91
  Raw ECID (GHS) : 0x000000016445b5880c000000090101c0
  ECID : PHRM83-09_x02_y07
  Device Id : GP102
  Revision : a1
  NV Base : 0x71000000
  FB Base : 0x40000000
  IRQ : 17
NV_PMC_INTR_0 bit 28 high. Trying to clear interrupt by writing 0x0 to register 0x001140
NV_PMC_INTR_0 bit 28 high. Trying to clear interrupt by writing 0x0 to register 0x001144
NV_PMC_INTR_0 bit 30 high. Trying to clear interrupt by writing 0x2 to register 0x12004c
Successfully cleared GPU's interrupt state.
Unknown PCIE speed cap 0x4
Unknown PCIE speed cap 0x4
** ModsDrvBreakPoint **
------------------------- BEGIN ASSERT INFO DUMP -------------------------
invalid.
NVRM: instSetBar0WindowToWorkspaceBase_GM200: VGA workspace base is invalid.
NVRM: Possible bad register read: addr: 0x31c4f4, regvalue: 0xbad0122e, error code: Unknown SYS_PRI_ERROR_CODE
ACPI: Unable to evaluate dev method (_DOD) on 0:5:0.0
GF100GpuSubdevice: FloorsweepingAffected=0
GF100GpuSubdevice: Floorsweeping parameters present on commandline:
GF100GpuSubdevice: Floorsweeping parameter mask values: display=0x0 msdec=0x0 msvld=0x0 fbio_shift_override=0x0 ce=0x0 gpc=0x0 fb=0x0 fbio=0x0 fbio_shift=0x0 gpctpc[0]=0x0 gpctpc[1]=0x0 gpctpc[2]=0x0 gpctpc[3]=0x0 gpctpc[4]=0x0 gpctpc[5]=0x0 gpctpc[6]=0x0 gpctpc[7]=0x0 gpczcull[0]=0x0 gpczcull[1]=0x0 gpczcull[2]=0x0 gpczcull[3]=0x0 gpczcull[4]=0x0 gpczcull[5]=0x0 gpczcull[6]=0x0 gpczcull[7]=0x0
GF108PlusGpuSubdevice: Floorsweeping parameters present on commandline:
GF108PlusGpuSubdevice: Floorsweeping parameter mask values: pcie_lane=0x0 fbpa=0x0 spare=0x0
GM10xGpuSubdevice: Floorsweeping parameters present on commandline:
GM10xGpuSubdevice: Floorsweeping parameter mask values: nvenc=0x0 nvdec=0x0 head=0x0
GM20xGpuSubdevice: Floorsweeping parameters present on commandline:
GM20xGpuSubdevice: Floorsweeping parameters mask values: fbp_rop_l2[0]=0x0 fbp_rop_l2[1]=0x0 fbp_rop_l2[2]=0x0 fbp_rop_l2[3]=0x0 fbp_rop_l2[4]=0x0 fbp_rop_l2[5]=0x0 fbp_rop_l2[6]=0x0 fbp_rop_l2[7]=0x0 fbp_rop_l2[8]=0x0 fbp_rop_l2[9]=0x0 fbp_rop_l2[10]=0x0 fbp_rop_l2[11]=0x0 fbp_rop_l2[12]=0x0 fbp_rop_l2[13]=0x0 fbp_rop_l2[14]=0x0 fbp_rop_l2[15]=0x0
GP10xGpuSubdevice: Floorsweeping parameters present on commandline:
GP10xGpuSubdevice: Floorsweeping parameters mask values: gpc_pes[0]=0x0 gpc_pes[1]=0x0 gpc_pes[2]=0x0 gpc_pes[3]=0x0 gpc_pes[4]=0x0 gpc_pes[5]=0x0 gpc_pes[6]=0x0 gpc_pes[7]=0x0 gpc_pes[8]=0x0 gpc_pes[9]=0x0 gpc_pes[10]=0x0 gpc_pes[11]=0x0 gpc_pes[12]=0x0 gpc_pes[13]=0x0 gpc_pes[14]=0x0 gpc_pes[15]=0x0
NVRM: DevinitPmuOffloadDevinitToPmu Devinit complete is false
NVRM: bp @ ../../../../resman/kernel/devinit/nv/devinit_pmu.c:391
** ModsDrvBreakPoint **
-------------------------- END ASSERT INFO DUMP --------------------------
** ModsDrvBreakPoint **
------------------------- BEGIN ASSERT INFO DUMP -------------------------
gvalue: 0xbad0122e, error code: Unknown SYS_PRI_ERROR_CODE
ACPI: Unable to evaluate dev method (_DOD) on 0:5:0.0
GF100GpuSubdevice: FloorsweepingAffected=0
GF100GpuSubdevice: Floorsweeping parameters present on commandline:
GF100GpuSubdevice: Floorsweeping parameter mask values: display=0x0 msdec=0x0 msvld=0x0 fbio_shift_override=0x0 ce=0x0 gpc=0x0 fb=0x0 fbio=0x0 fbio_shift=0x0 gpctpc[0]=0x0 gpctpc[1]=0x0 gpctpc[2]=0x0 gpctpc[3]=0x0 gpctpc[4]=0x0 gpctpc[5]=0x0 gpctpc[6]=0x0 gpctpc[7]=0x0 gpczcull[0]=0x0 gpczcull[1]=0x0 gpczcull[2]=0x0 gpczcull[3]=0x0 gpczcull[4]=0x0 gpczcull[5]=0x0 gpczcull[6]=0x0 gpczcull[7]=0x0
GF108PlusGpuSubdevice: Floorsweeping parameters present on commandline:
GF108PlusGpuSubdevice: Floorsweeping parameter mask values: pcie_lane=0x0 fbpa=0x0 spare=0x0
GM10xGpuSubdevice: Floorsweeping parameters present on commandline:
GM10xGpuSubdevice: Floorsweeping parameter mask values: nvenc=0x0 nvdec=0x0 head=0x0
GM20xGpuSubdevice: Floorsweeping parameters present on commandline:
GM20xGpuSubdevice: Floorsweeping parameters mask values: fbp_rop_l2[0]=0x0 fbp_rop_l2[1]=0x0 fbp_rop_l2[2]=0x0 fbp_rop_l2[3]=0x0 fbp_rop_l2[4]=0x0 fbp_rop_l2[5]=0x0 fbp_rop_l2[6]=0x0 fbp_rop_l2[7]=0x0 fbp_rop_l2[8]=0x0 fbp_rop_l2[9]=0x0 fbp_rop_l2[10]=0x0 fbp_rop_l2[11]=0x0 fbp_rop_l2[12]=0x0 fbp_rop_l2[13]=0x0 fbp_rop_l2[14]=0x0 fbp_rop_l2[15]=0x0
GP10xGpuSubdevice: Floorsweeping parameters present on commandline:
GP10xGpuSubdevice: Floorsweeping parameters mask values: gpc_pes[0]=0x0 gpc_pes[1]=0x0 gpc_pes[2]=0x0 gpc_pes[3]=0x0 gpc_pes[4]=0x0 gpc_pes[5]=0x0 gpc_pes[6]=0x0 gpc_pes[7]=0x0 gpc_pes[8]=0x0 gpc_pes[9]=0x0 gpc_pes[10]=0x0 gpc_pes[11]=0x0 gpc_pes[12]=0x0 gpc_pes[13]=0x0 gpc_pes[14]=0x0 gpc_pes[15]=0x0
NVRM: DevinitPmuOffloadDevinitToPmu Devinit complete is false
NVRM: bp @ ../../../../resman/kernel/devinit/nv/devinit_pmu.c:391
** ModsDrvBreakPoint **
NVRM: Devinit on PMU failed to execute correctly!!
NVRM: bp @ ../../../../resman/kernel/devinit/nv/devinit.c:1063
** ModsDrvBreakPoint **
-------------------------- END ASSERT INFO DUMP --------------------------
Failed to read good Jtag Ctrl Status
WARNING... Failed to unlock Jtag for access!
Error 000000000818 : Gpu.Initialize Mods detected an assertion failure
Chipset
  VID : FFFF (Unknown)
  DID : FFFF (Unknown)
Rm call failed. default Disabled.
Chipset ASPM : Disabled
Chipset LTR : Enabled
Error 000000000818 : Global.InitializeGpuTests Mods detected an assertion failure
gputest.js : 59
mfg.spc : 11
boards.js : 7
boards.db : 3208
boards_gp102.db: 16
boards_gp104.db: 196
boards_gp106.db: 157
GpuDevMgr not initialized. Device shutdowns will likely do nothing.
Error Code = 000000000818 (Mods detected an assertion failure)

#######    ####    ########  ###
#######   ######   ########  ###
##       ##    ##     ##     ###
##       ##    ##     ##     ###
#######  ########     ##     ###
#######  ########     ##     ###
##       ##    ##     ##     ###
##       ##    ##  ########  ########
##       ##    ##  ########  ########

MODS end : Fri Feb 4 06:30:47 2022 [10.981 seconds (00:00:10.981 h:m:s)]
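In case it helps anyone comparing logs from different MODS versions, here is a rough Python sketch for pulling the interesting bits out of a dump like the one above. It is not part of MODS. The names `summarize_mods_log`, `looks_like_pri_error` and `KEY_PHRASES` are made up for illustration, and the idea that a `0xbad` prefix in the top 12 bits marks a failed register read is my assumption based on the `regvalue: 0xbad0122e` line, not something the tool states.

```python
import re

# Assumption: a failed register read comes back with 0xbad in the top 12 bits
# (the log above shows regvalue 0xbad0122e). Not confirmed by the MODS output.
PRI_ERROR_MASK = 0xFFF00000
PRI_ERROR_CODE = 0xBAD00000

# Phrases copied from the log above; which ones to flag is my own choice.
KEY_PHRASES = (
    "Fuse read error",
    "VGA workspace base is invalid",
    "Devinit complete is false",
    "Devinit on PMU failed to execute correctly",
    "Failed to unlock Jtag",
)

BAD_READ_RE = re.compile(
    r"addr:\s*(0x[0-9a-fA-F]+),\s*regvalue:\s*(0x[0-9a-fA-F]+)"
)
ERROR_CODE_RE = re.compile(r"Error(?: Code =)? (\d{12})")


def looks_like_pri_error(value: int) -> bool:
    """True if the value carries the assumed 0xbad failed-read prefix."""
    return (value & PRI_ERROR_MASK) == PRI_ERROR_CODE


def summarize_mods_log(text: str) -> dict:
    """Collect bad register reads, error codes and key phrases from a log.

    Works on the whole text rather than line by line, so it does not matter
    whether the paste kept its original line breaks.
    """
    bad_reads = []
    for addr, value in BAD_READ_RE.findall(text):
        entry = (int(addr, 16), int(value, 16))
        if entry not in bad_reads:  # the assert dump is often printed twice
            bad_reads.append(entry)
    return {
        "bad_reads": bad_reads,
        "error_codes": sorted(set(ERROR_CODE_RE.findall(text))),
        "phrases": [p for p in KEY_PHRASES if p in text],
    }


if __name__ == "__main__":
    sample = (
        "ERROR: Fuse read error "
        "NVRM: Possible bad register read: addr: 0x31c4f4, "
        "regvalue: 0xbad0122e, error code: Unknown SYS_PRI_ERROR_CODE "
        "NVRM: DevinitPmuOffloadDevinitToPmu Devinit complete is false "
        "Error 000000000818 : Gpu.Initialize "
        "Error Code = 000000000818 (Mods detected an assertion failure)"
    )
    summary = summarize_mods_log(sample)
    for addr, value in summary["bad_reads"]:
        flag = "failed read?" if looks_like_pri_error(value) else "ok?"
        print(f"addr {addr:#x} -> {value:#010x} ({flag})")
    print("error codes:", summary["error_codes"])
    print("phrases:", summary["phrases"])
```

Running the same summary over logs from two MODS versions should make it easier to see whether they fail at the same register and the same devinit step, which is the comparison I was doing by eye.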