

Posted (edited)
1 hour ago, SharpeXB said:

Lucky me I also had a failed (degraded?) 13900K. That problem presented as greatly reduced performance but not a crash or black screen. 

The CPU degradation can appear in somewhat different ways from one machine to another.
It can manifest as Windows closing applications in the background by itself, BSODs (blue screen crashes), system lock-ups (freezes), black screens, general instability, etc.

"Black Screen, Fans 100 percent, hard-reset or power-cycle is the only option to restart" is one of the reported symptoms.
Though it could be so many unrelated things (corrupted Windows and/or application files or drivers, a faulty PSU, motherboard, GPU, drive, etc.).

That's why troubleshooting with stress testing is important, to try to determine what's causing it.
I maintain my previous suggestion.

Edited by LucShep

CGTC - Caucasus retexture  |  A-10A cockpit retexture  |  Shadows Reduced Impact  |  DCS 2.5.6 - a lighter alternative 



Win10 Pro x64  |  Intel i7 12700K (OC@ 5.1/5.0p + 4.0e)  |  64GB DDR4 (OC@ 3700 CL17 Crucial Ballistix)  |  RTX 3090 24GB EVGA FTW3 Ultra  |  2TB NVMe (MP600 Pro XT) + 500GB SSD (WD Blue) + 3TB HDD (Toshiba P300) + 1TB HDD (WD Blue)  |  Corsair RMX 850W  |  Asus Z690 TUF+ D4  |  TR PA120SE  |  Fractal Meshify-C  |  UAD Volt1 + Sennheiser HD-599SE  |  7x USB 3.0 Hub |  50'' 4K Philips PUS7608 UHD TV + Head Tracking  |  HP Reverb G1 Pro (VR)  |  TM Warthog + Logitech X56 

 

Posted (edited)
9 minutes ago, LucShep said:

That's why troubleshooting with stress testing is important, to try to determine what's causing it.
I maintain my previous suggestion.

Absolutely.

Hence my original post on the subject, advocating some basic troubleshooting steps to help isolate the cause, and stress testing.

Edited by kksnowbear

Free professional advice: Do not rely upon any advice concerning computers from anyone who uses the terms "beast" or "rocking" to refer to computer hardware.  Just...don't.  You've been warned.

While we're at it, people should stop using the term "uplift" to convey "increase".  This is a technical endeavor, we're not in church or at the movies - and it's science, not drama.

Posted
On 8/16/2024 at 9:52 PM, kksnowbear said:

Interesting. 

I noticed you didn't indicate some details about your system which might be helpful:  Who built the system?  What motherboard?  What CPU cooling? (Yes, I read the part about temps being OK, question re: cooling still applies, if you please)

You have a 13700KF, which (in this case) is unfortunate because there's no integrated graphics adapter.  An iGPU would allow connecting a second monitor to see if it's the GPU output going away or the whole machine (FWIW I believe it's the machine, not the GPU).

Any chance you can come up with a small, 'entry level' type GPU (GT 710 etc) to use for testing?  I'd install one just to connect another monitor, to help see what's taking place.  (Note: Obviously this would depend on what motherboard you're using, hence my question above).

TBH I don't think it's the GPU - and yes, I am taking into account what you said about RMAing the first GPU (I'll gladly discuss further if you wish).

Anyhow, all this isn't intended as a 'magic fix'...it's just something you can try to help identify the issue.

Incidentally, you mention removing a bunch of bloatware (which is good)...any chance you're using Corsair software to control/monitor liquid cooling?

Also, one other question:  As far as I can see, you mention this issue occurring during DCS, but don't mention any other games or utilities.  Specifically, have you done any stress testing (i.e. for GPU/CPU) such as 3DMark etc to see if you can duplicate the issue?

 

Hi kksnowbear,

Rig is a custom build from a pretty reputable vendor here in Melbourne, Australia (Scorptec - I have had dealings with them for many years). Apologies, I should have provided the additional details:

  • Deepcool LT720 Premium Liquid CPU Cooler, 360mm Radiator
  • G.Skill Ripjaws V 32GB (2x16GB) PC4-28800 (3600MHz) DDR4
  • Gigabyte Z690 AORUS ELITE AX DDR4 MB

When I was getting WHEA errors early on, I brought the system back to their support department and they ran a number of stress tests over a few days - identifying a possible bad GPU (which they RMA'd on my behalf) and defective RAM (what is with quality control on consumer parts these days?!). The PSU tested fine at that time (the GPU is connected via the 12VHPWR cable as supplied by the MSI PSU).
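For anyone wanting to do the same check at home, a quick way to see whether WHEA errors are still being logged is to query the Windows event log for the WHEA-Logger provider. This is just a convenience wrapper around Windows' built-in wevtutil; the event count and query are generic, nothing here is specific to my setup:

  import subprocess

  # List the 20 most recent WHEA-Logger events from the System log (newest first).
  # wevtutil ships with Windows; run it from a normal or elevated prompt.
  query = "*[System[Provider[@Name='Microsoft-Windows-WHEA-Logger']]]"
  cmd = [
      "wevtutil", "qe", "System",
      "/q:" + query,   # XPath filter for the WHEA-Logger provider
      "/c:20",         # max number of events to return
      "/rd:true",      # newest first
      "/f:text",       # human-readable output
  ]
  result = subprocess.run(cmd, capture_output=True, text=True)
  print(result.stdout or "No WHEA-Logger events found.")

If hardware errors (cache hierarchy, bus/interconnect, etc.) keep showing up outside of the vendor's bench testing, that points back at the platform rather than anything DCS-specific.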

I agree with you; I also do not immediately think this is a GPU fault. I also do not think this is a 12VHPWR cable issue (the sense-wire LEDs on the GPU do not indicate a power delivery issue), and I am certain the strut is correctly placed and preventing slot contact issues. The motherboard status lights are normal when this occurs as well.

No Corsair software either; it seems that software was causing similar issues for people a while back. I use (and trust) Remi Mercier's FanControl (GitHub) for all heat management and avoid any control software from Gigabyte, Asus, Corsair, etc. I do, however, launch (and subsequently terminate) Asus GPU Tweak III to disable its fan control and set the GPU power level to 95%. I had HWiNFO logging all voltages and temps to SSD for inspection after a crash, and also set the process priority for HWiNFO & FanControl (using Process Lasso Pro) to above normal.

I previously ran GPU + CPU stress testing using 3DMark (paid version) and others (OCCT, Cinebench, etc.), however I could never trigger the black screen of death. It is really weird: at times it seems to trigger at a GPU load transition (e.g. high to low, such as after a few seconds in the DCS cockpit going to the mission briefing); however, crashes also occur that are in no way related to an apparent load transition.

You have motivated and encouraged me to get off my lazy ass and properly test this with another GPU - I have an RTX 3080 in another older system that I could swap into my DCS rig for a time.

I am honestly grateful for all the time and effort everyone has put into responding to this thread! I opened the forum today and was so happy to see so many responses!

Going to read and respond to the following posts.

Thank you kksnowbear!

 

i7-13700KF; RTX-4090; 64GB RAM; Quest3 & PimaxCL; Virpil CM3 + VKB Gunfighter Mk.IV MCE-Ultimate + VKB T-Rudder Mk.V

Posted
On 8/17/2024 at 12:24 AM, SharpeXB said:

I’ve had two problems with a 4090 similar to what you’re describing. One was the card being loose in the slot (a warning light on the MB will indicate that), which was solved by that brace - very much needed for such a large card. And I had the power cable plug fail too; that fried the power connection to the card and replacing it was required. Ouch. 

 

Very good point SharpeXB, it is what I looked into first. No warning lights on the MB nor the GPU (sense pins), and I have also been very careful to adjust the Asus anti-card-sag strut to mitigate any PCIe slot contact issues. 

A fried power connection sounds like a nightmare to deal with! I have inspected the power connector at the card and at the PSU; luckily I have not had this occur, though it seems to be constantly reported on the NorthridgeFix YouTube channel.

On 8/16/2024 at 11:59 PM, BitMaster said:

Indeed, some stress testing would be nice, and if you have the knowledge, install Linux and Steam with DCS and see if it reproduces in Linux as well (DCS does work in Linux with Steam/Proton)

Hi BitMaster, I had no idea DCS could be run with the Proton compatibility layer! Does Nvidia release reliable Linux driver updates?

i7-13700KF; RTX-4090; 64GB RAM; Quest3 & PimaxCL; Virpil CM3 + VKB Gunfighter Mk.IV MCE-Ultimate + VKB T-Rudder Mk.V

Posted
On 8/17/2024 at 9:09 AM, BitMaster said:

is there any chance that you have a degraded Intel 13th gen 65w+ CPU ?

When I read your specs it says 13700KF, and iirc that SKU is affected.

Try your card in another PC ( preferably NOT an affected Intel CPU as well )  but you must also stress test it in yours to be able to compare.

The way it hangs up - not a sudden reboot but a true hang - makes me think it is not the GPU or power delivery; it's deeper.

I am also wondering about this; I am running the following BIOS settings (Gigabyte-specific) to avoid the higher voltages that seem to be the systemic cause of the degradation (most of this follows advice from Buildzoid's videos) - a rough worked example of the loadline arithmetic follows the list:

  • IA VR Voltage Limit: 1400 mV
  • Loadline Calibration: High; AC LL: 55, DC LL: 55 (0.55 mΩ, i.e. 50% of the Renesas controller's 1.1 mΩ)
  • Vcore voltage mode: Adaptive; NF Offset mode: Legacy
  • Internal CPU Vcore offset: -0.07 V
  • PL1/PL2/IccMax as per Intel specs
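The rough worked example of what those loadline numbers do, in case it helps anyone else. It uses the simplified "requested voltage ≈ VID + Icc × AC_LL" model (which ignores transients) and assumes I'm reading Gigabyte's 0.01 mΩ units correctly; the 82 A figure is from my HWiNFO logging under load:

  # Simplified loadline arithmetic: requested voltage ~ VID + Icc * AC_LL.
  # Real behaviour is more complex (transients, DC_LL telemetry); this only
  # shows the rough size of the effect.
  AC_LL_OHM_SET     = 55 * 0.00001   # "55" in the BIOS (0.01 mOhm units) = 0.55 mOhm
  AC_LL_OHM_DEFAULT = 110 * 0.00001  # Renesas controller default, 1.1 mOhm
  ICC_AMPS          = 82.0           # package current logged under load

  adder_set     = ICC_AMPS * AC_LL_OHM_SET       # volts added to VID at 0.55 mOhm
  adder_default = ICC_AMPS * AC_LL_OHM_DEFAULT   # volts added to VID at 1.10 mOhm
  print(f"AC_LL adder at 0.55 mOhm: {adder_set * 1000:.0f} mV")       # ~45 mV
  print(f"AC_LL adder at 1.10 mOhm: {adder_default * 1000:.0f} mV")   # ~90 mV
  print(f"Reduction in requested voltage under load: {(adder_default - adder_set) * 1000:.0f} mV")

So halving the AC loadline knocks roughly 45 mV off what the CPU asks for at that current, which is the whole point of the exercise.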

Good point, I do need to do more testing with at least a different GPU.

i7-13700KF; RTX-4090; 64GB RAM; Quest3 & PimaxCL; Virpil CM3 + VKB Gunfighter Mk.IV MCE-Ultimate + VKB T-Rudder Mk.V

Posted
13 hours ago, LucShep said:


I was reading the OP and thinking the same, that it could be early signs of the now well-known 13th/14th gen CPU degradation.... 🫤


@nephilimborn have you updated your motherboard BIOS to latest version, for the new Intel microcode?  (if you haven't, you should)

🤔 Not sure if you're willing to try something with CPU settings in your motherboard BIOS, just for a test....

If you are, then try to sync (i.e., lock) all your P-Cores, and use a manual clock value that is the same for all P-Cores (5.3 GHz is the stock "all P-Core" maximum clock for the i7 13700K)

Repeat testing; if it still does the same thing, go back to the BIOS and reduce that "all P-Core" clock by 100 MHz (so, now to 5.2 GHz) and try again.
...repeat and so on....

If at some point the problem stops happening (by lowering the P-Cores clock), then you may have a CPU that has started to degrade, not being able to reach the stock ultra high "boost" clocks with stock voltages.  (note: do not increase the CPU Core Voltage to reach the stock boost clocks, as it just makes things worse!)

Also, you mentioned not having yet disabled C-States, which is good because you should never disable those.
What you may do, if intended, is reduce the limit of C-States (for example, to "C3" instead of "Auto" - which can go to "C10" deepest savings limit) - Intel C-States explained.

 

I was also wondering about this... I have since watched many of Buildzoid's videos and adjusted the VRM and CPU settings as generally advised - however, that of course is not going to fix any degradation damage already done.

Similarly, I was wondering if I should test by incrementally lowering and locking the P-Cores - thank you for the advice, you have motivated me to try this as well!

Also, thank you for confirming my fears about messing with C-States!

i7-13700KF; RTX-4090; 64GB RAM; Quest3 & PimaxCL; Virpil CM3 + VKB Gunfighter Mk.IV MCE-Ultimate + VKB T-Rudder Mk.V

Posted
12 hours ago, kksnowbear said:

Absolutely.

Hence my original post on the subject, advocating some basic troubleshooting steps to help isolate the cause, and stress testing.

 

Thank you all, very good advice - I need to invest the effort to troubleshoot and test as recommended; I cannot keep holding onto the false hope that this will be fixed or necessarily mitigated by some collection of BIOS, OS, and driver settings. 

The first step, I think, is to swap the 4090 from the problematic system into a known operable system (my old 3080 rig) in order to narrow down whether this is hardware or not.

i7-13700KF; RTX-4090; 64GB RAM; Quest3 & PimaxCL; Virpil CM3 + VKB Gunfighter Mk.IV MCE-Ultimate + VKB T-Rudder Mk.V

Posted (edited)

kksnowbear,

is there any way you can stay strictly on the topic and not get into this useless ego thing ?  It is not helping in any way.

Edited by BitMaster

Gigabyte Aorus X570S Master - Ryzen 5900X - Gskill 64GB 3200/CL14@3600/CL14 - Sapphire  Nitro+ 7800XT - 4x Samsung 980Pro 1TB - 1x Samsung 870 Evo 1TB - 1x SanDisc 120GB SSD - Heatkiller IV - MoRa3-360LT@9x120mm Noctua F12 - Corsair AXi-1200 - TiR5-Pro - Warthog Hotas - Saitek Combat Pedals - Asus XG27ACG QHD 180Hz - Corsair K70 RGB Pro - Win11 Pro/Linux - Phanteks Evolv-X 

Posted

Linux drivers exist for Nvidia, sure.  When you use Ubuntu or Mint it's just a few clicks and a reboot.

I would certainly test that GPU in another system (non-Intel 13th/14th gen), but also test your system with Linux. They serve different purposes: Linux tests whether Windows is the culprit, while another system with your GPU tests your GPU; doing both is helpful so you can exclude things.


Gigabyte Aorus X570S Master - Ryzen 5900X - Gskill 64GB 3200/CL14@3600/CL14 - Sapphire  Nitro+ 7800XT - 4x Samsung 980Pro 1TB - 1x Samsung 870 Evo 1TB - 1x SanDisc 120GB SSD - Heatkiller IV - MoRa3-360LT@9x120mm Noctua F12 - Corsair AXi-1200 - TiR5-Pro - Warthog Hotas - Saitek Combat Pedals - Asus XG27ACG QHD 180Hz - Corsair K70 RGB Pro - Win11 Pro/Linux - Phanteks Evolv-X 

Posted (edited)
44 minutes ago, BitMaster said:

kksnowbear,

is there any way you can stay strictly on the topic and not get into this useless ego thing ?  It is not helping in any way.

 

I already explained, everything I've said is "on topic" - the point of which is to help the OP (as he requested, and has since indicated his gratitude for) as well as others who read this.

Sorry if it doesn't agree with your definition of what the topic involves, but everything I've said is directly related.  It's not my fault people insist on giving bad advice, nor that they posted it on the thread.  I didn't force them to do that, but it's potentially disastrous to not call it what it is.

I suppose you think we're all better off letting bad advice fly unchecked.  Factually, as I explained, it's potentially harmful to the OP, and to others who will read this at some point.

Maybe you feel that the two owners of systems I had to repair already were better off with bad advice from internet forums concerning modular PSU cables.  I can assure you they don't think so, and certainly not once they paid to have boards replaced that were destroyed by the wrong cables, because someone on the internet gave them bad advice.

Fortunately, I've been able to prevent at least two others from making the exact same mistake, by correcting bad information.

"Not helpful in any way?" Sorry, but I disagree. Those other people i helped would disagree, too.

As I said, I didn't open the can of worms, I was just trying to limit the potential for damage.

Edited by kksnowbear

Free professional advice: Do not rely upon any advice concerning computers from anyone who uses the terms "beast" or "rocking" to refer to computer hardware.  Just...don't.  You've been warned.

While we're at it, people should stop using the term "uplift" to convey "increase".  This is a technical endeavor, we're not in church or at the movies - and it's science, not drama.

Posted (edited)
On 8/17/2024 at 5:42 PM, LucShep said:


I was reading the OP and thinking the same, that it could be early signs of the now well-known 13th/14th gen CPU degradation.... 🫤


@nephilimborn have you updated your motherboard BIOS to latest version, for the new Intel microcode?  (if you haven't, you should)

🤔 Not sure if you're willing to try something with CPU settings in your motherboard BIOS, just for a test....

If you are, then try to sync (i.e., lock) all your P-Cores, and use a manual clock value that is the same for all P-Cores (5.3 GHz is the stock "all P-Core" maximum clock for the i7 13700K)

Repeat testing; if it still does the same thing, go back to the BIOS and reduce that "all P-Core" clock by 100 MHz (so, now to 5.2 GHz) and try again.
...repeat and so on....

If at some point the problem stops happening (by lowering the P-Cores clock), then you may have a CPU that has started to degrade, not being able to reach the stock ultra high "boost" clocks with stock voltages.  (note: do not increase the CPU Core Voltage to reach the stock boost clocks, as it just makes things worse!)

Also, you mentioned not having yet disabled C-States, which is good because you should never disable those.
What you may do, if intended, is reduce the limit of C-States (for example, to "C3" instead of "Auto" - which can go to "C10" deepest savings limit) - Intel C-States explained.

 

17 hours ago, nephilimborn said:

I was also wondering about this... I have since watched many of Buildzoid's videos and adjusted the VRM and CPU settings as generally advised - however, that of course is not going to fix any degradation damage already done.

Similarly, I was wondering if I should test by incrementally lowering and locking the P-Cores - thank you for the advice, you have motivated me to try this as well!

Also, thank you for confirming my fears about messing with C-States!

👍 Buildzoid!
He may sound like a nerd rambling, but his videos often show very interesting findings from his experiments (e.g., the latest oscilloscope videos on Intel K chips).

Once you sort those issues, and if not done already, consider stopping the single/dual core boost from happening, because of its 1.5 V+ voltage spikes (one of the main culprits for the current 13th/14th gen degradation issues).
The easiest way to do this is by sync'ing (locking) your P-Cores all at the same max possible clock (close to the out-of-the-box "all P-Core max" clock). 

Even better if the CPU core voltage (Vcore) is limited to lower values, around 1.35 V (or below).
You can set a voltage limit via the BIOS setting "IA VR Voltage Limit", with a value between 1350 and 1400 mV. 
Or you can manually adjust the CPU core voltage (Vcore), either by a "fixed" or by an "offset" voltage adjustment (whichever way you prefer).

One way to look at this is like the undervolting that so many also do on high-end GPUs: it prolongs the part's life by lowering voltage and temps.
In this particular case with Intel 13th/14th gen, it's (IMO) a very good way to drastically mitigate the possible degradation, and it doesn't really affect general performance.

Edited by LucShep


 

Posted (edited)

In the interim I have been able to trigger the black screen consistently, not with DCS, but with praydog's UEVR mod + Robocop: Rogue City. This hammers the CPU and GPU in a way that seems to trigger the behaviour within 2 minutes of gameplay (where stress testing with tools such as 3DMark does not).

I collected data in HWiNFO every 500 milliseconds... others have suggested that the Nvidia firmware might allow only minimal voltage tolerance (to mitigate the burn-the-house-down risk of the stupid 12VHPWR connector) - but the voltages near the event seem normal (16-pin HVPWR voltage at 12.157 V, which is similar to other loads that did not result in the black screen + 100% fans requiring a hard reset).

Other relevant sensor data near the event, all of which seem within operational boundaries (note I had a 97% GPU power limit set, all P-Cores locked to 5.3 GHz, and the VID request limit at 1.4 V); a quick log-slicing sketch follows the list:

  • GPU Power: 415.173 W; 16-pin HVPWR Power: 406.472 W; Clock: 2700 MHz; Load: 96%; Hotspot: 70 °C
  • GPU Core Voltage: 1.04 V; PCIe Input Voltage: 12.171 V; 16-pin HVPWR Voltage: 12.157 V
  • Performance limited ONLY by Voltage Reliability (this is normal and expected at this power level)
  • CPU VID: 1.323 V; Vcore: 1.344 V; VR VOUT (SVID): 1.351 V; VR VCC Current: 82 A; CPU Package temp: 52 °C
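The log-slicing sketch - nothing clever, it just pulls the last ~30 seconds of the HWiNFO CSV so I can eyeball min/mean/max right before the crash. The filename and the sensor-name substrings are examples only; they will differ depending on which sensors you log:

  import pandas as pd

  # Slice the last ~30 s of an HWiNFO CSV log (500 ms polling -> 60 rows).
  log = pd.read_csv("hwinfo_log.csv", encoding="latin-1")  # HWiNFO CSVs are usually not UTF-8
  keys = ("GPU Power", "HVPWR", "Vcore", "VID", "Package")  # substrings of the columns of interest
  cols = [c for c in log.columns if any(k in c for k in keys)]
  tail = log[cols].tail(60).apply(pd.to_numeric, errors="coerce")
  print(tail.describe().loc[["min", "mean", "max"]])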

I found that if I power-limited the GPU to 90% (via the GPU Tweak utility) I was able to avoid the black screen, at least for the 30 minutes that I tested in game.

Question: I know to avoid the recalled CableMod 16-pin adapters; however, the CableMod M-Series Pro ModMesh Sleeved 12VHPWR StealthSense Direct Cable Kit for MSI PCIE5 (16-pin to 16-pin) looked like something to try? Any thoughts?

Thanks again for everyone's help on this; really do appreciate the advice.

Edited by nephilimborn

i7-13700KF; RTX-4090; 64GB RAM; Quest3 & PimaxCL; Virpil CM3 + VKB Gunfighter Mk.IV MCE-Ultimate + VKB T-Rudder Mk.V

Posted (edited)

@nephilimborn Looks like a power delivery issue - either the PSU, the GPU cable connector, or the cable itself (?).

I can't comment on CableMod 12VHPWR connectors as I have no experience with those, though I've repeatedly seen that the latest revised models are much better quality.

I see people mentioning the Corsair GPU Power Bridge and the Thermal Grizzly WireView GPU as well, which may be good alternatives. Again, I have no experience with those.

The problems still being reported with burning connectors are, I think, increasingly related to the design of the connector on the RTX 4090 itself.
(IMO, it should have been two connectors, not one!)


Meanwhile, try undervolting the RTX 4090 - you get at least a 20% reduction in power consumption for only a ~2% reduction in performance (a very good trade-off).

Various tutorials on YouTube. Among plenty of others, these two for example:

 

Edited by LucShep


 

Posted (edited)

I think it bears mentioning here that *if* this is related to the 12VHPWR cable/connector, then just exactly like with the lawsuit, there is an enormous likelihood that any connection issues are not necessarily the fault of the cable or connector itself.  That's what Nvidia has said - and it would also explain how so many (many, many) users don't have any issues at all.

I think it's prudent to consider other factors besides the cable on the MSI PSU (which I have a very hard time believing is actually "bad" in a high-quality unit such as that).  I can assure you it's entirely possible for this sort of issue to *act* like a problem in the cable, while still not actually *being* a problem in the cable itself.  This is not (by far) the first cable that's a pain in the butt to be related to failures, even things like melting connectors - and not always the fault of the cable/connector itself.  I've seen plenty of "IT people" plug cables in wrong, and had to fix plenty of the end results, too.

Here are some other thoughts I've had:

- Have we accounted for the 'approach' of the 12VHPWR cable to the GPU?  In other words, any bending, in either axis?  There is specific guidance about not having bends within a certain distance (I want to say 10mm) of the connector itself, as well as an overall clearance of something like 5mm between the GPU connector and the case sidewall...note I'm not citing specifics, but there are figures and they do matter.  I've seen a lot that are wrong, including here on this forum; bends too tight, too close to the connector, connectors not seated flush and square, cable pressed against case sidewall (causing alignment issues with the GPU connector).  All while the "proud owner" of the 4090 insists he knows how to connect a cable, the picture is showing it's wrong as all hell.

Along these lines, a simple request: Share a pic of the cable as it's connected to the card, without changing anything (if you haven't already).  Ideally you can show when the case side panel is installed as well as when it's off (not sure if you indicated what type case you have, i.e. whether the side panel is transparent).

- Also, that MSI PSU has enough 8pin PCIe connectors, and the 4090 TUF comes with a 4x8>12VHPWR adapter cable (I have the same GPU, and mine did).  Has this been tried?  If the MSI's 12VHPWR cable has an issue, then this test would completely bypass it, and it should work normally.  I would caution, however, that in doing the swap, you have to be careful (of course) to not create a different problem...and also, if this swap works, you can't just say it proves the MSI 12VHPWR cable is bad - you *must* swap the MSI native 12VHPWR back, making sure of good connection, no bends, etc, - and *only then* (if it fails again) could you say it proves the MSI cable is related to the problem.

As a long-time maintenance professional, one of the things that caught my attention is the first GPU was replaced and the second didn't have issues right away - but it did, in time.  To me, this sounds like possibly a connector that's bent and/or under stress (pressed against case side) that only starts to act up over time.  That's the way intermittent contacts often behave; they get worse because contamination builds up where the contacts aren't good, which increases resistance, thus increasing the area of the bad contact, and it snowballs from there.
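To put some very rough numbers on that snowballing (back-of-envelope only: roughly 450 W at 12 V shared evenly across the six 12 V pins, and contact resistance values that are illustrative, not measured):

  # Back-of-envelope heat at a single 12VHPWR pin as its contact resistance degrades.
  POWER_W, VOLTS, PINS = 450.0, 12.0, 6
  amps_per_pin = (POWER_W / VOLTS) / PINS                # ~6.25 A per pin if shared evenly

  for r_mohm in (5, 15, 40):                             # healthy contact -> degraded contact
      heat_w = (amps_per_pin ** 2) * (r_mohm / 1000.0)   # P = I^2 * R
      print(f"{r_mohm:>3} mOhm contact: ~{heat_w:.2f} W dissipated at that one pin")

A watt or two doesn't sound like much until you remember it's concentrated in a contact area the size of a pin head - and if one pin degrades, the others carry more current, which is exactly the snowball described above.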

I think it's possible to jump to a wrong conclusion, and I believe in the necessity to be thorough in order to *prevent* jumping to a wrong conclusion.  I think it's premature to blame the MSI cable, and even if it is a problem, still nothing wrong with 'best practice' and being thorough.

It's also entirely possible that, even if the MSI cable/connector has failed, that it wasn't always bad...it could easily have been crimped, bent wrong, etc and thus failed over time.  That, to me, is a distinct possibility.  And like Nvidia I don't consider that a problem with the cable or the connector.

I wouldn't sue my mechanic or Toyota just because my wife rides the brakes in the car, nor should I.

Edited by kksnowbear

Free professional advice: Do not rely upon any advice concerning computers from anyone who uses the terms "beast" or "rocking" to refer to computer hardware.  Just...don't.  You've been warned.

While we're at it, people should stop using the term "uplift" to convey "increase".  This is a technical endeavor, we're not in church or at the movies - and it's science, not drama.

Posted (edited)

Kksnowbear finally reads Wikipedia after a two page ego rant 🤣 

Edited by SharpeXB

i9-14900KS | ASUS ROG MAXIMUS Z790 HERO | 64GB DDR5 5600MHz | iCUE H150i Liquid CPU Cooler | ASUS TUF GeForce RTX 4090 OC | Windows 11 Home | 2TB Samsung 980 PRO NVMe | Corsair RM1000x | LG 48GQ900-B 4K OLED Monitor | CH Fighterstick | Ch Pro Throttle | CH Pro Pedals | TrackIR 5

Posted (edited)

Dude stop with the stupid personal attacks, seriously.  I've been familiar with the entire affair for longer than you have, and my position on it hasn't changed.  That Wikipedia crap you posted has nothing (-zero-) to do with what I just wrote, and it's obviously written by someone who is as clueless as you are - which I already proved earlier.

You're not formally trained, and you have zero professional experience. You don't know what you're talking about.  You spend a lot of money paying "PC Vendors" to build stuff (and they robbed you, from the sounds of it) ...all of which doesn't make you knowledgeable about anything (except possibly how to spend a lot of money needlessly). Your incorrect assessment and lack of knowledge could have cost someone their PC.

Edited by kksnowbear

Free professional advice: Do not rely upon any advice concerning computers from anyone who uses the terms "beast" or "rocking" to refer to computer hardware.  Just...don't.  You've been warned.

While we're at it, people should stop using the term "uplift" to convey "increase".  This is a technical endeavor, we're not in church or at the movies - and it's science, not drama.

Posted
16 minutes ago, kksnowbear said:

Dude stop with the stupid personal attacks, seriously.  I've been familiar with the entire affair for longer than you have, and my position on it hasn't changed.  That Wikipedia crap you posted has nothing (-zero-) to do with what I just wrote, and it's obviously written by someone who is as clueless as you are - which I already proved earlier.

You're not formally trained, and you have zero professional experience. You don't know what you're talking about.  You spend a lot of money paying "PC Vendors" to build stuff (and they robbed you, from the sounds of it) ...all of which doesn't make you knowledgeable about anything (except possibly how to spend a lot of money needlessly).

 

You gotta see the entertainment value here. You just spent two pages on a diatribe about how the cable problem theory was fake news and then you write this big post about it 😆


i9-14900KS | ASUS ROG MAXIMUS Z790 HERO | 64GB DDR5 5600MHz | iCUE H150i Liquid CPU Cooler | ASUS TUF GeForce RTX 4090 OC | Windows 11 Home | 2TB Samsung 980 PRO NVMe | Corsair RM1000x | LG 48GQ900-B 4K OLED Monitor | CH Fighterstick | Ch Pro Throttle | CH Pro Pedals | TrackIR 5

Posted

First of all I never used the term fake news.

And I just finished saying that my position hasn't changed at all.  *If* the cable has failed, it's still not proof the cable was bad to begin with. It could easily be because of installation, handling, etc.  Funny thing is, no one here has any way of knowing for sure, without the very information I just asked for.

The fact that you think I'm somehow reversing what I've already said, simply illustrates your total lack of understanding of the circumstances. 

And I wouldn't expect you to understand any of this...you don't have the training, the knowledge, or the experience.  Your initial involvement could've caused someone a disaster, and I'm glad I called it out so it got removed. 

Free professional advice: Do not rely upon any advice concerning computers from anyone who uses the terms "beast" or "rocking" to refer to computer hardware.  Just...don't.  You've been warned.

While we're at it, people should stop using the term "uplift" to convey "increase".  This is a technical endeavor, we're not in church or at the movies - and it's science, not drama.

Posted
10 minutes ago, kksnowbear said:

First of all I never used the term fake news.

You implied all the info I linked was false and then went on to post all the same info yourself. 🙄

i9-14900KS | ASUS ROG MAXIMUS Z790 HERO | 64GB DDR5 5600MHz | iCUE H150i Liquid CPU Cooler | ASUS TUF GeForce RTX 4090 OC | Windows 11 Home | 2TB Samsung 980 PRO NVMe | Corsair RM1000x | LG 48GQ900-B 4K OLED Monitor | CH Fighterstick | Ch Pro Throttle | CH Pro Pedals | TrackIR 5

Posted (edited)

Nope.

The nonsense you claimed the links were saying was wrong...(like the 4090 being the first GPU to use the 12VHPWR, as some idiot greenie posted on freakin Wikipedia lmao) and I called that out too, because it was BS.

This lawsuit you cited doesn't prove Nvidia was wrong, nor does PCI SIG changing the connector...which you also tried to claim.

Neither Nvidia nor PCI SIG is responsible if you can't connect a power cable properly.

You don't actually even understand anything in the links you post. 

...and now I'm saying you (still) don't understand what half the stuff you post actually means.  You don't understand these things, because you don't have the training, knowledge, or experience that actually enables understanding them.

You can keep on about it, but it won't change the fact that you still don't understand these things...and you keep proving it, every time you cite some obviously misguided idiot at Wikipedia as some kind of proof.  That guy's wrong, but you don't even know enough about this stuff to know it lol

But hey lol it's all good...you go right ahead, keep posting links to cables that'll melt someone's motherboard if they don't know any better.  Hopefully nobody decides to sue you lmao

Edited by kksnowbear

Free professional advice: Do not rely upon any advice concerning computers from anyone who uses the terms "beast" or "rocking" to refer to computer hardware.  Just...don't.  You've been warned.

While we're at it, people should stop using the term "uplift" to convey "increase".  This is a technical endeavor, we're not in church or at the movies - and it's science, not drama.

Posted (edited)
1 hour ago, kksnowbear said:

The nonsense you claimed the links were saying was wrong...(like the 4090 being the first GPU to use the 12VHPWR, as some idiot greenie posted on freakin Wikipedia lmao) and I called that out too, because it was BS.

It does mention the 3090 Ti.  Or did they fix this because you called it out to them?

“The connector first appeared in the Nvidia RTX 40 GPUs. The prior Nvidia RTX 30 series introduced a similar, proprietary connector in the "Founder's Edition" cards, which also uses an arrangement of twelve pins for power, but did not have the sense pins, except for the connector on the founders edition RTX 3090 Ti (though not present on the adapter supplied with those cards.)”

Why didn’t this problem present itself with the 3090? My guess is the sheer size of the 4090 made bending and disconnecting the cable more likely. 

1 hour ago, kksnowbear said:

Neither Nvidia nor PCI SIG is responsible if you can't connect a power cable properly.

But then the cable was redesigned to correct its flaws. Kinda like admitting guilt… 🤔 Mainly the flaw is that the sense pins are long enough that they could remain in contact, and the card still gets power, when it’s only partially connected. The improved design shortened those so that can’t happen. You just cited a bunch of examples in your previous post about how difficult it is to connect this properly. 

1 hour ago, kksnowbear said:

You don't understand these things, because you don't have the training, knowledge, or experience that actually enables understanding them.

This isn’t brain surgery, it’s a freakin power plug. I can read… 

1 hour ago, kksnowbear said:

That guy's wrong, but you don't even know enough about this stuff to know it lol

You’re trying to discredit the whole issue over a small error like the 3090? That doesn’t change any of the facts about the cable.

Funny here you go reversing yourself after just trying to call attention to the issue in your previous post 🤣 Is this a potential problem or isn’t it?

Edited by SharpeXB

i9-14900KS | ASUS ROG MAXIMUS Z790 HERO | 64GB DDR5 5600MHz | iCUE H150i Liquid CPU Cooler | ASUS TUF GeForce RTX 4090 OC | Windows 11 Home | 2TB Samsung 980 PRO NVMe | Corsair RM1000x | LG 48GQ900-B 4K OLED Monitor | CH Fighterstick | Ch Pro Throttle | CH Pro Pedals | TrackIR 5

Posted

As i said, it's all good...you go right ahead, keep posting links to cables that'll melt someone's motherboard if they don't know any better.  Hopefully nobody decides to sue you lmao

Free professional advice: Do not rely upon any advice concerning computers from anyone who uses the terms "beast" or "rocking" to refer to computer hardware.  Just...don't.  You've been warned.

While we're at it, people should stop using the term "uplift" to convey "increase".  This is a technical endeavor, we're not in church or at the movies - and it's science, not drama.

Posted (edited)
On 8/19/2024 at 7:15 PM, LucShep said:

@nephilimborn Looks like a power delivery issue - either the PSU, the GPU cable connector, or the cable itself (?).

I can't comment on CableMod 12VHPWR connectors as I have no experience with those, though I've repeatedly seen that the latest revised models are much better quality.

I see people mentioning the Corsair GPU Power Bridge and the Thermal Grizzly WireView GPU as well, which may be good alternatives. Again, I have no experience with those.

The problems still being reported with burning connectors are, I think, increasingly related to the design of the connector on the RTX 4090 itself.
(IMO, it should have been two connectors, not one!)


Meanwhile, try undervolting the RTX 4090 - you get at least a 20% reduction in power consumption for only a ~2% reduction in performance (a very good trade-off).

Various tutorials on YouTube. Among plenty of others, these two for example:

 

 

Thanks LucShep, ordered the WireView. 

Also, I checked 4 different power supply calculators, the majority of which indicate that I am at the edge of, or above, 80% power draw for the MSI 1000G PSU. I might just also pull the trigger on a new PSU; I'm considering the XPG CORE REACTOR II 1200W which, according to Hardware Busters, has very good 12 V rail properties https://hwbusters.com/psus/xpg-core-reactor-ii-1200w-psu-review/ and is assessed as top tier on the Cultists PSU tier list https://cultists.network/140/psu-tier-list/
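For what it's worth, the rough arithmetic behind that conclusion (the GPU figure is from my HWiNFO log above; the CPU PL2, "rest of system" and transient numbers are assumptions/ballpark figures, not measurements):

  # Rough PSU headroom estimate for the MSI 1000G.
  PSU_W  = 1000.0
  GPU_W  = 415.0   # sustained 16-pin power logged under UEVR load
  CPU_W  = 253.0   # i7-13700KF PL2 per Intel spec (worst case, assumption)
  REST_W = 80.0    # board, RAM, drives, fans, USB (guess)

  sustained = GPU_W + CPU_W + REST_W
  spike     = 600.0 + CPU_W + REST_W   # brief 4090 power excursions reportedly reach ~600 W
  print(f"Estimated sustained draw: {sustained:.0f} W ({sustained / PSU_W:.0%} of PSU)")
  print(f"Momentary excursion estimate: {spike:.0f} W ({spike / PSU_W:.0%} of PSU)")

Sustained load looks fine on paper; it's the transient spikes that eat the margin, which is presumably why the calculators flag it.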

I tried undervolting with GPU Tweak III and got mixed black screen behaviour when running the Unigine Superposition stability test - I ended up just ignorantly setting the power target to 90% for now. I will put more effort into tuning the V/F curve when the Thermal Grizzly WireView arrives.

This sh*tshow is driving me into mental instability - replacing the PSU is driven more by desperation than by any evidence that the MSI PSU is the root cause.

Edited by nephilimborn

i7-13700KF; RTX-4090; 64GB RAM; Quest3 & PimaxCL; Virpil CM3 + VKB Gunfighter Mk.IV MCE-Ultimate + VKB T-Rudder Mk.V

Posted (edited)
7 hours ago, kksnowbear said:

I think it bears mentioning here that *if* this is related to the 12VHPWR cable/connector, then just exactly like with the lawsuit, there is an enormous likelihood that any connection issues are not necessarily the fault of the cable or connector itself.  That's what Nvidia has said - and it would also explain how so many (many, many) users don't have any issues at all.

I think it's prudent to consider other factors besides the cable on the MSI PSU (which I have a very hard time believing is actually "bad" in a high-quality unit such as that).  I can assure you it's entirely possible for this sort of issue to *act* like a problem in the cable, while still not actually *being* a problem in the cable itself.  This is not (by far) the first cable that's a pain in the butt to be related to failures, even things like melting connectors - and not always the fault of the cable/connector itself.  I've seen plenty of "IT people" plug cables in wrong, and had to fix plenty of the end results, too.

Here are some other thoughts I've had:

- Have we accounted for the 'approach' of the 12VHPWR cable to the GPU?  In other words, any bending, in either axis?  There is specific guidance about not having bends within a certain distance (I want to say 10mm) of the connector itself, as well as an overall clearance of something like 5mm between the GPU connector and the case sidewall...note I'm not citing specifics, but there are figures and they do matter.  I've seen a lot that are wrong, including here on this forum; bends too tight, too close to the connector, connectors not seated flush and square, cable pressed against case sidewall (causing alignment issues with the GPU connector).  All while the "proud owner" of the 4090 insists he knows how to connect a cable, the picture is showing it's wrong as all hell.

Along these lines, a simple request: Share a pic of the cable as it's connected to the card, without changing anything (if you haven't already).  Ideally you can show when the case side panel is installed as well as when it's off (not sure if you indicated what type case you have, i.e. whether the side panel is transparent).

- Also, that MSI PSU has enough 8pin PCIe connectors, and the 4090 TUF comes with a 4x8>12VHPWR adapter cable (I have the same GPU, and mine did).  Has this been tried?  If the MSI's 12VHPWR cable has an issue, then this test would completely bypass it, and it should work normally.  I would caution, however, that in doing the swap, you have to be careful (of course) to not create a different problem...and also, if this swap works, you can't just say it proves the MSI 12VHPWR cable is bad - you *must* swap the MSI native 12VHPWR back, making sure of good connection, no bends, etc, - and *only then* (if it fails again) could you say it proves the MSI cable is bad.

As a long-time maintenance professional, one of the things that caught my attention is the first GPU was replaced and the second didn't have issues right away - but it did, in time.  To me, this sounds like possibly a connector that's bent and/or under stress (pressed against case side) that only starts to act up over time.  That's the way intermittent contacts often behave; they get worse because contamination builds up where the contacts aren't good, which increases resistance, thus increasing the area of the bad contact, and it snowballs from there.

I think it's possible to jump to a wrong conclusion, and I believe in the necessity to be thorough in order to *prevent* jumping to a wrong conclusion.  I think it's premature to blame the MSI cable, and even if it is a problem, still nothing wrong with 'best practice' and being thorough.

It's also entirely possible that, even if the MSI cable/connector has failed, that it wasn't always bad...it could easily have been crimped, bent wrong, etc and thus failed over time.  That, to me, is a distinct possibility.  And like Nvidia I don't consider that a problem with the cable or the connector.

I wouldn't sue my mechanic or Toyota just because my wife rides the brakes in the car, nor should I.

 

Thanks kksnowbear, very good point wrt the cable routing as a possible root cause. It does have a bend that, while not extreme (20 mm), does not meet the recommended (35 mm) straight length before the bend - this is a result of the MSI cable length not being ideal for my build. Maybe ignorantly, I trusted that the Scorptec PC builders knew what they were doing when they built the system.

I ordered the Thermal Grizzly WireView to hopefully mitigate this; I trust der8auer more than I do CableMod - hopefully it arrives by tomorrow.

I am so deep in the trees that I cannot see the forest; thank you for the analysis, and trying the 8-pin PCIe adapter is a great idea - I should have thought of that! Now I just need to find where the hell I put the adapter!

Also agree re the connector, which is a PCI-SIG specification; it was adopted by Nvidia & PSU vendors, not invented by them. I bear responsibility for just blindly trusting how the PC builder installed & routed the PCIE5 cable.

Attached are two photos of the cable & GPU - I will get the side panel off later and capture that as well.

Thanks Again!

20240821_152802.jpg

20240821_152810.jpg

Edited by nephilimborn

i7-13700KF; RTX-4090; 64GB RAM; Quest3 & PimaxCL; Virpil CM3 + VKB Gunfighter Mk.IV MCE-Ultimate + VKB T-Rudder Mk.V

Posted (edited)
6 hours ago, nephilimborn said:

Thanks kksnowbear, very good point wrt the cable routing as a possible root cause. It does have a bend that, while not extreme (20 mm), does not meet the recommended (35 mm) straight length before the bend - this is a result of the MSI cable length not being ideal for my build. Maybe ignorantly, I trusted that the Scorptec PC builders knew what they were doing when they built the system.

I ordered the Thermal Grizzly WireView to hopefully mitigate this; I trust der8auer more than I do CableMod - hopefully it arrives by tomorrow.

I am so deep in the trees that I cannot see the forest; thank you for the analysis, and trying the 8-pin PCIe adapter is a great idea - I should have thought of that! Now I just need to find where the hell I put the adapter!

Also agree re the connector, which is a PCI-SIG specification; it was adopted by Nvidia & PSU vendors, not invented by them. I bear responsibility for just blindly trusting how the PC builder installed & routed the PCIE5 cable.

Attached are two photos of the cable & GPU - I will get the side panel off later and capture that as well.

Thanks Again!

20240821_152802.jpg

20240821_152810.jpg

 

Hi again @nephilimborn I am grateful you shared the pics and are also willing to invest the time in actually understanding what could well be going on.

Yeah, that bend seems a little tight for me; maybe not terrible but as you said, not within the guidance (thanks BTW for the correct figures, lol I was too lazy to go find them).  I am very relieved you actually understand the 'root cause' factor at issue here.  It always makes things so much easier to work with someone who's actually capable in that regard.

I was concerned about it because it is among the issues that cause the connector failures.  I do sincerely appreciate your acknowledgement in this respect.  I am flatly amazed at the number of people who will file lawsuits etc, rather than take responsibility for their own/their builder's work.  As I've said (and as Nvidia will surely say, if pressed in a lawsuit) there would have to be a reason that so many people have used the exact same connector and *don't* have problems (and if I understood correctly, we're talking into 6 figures).

Like I said I wouldn't really have any reason to blame my mechanic or Toyota because my wife rides the brakes (and believe me, those maintenance guys appreciate that I understand what's taking place! LOL)

Someone reading this might ask why the distinction matters so much to people like me.  If I can take the liberty, the reason is that if one assumes it's just a 'bad cable' and simply replaces the cable in the same fashion...then given the same heat, time and mechanical stress...(as I bet you already know) the replacement cable will also fail, possibly quicker than the one that was replaced.  The 'repair' didn't really correct the problem, because the user failed to recognize the cable issue was a symptom of the problem, rather than the problem itself.

That's why the 'root cause' matters so much (for the benefit of the reader who actually wants to learn) - and I can't tell you how glad I am that you seem to actually understand this.

BTW the example I just made actually assume the cable is even failed...the problem could easily be that mechanical strain is causing poor mating of the contacts - meaning the cable isn't even bad at all, just needs to have the bends remedied.  I'd honestly prefer to think that would be the case, as opposed to that nice MSI PSU actually having a "bad" cable.  Again - actually understanding 'root cause' matters.   A lot.

Hopefully the stuff you've ordered will help overcome the root cause and thus correct your issues once and for all.  I'm glad if my input helped.

Now if you'll excuse me, my mechanic is calling...says the wife's car needs brakes (again? lol)

Edited by kksnowbear

Free professional advice: Do not rely upon any advice concerning computers from anyone who uses the terms "beast" or "rocking" to refer to computer hardware.  Just...don't.  You've been warned.

While we're at it, people should stop using the term "uplift" to convey "increase".  This is a technical endeavor, we're not in church or at the movies - and it's science, not drama.
