There’s been some additional reporting on the RTX GPU instabilities in user forums and on Reddit. Before we dig into the findings, I need to stress that Nvidia has not released a statement on this topic. Individual user investigations are intrinsically more prone to error than a formal company effort, and this situation is no different.
Reports now suggest that some RTX 3080s that are unstable with normal boost behavior enabled become much more stable if run in maximum p-state. The same Beyond3D poster notes that changing his power supply from multi-rail to single-rail also had an impact on system stability. From the way the post is written, it is impossible to tell if all three methods of improving stability — single-rail, lower boost clocks, and locked maximum p-state — work equally well, or if he tested them in conjunction with each other.
I have 3080 FE I have been experimenting with. The card was very unstable until I switched my power supply (Corsair HX1000) from multi-rail 12V to single-rail 12V, so the very high short term power draw seems to be an issue. Max boost at stock looks like it is 2100 mhz (as reported by nvidia-smi), but I have never seen over 2055 reported. Capping max boost at below 2000 mhz seems to be stable. Also, locking the card at max P state seems to be stable (using nvidia-smi -lgc 2100). Idle power draw (TBP) goes from about 28 watts to 59 watts when locking the max P state.
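For readers who want to repeat this kind of observation, here is a minimal Python sketch that logs the same quantities the poster mentions (P-state, graphics clock, maximum graphics clock, and board power) by calling nvidia-smi. The function names and the dictionary keys are my own, not anything from the forum post, and the sketch assumes nvidia-smi is on the PATH and that the driver reports numeric values for every field; rows that do not parse are simply skipped.

```python
import csv
import io
import subprocess

# Query fields as listed by `nvidia-smi --help-query-gpu`.
FIELDS = ["pstate", "clocks.gr", "clocks.max.gr", "power.draw"]


def parse_samples(text):
    """Parse rows produced with --format=csv,noheader,nounits."""
    samples = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(row) != len(FIELDS):
            continue  # blank or malformed line
        pstate, clock, max_clock, power = (cell.strip() for cell in row)
        try:
            samples.append({
                "pstate": pstate,
                "clock_mhz": int(clock),
                "max_clock_mhz": int(max_clock),
                "power_w": float(power),
            })
        except ValueError:
            continue  # e.g. a field reported as [N/A]
    return samples


def summarize(samples):
    """Return the peak observed clock and the mean power draw."""
    clocks = [s["clock_mhz"] for s in samples]
    powers = [s["power_w"] for s in samples]
    return {
        "peak_clock_mhz": max(clocks),
        "mean_power_w": sum(powers) / len(powers),
    }


def read_gpu_once():
    """Take one reading per installed GPU; requires nvidia-smi."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_samples(result.stdout)
```

Calling read_gpu_once() in a loop while a game runs, then passing the collected samples to summarize(), gives a rough picture of how high the card actually boosts and what it draws. The clock lock the poster describes is applied with nvidia-smi -lgc 2100 and can be undone with nvidia-smi -rgc; both typically need administrator rights.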
Nvidia’s New Drivers May Lower Boost Clocks, Improve Stability
Nvidia’s new 456.55 driver claims that it “also improves stability in certain games on RTX 30 Series GPUs.” This could be a tacit admission that Nvidia has a stability problem it’s trying to fix on the sly, or it could mean that this is an entirely normal GPU launch in which a company pushed out a fast driver update to fix some last-minute stability issues. This is scarcely uncommon with new GPU architectures.
Some users report seeing lower GPU clocks now, with the average decrease in the 90-100MHz range, but there are also users reporting no change and a handful claiming significant performance improvements. Stability appears to have genuinely improved across the board. If you own an RTX 3080 I’d definitely recommend testing the new drivers. One owner of an MSI Gaming X Trio posted his voltage and frequency response curves before and after the new driver, and while there are some differences, they aren’t particularly large. This goes back to the observed behavior likely being card, game, and manufacturer-specific to some degree.
This is just one subtype of card from one company, measured by one user, so again, take these results with a grain of salt.
Does This Demonstrate the Problem Is Software-Only?
No. We don’t have an answer to that question yet. But given that some companies like MSI are already changing the stock imagery for their GPUs, it certainly seems as though some kind of hardware revision is likely underway.
Here is what we know: The latest Nvidia driver appears to significantly improve RTX 3080 stability. While it has lowered clock speeds for some customers, there are no reports of lower performance. In fact, some customers have reported better performance since the update, either because they can now run their GPUs at much higher clocks than the manual underclock they previously needed (one user had to pull all the way down to 1.7GHz to get the card to stabilize), or because the GPU gains more performance from not constantly changing its clock than it does from bursting up to high clocks on a regular basis.
Several years ago, AMD launched the RX 480, only to discover that some of its cards pulled more power from the 75W PCIe slot than was strictly in spec. The company was able to fix the problem with a driver update that actually improved RX 480 performance by a small degree in several games.
It is possible that Nvidia launched this driver to make certain all RTX 3080 cards would perform up to spec, but that it simultaneously wants its board partners to revise their hardware to fix a hardware erratum that slipped past validation. The simplest way to check this will be to compare later-model RTX 3080s (and by later, I mean “produced a few weeks from now”) with current cards. If Version 1.0 from Vendor A uses POSCAPs and Version 1.1 from Vendor A in three weeks has an entirely different capacitor layout built around MLCCs, it’ll be pretty good evidence that manufacturers made some very fast changes to their underlying board designs. It may be that Nvidia’s driver changes will fix whatever hardware issue is in play here, allowing launch-day RTX 3080s to run at equal speeds with later cards.
Finally, this could turn out to be a band-aid patch slapped on a festering problem with the potential to really wreck Nvidia’s day. For now, this seems the least likely explanation. If Nvidia knew it had a problem it thought it couldn’t fix, waiting to disclose it would only make the impact worse. With that said, we hope the company does issue a statement that both explains the behavior end-users have witnessed and clarifies what the new driver does to solve it.
RTX 3080 owners are reporting much-improved stability with the 456.55 driver, so we recommend grabbing it as soon as you can. If you have an RTX 3080, drop us a line and let us know how it improves things for you (or doesn’t).