With the Xbox Series X and PlayStation 5 both coming to market at some point between now and the heat death of the universe, it’s a decent moment to revisit the strengths and weaknesses of using TFLOPS to measure GPU performance between two products.
The first thing to understand is that there is no single metric that can accurately capture performance between two GPUs, unless that single measurement happens to capture the only workload you care about. MHz, FLOPS, Elephants per square meter of hydrochloric acid — all of them have weaknesses when used to measure performance, and one of them is a major violation of the Endangered Species Act.
FLOPS has one significant advantage over a metric like MHz, in that it has a theoretically direct relationship to the amount of work being performed per second. Clock speed is typically used to imply that higher MHz = faster performance. With FLOPS, a higher FLOPS rating is supposed to mean higher performance.
Does it? Sometimes. That’s the tricky part. The good news is, the Xbox Series X and PlayStation 5 are a little easier to compare on this score than a typical AMD-versus-Nvidia-versus-Intel battle royale.
What FLOPS Tells You (and What It Doesn’t)
FLOPS is a measure of floating-point operations per second. FLOPS can be measured at varying levels of precision, including 16-bit (half precision), 32-bit (single precision), and 64-bit (double precision). In gaming, single precision is what you care about. To calculate FLOPS, multiply the number of cores by the clock speed by the number of floating-point operations each core can execute per cycle.
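As a rough sketch, that multiplication looks like the following, assuming 2 FLOPS per core per cycle (modern GPUs are rated on one fused multiply-add per core per clock):

```python
def peak_tflops(cores: int, clock_ghz: float, flops_per_cycle: int = 2) -> float:
    """Theoretical peak throughput in TFLOPS.

    flops_per_cycle=2 assumes one fused multiply-add (FMA) per core per
    clock, which is how AMD and Nvidia rate their consumer GPUs.
    """
    return cores * clock_ghz * flops_per_cycle / 1000  # GFLOPS -> TFLOPS

# Vega 64: 4,096 cores at its 1546MHz boost clock
print(round(peak_tflops(4096, 1.546), 2))  # ~12.66 TFLOPS, matching AMD's rating
```

Note that this is a theoretical peak — it assumes every core issues an FMA on every clock, which real workloads never sustain.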
This calculation also highlights the weakness of FLOPS as a gaming performance metric — it only measures the mathematical throughput of a GPU’s cores, not the capabilities of any other part of the card. Other factors, like pixel fill rate (how many pixels the GPU can write to screen per second) and texture fill rate (how many texture elements the GPU can map to pixels per second), both have a significant impact on absolute GPU performance.
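The fill-rate math is just as simple. A sketch, assuming the usual peak ratings of one pixel per ROP and one texel per texture unit per clock:

```python
def pixel_fill_gpixels(rops: int, clock_ghz: float) -> float:
    # Peak pixel fill rate: one pixel written per ROP per clock
    return rops * clock_ghz

def texture_fill_gtexels(tmus: int, clock_ghz: float) -> float:
    # Peak texture fill rate: one texel sampled per texture unit per clock
    return tmus * clock_ghz

# Vega 64 (64 ROPs, 256 texture units, 1546MHz boost clock):
print(pixel_fill_gpixels(64, 1.546))     # ~98.9 Gpixels/s
print(texture_fill_gtexels(256, 1.546))  # ~395.8 Gtexels/s
```

Two GPUs with identical TFLOPS ratings can post very different fill rates if their ROP and texture-unit counts differ.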
If you want proof of the dangers of relying on FLOPS as a performance metric, consider the Radeon VII. The benchmark results from our 2019 review are available below. Compare the Radeon VII with the Vega 64, specifically:
The Vega 64 is capable of 12.67 TFLOPS with a 4096:256:64 configuration. The Radeon VII is capable of up to 14.2 TFLOPS according to AMD in a 3840:240:64 configuration. On paper, that’s a 1.12x increase in TFLOPS performance. Given that real-world improvements are almost always smaller than theoretical gains, you’d expect the Radeon VII to be 1.07x – 1.10x faster than the Vega 64 if FLOPS were the distinguishing factor between the two. Both GPUs are based on the same AMD graphics architecture (GCN).
The actual real-world improvement is 1.33x. In this case, factoring in the additional details I provided about GPU configurations wouldn’t account for the performance difference, either. The Radeon VII has exactly the same number of ROPs, 94 percent the number of texture units, 94 percent the GPU core count, and a ~1.12x increase to base and boost clock speeds. Again, the basic math favors a much smaller boost.
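To put numbers on that, here is the spec-sheet arithmetic, assuming the cards' boost clocks (1546MHz for the Vega 64, 1750MHz for the Radeon VII):

```python
# Spec ratios for Radeon VII vs. Vega 64
core_ratio = 3840 / 4096    # 94 percent of the cores
clock_ratio = 1.750 / 1.546  # ~1.13x boost-clock bump
paper_gain = core_ratio * clock_ratio

# Cores and clocks alone predict only a ~1.06x gain; we measured ~1.33x.
print(round(paper_gain, 2))  # ~1.06
```

The gap between that predicted ~1.06x and the measured 1.33x is exactly what the spec-sheet math can't explain.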
Keep in mind, this is a best-case comparison for FLOPS. The Radeon VII and Vega 64 are based on the same architecture and have nearly the same core count and feature distribution.
Why FLOPS Fails
The reason FLOPS, and even FLOPS + clock, fail to capture the actual performance improvement from Vega 64 to Radeon VII is that they leave out the Radeon VII’s dramatically increased memory bandwidth, improved low-level register latencies, and ability to sustain higher clocks for longer periods of time. As this Anandtech review shows, even when both cards are compared at a static 1.5GHz, there are cases where the Vega 64 and Radeon VII are on top of each other, and places where the Radeon VII is 16 percent faster.
FLOPS and FLOPS + clock both fail to capture this kind of specificity because they aren’t granular enough. But depending on the workload you care about, a 1.16x improvement at the same clock speed for Radeon VII would be a huge gain.
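One way to see the gap FLOPS ignores is bytes of memory bandwidth available per FLOP. A rough sketch using the published figures (~484GB/s of HBM2 bandwidth for the Vega 64, ~1TB/s for the Radeon VII):

```python
def bytes_per_flop(bandwidth_gbs: float, tflops: float) -> float:
    # GB/s divided by GFLOPS yields bytes of bandwidth per floating-point op
    return bandwidth_gbs / (tflops * 1000)

vega_64 = bytes_per_flop(484, 12.67)     # ~0.038 bytes/FLOP
radeon_vii = bytes_per_flop(1024, 14.2)  # ~0.072 bytes/FLOP

# The Radeon VII has nearly twice the bandwidth per unit of compute,
# and none of that shows up in a TFLOPS comparison.
print(round(radeon_vii / vega_64, 2))  # ~1.89
```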
Why It’s Hard to Judge the PlayStation 5 vs. the Xbox Series X
On paper, the Xbox Series X GPU should be significantly more powerful than the PlayStation 5’s; Microsoft is fielding 52 CUs with 3,328 cores at a 1.825GHz core clock, while Sony is using 36 CUs with 2,304 GPU cores at an “up to” 2.23GHz core clock. According to Mark Cerny, the PlayStation 5’s smaller, faster GPU is more efficient than the wider, slower one in the Xbox Series X, but this is an unusual claim we haven’t seen supported in testing in the past. We discussed the nature of this argument in more detail with Oxide Games’ lead graphics architect, Dan Baker, in a recent article, so I won’t go back over the discussion here.
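Running the consoles' published figures through the same peak-FLOPS arithmetic (2 FLOPS per core per cycle, and taking Sony's 2.23GHz peak clock at face value):

```python
def tflops(cores: int, clock_ghz: float) -> float:
    # Peak FP32 throughput: 2 FLOPS (one FMA) per core per cycle
    return cores * clock_ghz * 2 / 1000

xbox_series_x = tflops(3328, 1.825)  # ~12.15 TFLOPS
ps5 = tflops(2304, 2.23)             # ~10.28 TFLOPS, Sony's quoted peak

print(round(xbox_series_x / ps5, 2))  # ~1.18x on-paper advantage for Microsoft
```

And as the Radeon VII example above shows, an on-paper FLOPS advantage of that size is no guarantee of an equivalent real-world gap.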
If I were going to pick a single reason why the Xbox Series X might have less of a performance advantage over the PS5 in real life than it does on paper, it would be the impact of Microsoft’s split memory bandwidth and slower SSD. Microsoft seems to favor odd approaches to memory — the company went with a 32MB ESRAM cache and slower DDR3 RAM in the original Xbox One before moving to a unified GDDR5 memory model with the Xbox One X.
But if there’s room for the PlayStation 5 to be closer on the Xbox’s heels than anticipated, there’s also room for the opposite — the Xbox Series X could open a wider margin against the PlayStation 5 than one might expect. If this were to happen, it would be because of subtle optimizations and improvements Microsoft implemented in its design that Sony didn’t copy. Even when both companies use the same GPU architecture, there can be differences in design; the original PlayStation 4 had more asynchronous compute engines (ACEs) than the Xbox One (8 versus 2).
Right now, it looks as if the Xbox Series X brings more graphics firepower to the table than the PlayStation 5. There are a number of additional factors that could still influence how the two consoles compare with each other, including aspects that have nothing to do with hardware, like whether Microsoft or Sony offers better dev tools and support. But as to the value of FLOPS as a metric for comparing console performance? Even in the best case, it isn’t great.