Investigating Performance of Multi-Threading on Zen 3 and AMD Ryzen 5000
by Dr. Ian Cutress on December 3, 2020 10:00 AM EST
Posted in: CPUs, AMD, Zen 3, X570, Ryzen 5000, Ryzen 9 5950X, SMT, Multi-Threading
Gaming Performance (Discrete GPU)
For our gaming tests, we are using our AMD Ryzen 9 5950X paired with an NVIDIA RTX 2080 Ti graphics card. Our standard test suite consists of 12 titles, tested at four configurations:
- Stage 1: Actual Gaming (1080p Maximum Quality, or equivalent)
- Stage 2: All About Pixels (‘4K Minimum’ Quality)
- Stage 3: Medium Low (‘1440p Minimum’)
- Stage 4: Lowest Lows (720p Minimum or lower)
The final three configurations are CPU-limited tests, and help find the point where we move from being CPU limited to GPU limited. Some users baulk at this testing, finding it irrelevant; however, these configurations have been widely requested over the years. The counterpoint to this testing is the first configuration, 1080p Maximum: it was requested because 1080p is the most popular gaming resolution, and Maximum Quality because this graphics card should be able to handle almost everything at that resolution at very playable framerates.
All the details for our gaming tests can be found in our #CPUOverload article.
Stage 1: Actual Gaming, AMD Ryzen 9 5950X, SMT On vs SMT Off
AnandTech | Settings | Average FPS | 95th Percentile |
Chernobylite | 1080p Max | 100% | - |
Civilization 6 | 1080p Max | 103% | - |
Deus Ex: MD | 1080p Max | 99% | 100% |
Final Fantasy 14 | 1080p Max | 102% | - |
Final Fantasy 15 | 8K Standard | 100% | 99% |
World of Tanks | 1080p Max | 100% | 102% |
World of Tanks | 4K Max | 103% | 102% |
Borderlands 3 | 1080p Max | 101% | 103% |
F1 2019 | 1080p Ultra | 103% | 106% |
Far Cry 5 | 1080p Ultra | 104% | 104% |
GTA V | 1080p Max | 99% | 100% |
RDR 2 | 1080p Max | 100% | 100% |
Strange Brigade | 1080p Ultra | 101% | 101% |
In real-world gaming situations, there’s very little to pick between having SMT enabled or disabled. Almost universally it either makes no difference or is a smidgen better to have it enabled, with F1 2019, Civilization 6, and Far Cry 5 seemingly the biggest beneficiaries. I’ve also added in the Stage 3 result from World of Tanks, just because that benchmark doesn’t really have a proper settings menu.
Stage 2: All About Pixels, AMD Ryzen 9 5950X, SMT On vs SMT Off
AnandTech | Settings | Average FPS | 95th Percentile |
Chernobylite | 4K Low | 99% | - |
Civilization 6 | 4K Min | 105% | - |
Deus Ex: MD | 4K Min | 98% | 100% |
Final Fantasy 14 | 4K Min | 102% | - |
Final Fantasy 15 | 4K Standard | 100% | 100% |
Borderlands 3 | 4K Very Low | 101% | 104% |
F1 2019 | 4K Ultra Low | 100% | 100% |
Far Cry 5 | 4K Low | 101% | 100% |
GTA V | 4K Low | 100% | 101% |
RDR 2 | 8K Min | 100% | 100% |
Strange Brigade | 4K Low | 100% | 100% |
At our high resolution, minimal quality settings, the only outlier is Civilization 6, where average frame rates are around 5% higher with SMT enabled.
Stage 3: Medium Low, AMD Ryzen 9 5950X, SMT On vs SMT Off
AnandTech | Settings | Average FPS | 95th Percentile |
Chernobylite | 1440p Low | 100% | - |
Civilization 6 | 1440p Min | 105% | - |
Deus Ex: MD | 1440p Min | 97% | 96% |
Final Fantasy 14 | 1440p Min | 102% | - |
Final Fantasy 15 | 1080p Standard | 101% | 105% |
World of Tanks | 1080p Standard | 101% | 101% |
Borderlands 3 | 1440p Very Low | 103% | 105% |
F1 2019 | 1440p Ultra Low | 99% | 99% |
Far Cry 5 | 1440p Low | 99% | 99% |
GTA V | 1440p Low | 100% | 99% |
RDR 2 | 1440p Low | 100% | 100% |
Strange Brigade | 1440p Low | 100% | 100% |
At the more medium settings, we’re starting to see some more variation (Borderlands gets a few percent from SMT). We’re starting to see Deus Ex:MD drop off a bit with SMT enabled.
Stage 4: Lowest Lows, AMD Ryzen 9 5950X, SMT On vs SMT Off
AnandTech | Settings | Average FPS | 95th Percentile |
Chernobylite | 360p Low | 106% | - |
Civilization 6 | 480p Min | 102% | - |
Deus Ex: MD | 600p Min | 91% | 91% |
Final Fantasy 14 | 768p Min | 102% | - |
Final Fantasy 15 | 720p Standard | 99% | 102% |
World of Tanks | 768p Min | 101% | 100% |
Borderlands 3 | 360p Very Low | 108% | 110% |
F1 2019 | 768p Ultra Low | 102% | 105% |
Far Cry 5 | 720p Low | 100% | 101% |
GTA V | 720p Low | 99% | 98% |
RDR 2 | 384p Low | 100% | 103% |
Strange Brigade | 720p Low | 95% | 95% |
This is perhaps our most varied set of results, with Deus Ex:MD showing an almost 10% drop with SMT enabled. DEMD is usually considered a CPU-limited title, but so is Chernobylite, which sees a 6% gain. Borderlands 3, which is more of a modern game, gains 8-10% with SMT enabled. However, I doubt anyone is playing at these resolutions.
Overall Gaming Performance
If we take full averages from all the data points, then we’re seeing a rough +1% gain in performance in the more complex scenarios across the board.
Resolution Average Comparison, AMD Ryzen 9 5950X, SMT On vs SMT Off
AnandTech | Setting | aka | Average FPS | 95th Percentile |
Stage 1 | 1080p Max | Actual Gaming | 101% | 101% |
Stage 2 | 4K+ Min | All About Pixels | 101% | 101% |
Stage 3 | 1440p Min | Medium Lows | 101% | 101% |
Stage 4 | < 768p Min | Lowest Lows | 100% | 101% |
In reality, any loss or gain is highly dependent on the title in question, and can swing from one side of the line to the other. It’s clear that Deus Ex prefers SMT off, and F1 2019 or Borderlands prefers SMT on, but we are talking fine margins here.
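As a rough check on the stage averages quoted above, the Stage 4 row can be reproduced from the per-title figures. This is a minimal sketch that assumes a plain arithmetic mean over the titles that report a value; the article does not state the exact averaging method, and the variable names are illustrative only.

```python
from statistics import mean

# Stage 4 (Lowest Lows) SMT On vs SMT Off ratios in percent, copied from the
# Stage 4 table: (average FPS, 95th percentile). None marks a title with no
# 95th percentile figure.
stage4 = {
    "Chernobylite": (106, None),
    "Civilization 6": (102, None),
    "Deus Ex: MD": (91, 91),
    "Final Fantasy 14": (102, None),
    "Final Fantasy 15": (99, 102),
    "World of Tanks": (101, 100),
    "Borderlands 3": (108, 110),
    "F1 2019": (102, 105),
    "Far Cry 5": (100, 101),
    "GTA V": (99, 98),
    "RDR 2": (100, 103),
    "Strange Brigade": (95, 95),
}

# Assumption: a simple arithmetic mean, skipping missing entries.
avg_fps = mean(avg for avg, _ in stage4.values())
p95 = mean(p for _, p in stage4.values() if p is not None)

print(f"Average FPS: {avg_fps:.1f}%  95th percentile: {p95:.1f}%")
```

Rounded to whole percentages, these come out at 100% and 101%, in line with the Stage 4 row, while the individual titles range from 91% to 110%.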
126 Comments
Oxford Guy - Friday, December 4, 2020
Suggestions:
Compare with Zen 2 and Zen 1, particularly in games.
Explain SMT vs. CMT. Also, is SMT + CMT possible?
AntonErtl - Sunday, December 6, 2020
CMT has at least two meanings.
Sun's UltraSparc T1 has in-order cores that run several threads alternately on the functional units. This is probably the closest thing to SMT that makes sense on an in-order core. Combining this with SMT proper makes no sense; if you can execute instructions from different threads in the same cycle, there is no need for an additional mechanism for processing them in alternate cycles. Instruction fetch on some SMT cores processes instructions in alternate cycles, though.
The AMD Bulldozer and family have pairs of cores that share more than cores in other designs share (but less than with SMT): They share the I-cache, front end and FPU. As a result, running code on both cores of a pair is often not as fast as when running it on two cores of different pairs. You can combine this scheme with SMT, but given that it was not such a shining success, I doubt anybody is going to do it.
Looking at roughly contemporary CPUs (Athlon X4 845 3.5GHz Excavator and Core i7 6700K 4.2GHz Skylake), when running the same application twice one after the other on the same core/thread vs. running it on two cores of the same pair or two threads of the same core, using two cores was faster by a factor of 1.65 on the Excavator (so IMO calling them cores is justified), and using two threads was faster by a factor of 1.11 on the Skylake. But Skylake was faster by a factor of 1.28 with two threads than Excavator with two cores, and by a factor of 1.9 when running only a single core/thread, so even on multi-threaded workloads a 4c/8t Skylake can beat an 8c Excavator (but AFAIK Excavators were not built in 8c configurations). The benchmark was running LaTeX.
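The factors quoted above can be cross-checked with a little arithmetic. The sketch below normalises everything to one Excavator core running one thread and, as an assumption not made in the comment itself, scales both designs linearly to hypothetical 8c Excavator and 4c/8t Skylake chips.

```python
# Throughput normalised to one Excavator core running one thread = 1.0.
excavator_single = 1.0
skylake_single = 1.9 * excavator_single      # Skylake single-thread advantage

# Two threads of work on one unit of each design.
excavator_pair = 1.65 * excavator_single     # two cores of one Excavator pair
skylake_core_smt = 1.11 * skylake_single     # two SMT threads on one Skylake core

# Skylake (two threads) relative to Excavator (two cores).
ratio = skylake_core_smt / excavator_pair
print(f"Skylake 2 threads vs Excavator 2 cores: {ratio:.2f}x")

# Assumption: perfect linear scaling across cores/pairs (hypothetical chips).
skylake_4c8t = 4 * skylake_core_smt
excavator_8c = 4 * excavator_pair
print(f"4c/8t Skylake: {skylake_4c8t:.2f}  8c Excavator: {excavator_8c:.2f}")
```

The computed ratio rounds to 1.28, matching the quoted figure, and under the linear-scaling assumption the 4c/8t Skylake stays ahead of the hypothetical 8c Excavator by the same factor.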
Oxford Guy - Sunday, December 6, 2020
AMD's design was very inefficient in large part because the company didn't invest much into improving it. The decision was made, for instance, to stall high-performance with Piledriver in favor of a very very long wait for Zen. Excavator was made on a low-quality process and was designed to be cheap to make.
Comparing a 2011/2012 design that was bad when it came out with Skylake is a bit of a stretch, in terms of what the basic architectural philosophy is capable of.
I couldn't remember that fourth type (the first being standard multi-die CPU multiprocessing) so thanks for mentioning it (Sun's).
USGroup1 - Saturday, December 5, 2020
So yCruncher is far away from real world use cases and 3DPMavx isn't.
pc8086 - Sunday, December 6, 2020
Many congratulations to Dr. Ian Cutress for the excellent analysis carried out.
If possible, it would be extremely interesting to repeat a similarly rigorous analysis (at least on the multi-threaded subsection of the chosen benchmarks) on the following platforms:
- 5900X (Zen 3, but fewer cores for each chiplet, maybe with more thermal headroom)
- 5800X (Zen 3, only a single computational chiplet, so no inter CCX latency troubles)
- 3950X (same cores and configuration, but with Zen 2, to check if the new, beefier core improved SMT support)
- 2950X (Threadripper 2, same number of cores but Zen+, with 4 memory channels; useful especially for tests such as AIBench, which have gotten worse with SMT)
- 3960X (Threadripper3, more cores, but Zen2 and with 4 memory ch.)
Obviously, it would be interesting to check Intel HyperThreading impact on recent Comet Lake, Tiger Lake and Cascade Lake-X.
For the time being, Apple has decided not to use any form of SMT on its own CPUs, so it is useful to fully understand the usefulness of SMT technologies for notebooks, high-end PCs and prosumer platforms.
Thank you very much.
eastcoast_pete - Sunday, December 6, 2020
Thanks Ian! With some of your comments about memory access limiting performance in some cases, how much additional performance does (or would) a quad channel memory setup give compared to the dual channel consumer setups (like these or mine)? Now, I know that servers and actual workstations usually have 4 or more memory channels, and for good reason. So, in the time of 12 and 16 core CPUs, is it time for quad channel memory access for the rest of us, or would that break the bank?
mapesdhs - Thursday, December 10, 2020
That's a good question. As time moves on and we keep getting more cores, with people doing more things that make use of them (such as gaming and streaming at the same time, with browser/tabs open, livechat, perhaps an encode too), perhaps indeed the plethora of cores does need better mem bw and parallelism, but maybe the end user would not yet tolerate the cost.
Something I noticed about certain dual-socket S2011 mbds on Aliexpress is that they don't have as many memory channels as they claim, which with two CPUs does hurt performance of even consumer grade tasks such as video encoding:
http://www.sgidepot.co.uk/misc/kllisre_analysis.tx...
bez5dva - Monday, December 7, 2020
Hi Dr. Cutress!
Thanks for these interesting tests!
Perhaps SMT is something that could drastically improve the performance of more budget CPUs? Your CPU has more than enough shiny cores for these games, but what if you take Ryzen 3100? I believe the %age would be different, as it was in my real world case :)
Back then I had a 6600k@4500, and in some FPS games with huge maps and a lot of players (Heroes and Generals; Planetside 2) I started to get stutters in tight fights, but when I switched to a 6700@4500 that was no longer the case. So I do believe that Hyperthreading worked in my case, cuz my CPUs were identical aside from the virtual threads in the latter.
It would be super interesting to have this post updated with results from a cheaper sample 😇
peevee - Monday, December 7, 2020
It is clear that 16-core Ryzen is power, memory and thermally limited. I bet SMT results on 8-core Ryzen 7 5800x would be much better for more loads.
naive dev - Tuesday, December 8, 2020
The slide states that Zen 3 decodes 4 instructions/cycle. Are there two independent decoders which each decode those 4 instructions for a thread? Or is there a single decoder that switches between the program counters of both threads but only decodes instructions of one thread per cycle?