There isn't an RDNA5 on the roadmap, though. It's been confirmed 4 is the last (and was really meant to be 3.5, but grew into what is assumed to be the PS5/XSX mid-gen refresh architecture).
Next is UDNA1, a converged architecture with its older sibling, CDNA (formerly GCN).
Like, the article actually states this, but runs an RDNA 5 headline anyway.
Maybe read the article before commenting on it, it's not that long.
"Big chunks of RDNA 5, or whatever AMD ends up calling it, are coming out of engineering I am doing on the project"
AMD does do semi-custom work.
What's to stop Sony being like "we don't want UDNA 1, we want an iteration of RDNA 4"?
For all we know, it IS RDNA 5... it just won't be available to the public.
And their half-step/semi-custom work can find its way back to APUs. RDNA 3.5 (the version marketed as such) is in the Zen 5 APUs with mobile-oriented improvements. It wouldn't surprise me if a future APU gets RDNA 5. GCN had this sort of APU/console relationship as well.
Also, the Steam Deck (before the OLED version) and the Magic Leap 2 shared a custom chip, with some vision-processing parts fused off for the Steam Deck.
It's just a name. I'm sure this is all pretty iterative work.
UDNA isn't a name but instead a big shift in strategy.
CDNA was for HPC / Supercomputers and Data center. GCN always was a better architecture than RDNA for that.
RDNA itself was trying to be more Nvidia-like: fewer FLOPs but better latency.
Someone is getting the axe. Only one of these architectures will win out in the long run, and the teams will also converge, allowing AMD to consolidate engineers on improving the same architecture.
We don't yet know what the consolidated team will release. But it's a big organizational shift that surely will affect AMD's architectural decisions.
My understanding was that CDNA and RDNA shared much if not most of their underlying architecture, and that the fundamental differences had more to do with CDNA supporting a greater variety of numeric representations to aid in scientific computing. Whereas RDNA really only needed fp32 for games.
That's not entirely wrong.
https://gpuopen.com/download/RDNA_Architecture_public.pdf
I've been showing this one to people for a few years as a good introduction to how RDNA diverged from GCN/CDNA.
The main thing they did was change where wavefront steps (essentially, quasi-VLIW packets) execute: instead of executing at the head of the pipeline (which owns 4x SIMD16 ALUs = 64 items), which requires 64 threads to run concurrently (thus, 64x the register/LDS/etc space), it issues non-blocking segments of the packet into per-ALU sub-pipelines, requiring far fewer concurrent threads to maintain peak performance (and, in many cases, far fewer registers held live for intermediates that don't leave the packet).
GCN is optimized for low instruction parallelism but high parallelism workloads. Nvidia since the dawn of their current architecture family tree has been optimized for high instruction parallelism but not simple highly parallel workloads. RDNA is optimized to handle both GCN-optimal and NVidia-optimal cases.
Since that document was written, RDNA has also been removing the roadblocks to getting more performance out of this fundamental difference. RDNA4, the one that just came out, increased the packet processing queue so it can schedule more packets in parallel and more segments of those packets into their per-ALU slots; that is probably the most influential change. In software that performed badly on all GPUs (GCN, previous RDNA, anything Nvidia), a 9070XT can perform like a 7900XTX with 2/3rds the watts and 2/3rds the dollars.
While CDNA has been blow for blow against Nvidia's offerings since its name change, RDNA has eradicated the gap in gaming performance. Nvidia functionally doesn't have a desktop product below a 5090 now, and early series 60 rumors aren't spicy enough to make me think Nvidia has an answer in the future, either.
CDNA wavefronts are 64 work items wide. And CDNA1, I believe, even executed them as 16 lanes over 4 clock ticks repeatedly (i.e., the minimum latency of all operations, even add or xor, was 4 clock ticks). It looks like CDNA3 might not do that anymore, but that's still a lot of differences...
RDNA actually executes 32-at-a-time and per clock tick. It's a grossly different architecture.
That doesn't even get to Infinity Cache, 64-bit support, AI instructions, Raytracing, or any of the other differences....
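To make the width difference concrete, here is a back-of-the-envelope sketch in Rust using only the figures claimed in the two comments above (wave widths, SIMD widths, clocks per op); the register count is a made-up placeholder, not a real hardware value.

    // Rough issue-rate and register-footprint comparison based on the
    // wave/SIMD widths described above. Numbers are illustrative only.
    fn main() {
        // GCN/CDNA1 style: 64-wide wavefront fed through a 16-lane SIMD,
        // so even a simple ALU op occupies the SIMD for 4 clocks.
        let gcn_wave = 64u32;
        let gcn_simd_lanes = 16u32;
        let gcn_clocks_per_op = gcn_wave / gcn_simd_lanes; // = 4

        // RDNA style: 32-wide wavefront on a 32-lane SIMD, one op per clock.
        let rdna_wave = 32u32;
        let rdna_simd_lanes = 32u32;
        let rdna_clocks_per_op = rdna_wave / rdna_simd_lanes; // = 1

        // Register footprint per in-flight wavefront scales with wave width.
        // 24 VGPRs per work item is a hypothetical figure for illustration.
        let vgprs_per_item = 24u32;
        let gcn_bytes_per_wave = gcn_wave * vgprs_per_item * 4;
        let rdna_bytes_per_wave = rdna_wave * vgprs_per_item * 4;

        println!("GCN:  {} clocks/op, {} B of VGPRs per wave", gcn_clocks_per_op, gcn_bytes_per_wave);
        println!("RDNA: {} clock/op,  {} B of VGPRs per wave", rdna_clocks_per_op, rdna_bytes_per_wave);
    }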
PS5 was almost twice as fast as the PS4 Pro, yet we did not see the generational leap we saw with the previous major releases.
It seems that we are at the stage where incremental improvements in graphics require exponentially more computing capability.
Or the game engines have become super bloated.
Edit: I stand corrected; in previous cycles we had orders-of-magnitude improvements in FLOPS.
A reason was backwards compatibility: studios were already putting lots of money into PS4 and Xbox One, so PS5 and Xbox Series X|S (two additional SKUs) were already too much.
Don't forget that one reason studios tend to favour consoles has been fixed, uniform hardware, and that is no longer the case.
When middleware becomes the default option, it is relatively hard to ship game features that are hardware-specific.
Game budgets ballooned and it was no longer financially viable to make single-platform games.
Less effort going into optimization also plays a factor. On average games are a lot less optimized than they used to be. The expectation seems to be that hardware advances will fix deficiencies in performance.
This doesn’t affect me too much since my backlog is long and by the time I play games, they’re old enough that current hardware trivializes them, but it’s disappointing nonetheless. It almost makes me wish for a good decade or so of performance stagnation to curb this behavior. Graphical fidelity is well past the point of diminishing returns at this point anyway.
We have had a decade of performance stagnation.
Compare PS1 with PS3 (just over 10 years apart):
PS1: 0.03 GFLOPS (approximate, given it didn't really do FLOPS per se)
PS3: 230 GFLOPS
Nearly 1000x faster.
Now compare PS4 with PS5 pro (also just over 10 years apart):
PS4: ~2TFLOPS
PS5 Pro: ~33.5TFLOPS
Bit over 10x faster. So the speed of improvement has fallen dramatically.
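For reference, a quick check of the ratios using the figures quoted above (the PS1 number is only a rough estimate to begin with) suggests the first jump was even larger than 1000x:

    // Ratio check using the figures quoted above.
    fn main() {
        println!("PS1 -> PS3:     {:.0}x", 230.0_f64 / 0.03); // ~7667x
        println!("PS4 -> PS5 Pro: {:.1}x", 33.5_f64 / 2.0);   // ~16.8x
    }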
Arguably you could say the real drop in optimization happened in that PS1 -> PS3 era - everything went from hand-optimized assembly code to running (generally) higher-level languages and using abstracted graphics frameworks like DirectX and OpenGL. Just no one noticed because we had 1000x the compute to make up for it :)
Consoles/games got hit hard by first crypto and now AI needing GPUs. I suspect if it wasn't for that we'd have vastly cheaper and vastly faster gaming GPUs, but when you're making boatloads of cash off crypto miners and then AI, I suspect the rate of progress fell dramatically for gaming at least (most of the innovation, I suspect, went more into high VRAM/memory controllers and datacentre-scale interconnects).
It is not just GPU performance, it is that visually things are already very refined. A ten-times leap in performance doesn't really show as ten times the visual spectacle like it used to.
Like all this path tracing/ray tracing stuff: yes, it is very cool and can add to a scene, but most people can barely tell it is there unless you show it side by side. And that takes a lot of compute to do.
We are polishing an already very polished rock.
Yes, but in the PS1 days we were getting 1000x the compute performance per decade.
I agree that 10x doesn't move much, but that's sort of my point - what could be done with 1000x?
Yeah there’s been a drop off for sure. Clearly it hasn’t been steep enough for game studios to not lean on anyway, though.
One potential forcing factor may be the rise of iGPUs, which have become powerful enough to play many titles well while remaining dramatically more affordable than their discrete counterparts (and sometimes not carrying crippling VRAM limits to boot), as well as the growing sector of PC handhelds like the Steam Deck. It’s not difficult to imagine that iGPUs will come to dominate the PC gaming sphere, and if that happens it’ll be financial suicide to not make sure your game plays reasonably well on such hardware.
I get the perhaps mistaken impression the biggest problem games developers have is making & managing absolutely enormous amounts of art assets at high resolution (textures, models, etc). Each time you increase resolution from 576p, to 720p to 1080p and now 4k+ you need a huge step up in visual fidelity of all your assets, otherwise it looks poor.
And given most of these assets are human-made (well, until very recently), this requires more and more artists. So I wonder if games studios are more just art studios with a bit of programming bolted on; before, with lower-res graphics, you maybe had one artist for 10 programmers, and now it is flipped the other way. I feel that at some point over the past ~decade we hit an "organisational" wall with this, and very, very few studios can successfully manage teams of hundreds (thousands?) of artists effectively.
This hits the nail pretty close to the head. I work on an in-house AAA engine used by a number of different games. It's very expensive to produce art assets at the quality expected now.
Many AAA engines' number one focus isn't "performance at all costs", it's "how do we most efficiently let artists build their vision". And efficiency isn't runtime performance; efficiency is how much time it takes for an artist to create something. Performance is only a goal insofar as it frees artists from being limited by it.
> So I wonder if games studios are more just art studios with a bit of programming bolted on.
Not quite, but the ratio is heavily in favor of artists compared to 'the old days'. Programming is still a huge part of what we do. It's still a deeply technical field, but often "programming workflows" are lower priority than "artist workflows" in AAA engines because, given the huge number of artists working on any one project compared to programmers, art time is more expensive in aggregate than programmer time.
Just go look at the credits for any recent AAA game. Look at how many artist positions there are compared to programmer positions and it becomes pretty clear.
Just to add to this, from a former colleague of mine who currently works as a graphics programmer at a UE5 studio: most graphics programmers are essentially tech support for artists nowadays. In an age where much of AAA is about making the biggest, most cinematic, most beautiful game, your artists and game content designers are the center of your production pipeline.
It used to be that the technology tended to drive the art. Nowadays the art drives the tech. We only need to look at all the advertised features of UE5 to see that. Nanite lets artists spend less time tweaking LODs and optimizing meshes, as well as flattening the cost of small-triangle rendering. Lumen gives us realtime global illumination everywhere so artists don't have to spend a million hours baking multiple light maps. Megalights lifts restrictions on the number of dynamic lights and shadows a lighting artist can place in a scene. The new Nanite foliage shown off in The Witcher 4 allows foliage artists to go ham with modeling their trees.
That depends a lot on art direction and stylization. Highly stylized games scale up to high resolutions shockingly well even with less detailed, lower resolution models and textures. Breath of the Wild is one good example that looks great by modern standards at high resolutions, and there’s many others that manage to look a lot less dated than they are with similarly cartoony styles.
If “realistic” graphics are the objective though, then yes, better displays pose serious problems. Personally I think it’s probably better to avoid art styles that age like milk, though, or to go for a pseudo-realistic direction that is reasonably true to life while mixing in just enough stylization to scale well and not look dated at record speeds. Japanese studios seem pretty good at this.
Yeah, it's flipped. Overall, it has meant studios are more and more dependent on third-party software (and thus license fees), it led to game engine consolidation, and there's serious attrition when attempting to make something those game engines weren't built for (non-PBR pipelines come to mind).
It's no wonder nothing comes out in a playable state.
> Arguably you could say the real drop in optimization happened in that PS1 -> PS3 era - everything went from hand-optimized assembly code to running (generally) higher-level languages and using abstracted graphics frameworks like DirectX and OpenGL. Just no one noticed because we had 1000x the compute to make up for it :)
Maybe / kind of. Consoles in the PS1/N64 era were mostly not running hand-optimised assembly code. The 8-bit and 16-bit machines were.
As for DirectX / OpenGL / Glide: they actually massively improved performance over running everything on the CPU. You only used software rendering if you had a really low-performance GPU. Just look at Quake running in software vs Glide; it easily doubles on a Pentium-based system.
> Consoles/games got hit hard by first crypto and now AI needing GPUs. I suspect if it wasn't for that we'd have vastly cheaper and vastly faster gaming GPUs, but when you're making boatloads of cash off crypto miners and then AI, I suspect the rate of progress fell dramatically for gaming at least (most of the innovation, I suspect, went more into high VRAM/memory controllers and datacentre-scale interconnects).
The PC graphics card market got hit hard by those. Console markets were largely unaffected. There are many reasons why performance has stagnated. One of them, I would argue, is the use of Unreal Engine 4/5. Every game that runs either of these engines has significant performance issues. Just look at Star Wars Jedi: Survivor and the previous game, Star Wars Jedi: Fallen Order. Both run poorly even on a well-spec'd PC, and run poorly on my PS5 too. Doesn't really matter though, as Jedi: Survivor sold well and I think Fallen Order also sold well.
The PS5 is basically a fixed PS4 (I've owned both). They put a lot of effort into reducing loading times on the PS5. Loading times on the PS4 were painful and far longer than on the PS3 (even for games loading from Blu-ray). This was something Sony was focusing on. Every presentation about the PS5 talked about the new NVMe drive, the external drive, and the requirements for it.
The other reason is that the level of graphical fidelity achieved in the mid-2000s to early 2010s is good enough. A lot of why some games age worse than others is down to the art style rather than the graphical fidelity. Many of the highest-earning games don't have state-of-the-art graphics; e.g. Fortnite prints cash and the graphics are pretty bad IMO.
Performance and graphics just aren't the focus anymore. They don't really sell games like they used to.
You divided 230 by .03 wrong, which would be 10000-ish, but you underestimated the PS1 by a lot anyway. The CPU does 30 MIPS, but also the geometry engine does another 60 MIPS and the GPU fills 30 or 60 million pixels per second with multiple calculations each.
Not to mention that few developers were doing hand optimized assembly by the time of PSX. They were certainly hand optimizing models and the 3D pipeline (with some assembler tuning), but C and SDKs were well in use by that point.
Even Naughty Dog went with their own LISP engine for optimization versus ASM.
I don’t know about other developers at the time, but we had quite a lot of hand-written assembly code in the Crash games. The background and foreground renderers were all written in assembly by hand, as was the octree-based collision detection system. (Source: me; I wrote them.)
And this thread comes full circle: Mark Cerny actually significantly improved the performance of my original version of the Crash collision detection R3000 code. His work on this code finally made it fast enough, so it's a really good thing he was around to help out. Getting the collision detection code correct and fast enough took over 9 months; it was very difficult on the PS1 hardware, and ended up requiring use of the weird 2K static RAM scratchpad Sony included in place of the (removed) floating point unit.
GOOL was mainly used for creature control logic and other stuff that didn’t have to be optimized so much to be feasible. Being able to use a lisp dialect for a bunch of the code in the game saved us a ton of time. The modern analogue would be writing most of the code in Python but incorporating C extensions when necessary for performance.
Andy made GOAL (the successor lisp to GOOL) much more low-level, and it indeed allowed coding essentially at the assembly level (albeit with lispy syntax). But GOOL wasn’t like this.
I've never seen the Crash source code, so I was making my statements based on second-hand knowledge. So thanks for that clarification. I do think it's worth pointing out that Naughty Dog and Insomniac were two companies well known for making highly optimized software for the PSX, so probably not a standard most other companies matched.
Additionally, I have written my own PSX software as well as reviewed plenty of contemporaneous PSX software. While many have some bit of assembler, it's usually specifically around the graphics pipeline. About 90+% of all code is C. This is in line with interviews from developers at the time, as well.
The point wasn't that ASM wasn't used at all (in fact, I specifically acknowledged it in my original post); it was that the PSX was in an era past the time when entire codebases were hand-massaged/tuned assembler (e.g. "the 16-bit era" and before).
Naughty Dog's GOAL was PS2 specific and essentially chock full of what would be called intrinsics these days that let you interleave individual assembly instructions particularly for the crazy coprocessor setup of Emotion Engine.
My understanding is that the mental model of programming in the PS2 era was originally still very assembly-like outside of a few places (like Naughty Dog), and that GTA3 on PS2 made possibly its biggest impact by showing that's not necessary.
If by "mental model" you mean "low-level" programming, sure. But you might as well conflate "religion" with "Southern Baptist protestantism" then. You're working with the same building blocks, but the programming style is drastically different.
The vast majority of PSX games were done completely in C, period. Some had small bits of asm here and there, but so do the occasional modern C/C++ apps.
To your last point, before there was GOAL there was GOOL (from the horse's mouth itself):
https://all-things-andy-gavin.com/tag/lisp-programming/
And it was used in all of Naughty Dog's PSX library.
The quote I recall reading about long ago summarized the semi-official guidance as "write C like you write ASM".
Because, outside of ports from PC, a large share of console game developers at the time had a lot of experience programming earlier consoles, which involved much more assembly-level coding. GTA3 proved that a "PC-style" engine was good enough despite the Emotion Engine's design.
It didn't help that the PS2 was very much oriented towards assembly coding at a pretty low level, because getting the most out of the hardware involved writing code for the multiple coprocessors to work somewhat in sync - which, at least for GOAL, was handled by special support for writing the assembly code inline with the rest of the code (because IIRC not all the assembly involved was executed from the same instruction stream).
As for GOOL, it was the far more classic approach (used by ND on PS3 and newer consoles too) of a core engine in C and a "scripting" language on top to drive gameplay.
> The quote I recall reading about long ago summarized the semi-official guidance as "write C like you write ASM".
You could read that in pretty much any book about C until the mid-00s. C was called a "portable assembler" for the longest time because it went against the grain of ALGOL, Fortran, Pascal, etc. by encouraging the use of pointers and staying close to the machine. That's also why it mainly holds onto its viability in embedded development these days.
I've written C on the PSX, using contemporaneous SDKs and tooling, and I've reviewed source code from games at the time. There's nothing assembler about it, at least not more so than any systems development done then or today. If you don't believe me, there are plenty of retail PSX games that accidentally released their own source code that you can review yourself.
You're just arguing for the sake of arguing at this point and, I feel, being intellectually dishonest. Believe what you'd like to believe, or massage the facts how you like; I'm not interested in chasing goal (heh) posts.
>I suspect if it wasn't for that we'd have vastly cheaper and vastly faster gaming GPUs
This feels very out of touch since AMD's latest GPU series is specialized in gaming only, to the point where they sell variants with 8GB, which is becoming a bit tight if you want to play modern games.
Yes but AMD also has an enterprise line of AI cards to protect. And regardless, if NVidia wasn't also making bank selling AI GPUs then we'd have seen them add more performance on gaming, which would have forced AMD to, etc.
Because optimized games aren't completely extinct, and there are titles with similar levels of size, fidelity, and feature utilization with dramatically different performance profiles.
Given the N64-PS1 era is filled with first party games that run at like 20 fps, I'm having a hard time saying things are worse now.
I am a bit uncomfortable with the performance/quality dichotomy that people have set up, but I personally feel that the quality floor for perf is way higher than it used to be. Though there seem to be fewer people parking themselves at "60fps locked", which felt like a thing for a while.
The current generation has a massive leap in storage speed but games need to be architected to stream that much data into RAM.
Cyberpunk is a good example of a game that straddled the in-between; many of its performance problems on the PS4 were due to constrained serialization speed.
Nanite and games like FF16 and Death Stranding 2 do a good job of drawing complex geometry and textures that wouldn't be possible on the previous generation.
Nanite has a performance overhead for simple scenes but will render large, complex scenes with high-quality models much more efficiently, providing a faster and more stable framerate.
It’s also completely optional in Unreal 5. You use it if it’s better. Many published UE5 games don’t use it.
And loading times. I think people have already forgotten how long you had to wait on loading screens, or how much faked loading (moving through a brush while the next area loads) there was on the PS4.
The PS4 wasn't too terrible, but jumping back to the PS3... wow, I completely forgot how memory-starved that machine was. Working on it, we knew at the time, but in retrospect it was just horrible.
A small RAM pool with a hard CPU/GPU split (so no reallocation), feeding off a slow HDD which is in turn fed by an even slower Blu-ray disc: you were sitting around for a while.
If only we could just ship a 256GB NVMe SSD with every game and memory map the entire drive like you could with cartridges back then. Never have loading times again.
Also: I think it got less common on the N64, but games on SNES and NES and other old home consoles routinely accessed static game data, like graphic tiles, directly from the cartridge ROM. Without loading it into system RAM at all.
So there literally were no "loading" times for these assets. This might not even be realistically possible with NAND flash based SSDs, e.g. because of considerations like latency.
Though directly accessing ROM memory would also prevent things like texture block compression I believe.
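On the PC side, the closest analogue to addressing assets where they live is memory-mapping an asset file and letting the OS page it in on demand. A minimal Rust sketch, assuming the memmap2 crate and a made-up "assets.pak" file; the flash-latency concern above doesn't go away, the OS just hides it behind page faults:

    // Sketch: map a hypothetical asset file into the address space and read
    // from it directly, with no explicit load step. Paging from the SSD still
    // happens under the hood on first access.
    use memmap2::Mmap;
    use std::fs::File;

    fn main() -> std::io::Result<()> {
        let file = File::open("assets.pak")?;
        // Safety: we assume nothing truncates or rewrites the file while mapped.
        let map = unsafe { Mmap::map(&file)? };
        println!("mapped {} bytes, first byte = {:#04x}", map.len(), map[0]);
        Ok(())
    }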
Seems like an overgeneralization. I get it when FPS players want the best performance: players have FOMO of the best reaction time and the games are more built for fast action than contemplative scenery watching.
I wonder if players of single player action/adventure games make the same choice. Those games are played less (can be finished in 10-30 hours instead of endlessly) so the statistics might be skewed to favor performance mode.
> I wonder if players of single player action/adventure games make the same choice.
Anecdotally, I do. Because modern displays are horrible blurry messes at lower framerates. I don't care about my input latency, I care about my image not being a smear every time the camera viewport moves.
Yeah. Case in point: "Zelda: Ocarina of Time" was at the time and several years afterward often labeled as one of the best games ever made, despite the fact that it ran with 20 FPS on NTSC consoles and with 16.67 FPS on PAL machines.
I'm sure it would have been even more successful with modern 60 FPS, but that difference couldn't have been very large, because other 60 FPS games did exist back then as well, mostly without being nearly as popular.
This is the result of an industry wide problem where technology just is not moving forward as quickly as it used to move. Dennard scaling is dead. Moore’s law is also dead for SRAM and IO logic. It is barely clinging to life for compute logic, but the costs are skyrocketing as each die shrink happens. The result is that we are getting anemic improvements. This issue is visible in Nvidia’s graphics offerings too. They are not improving from generation to generation like they did in the past, despite Nvidia turning as many knobs as they could to higher values to keep the party going (e.g. power, die area, price, etcetera).
That talk predates the death of SRAM scaling. I will not bother wasting my time watching a video that is out of date.
That said, you should read that I did not say Moore’s Law was entirely dead. It is dead for SRAM and IO logic, but is still around for compute logic. However, pricing is shooting upward with each die shrink far faster than it did in the past.
And? Software is getting more sophisticated and capable too. First time I switched an iter to a par_iter in Rust and saw the loop spawn as many threads as I have logical cores felt like magic. Writing multi-threaded code used to be challenging.
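For anyone who hasn't seen it, the iter -> par_iter change really is that small. A minimal sketch using the rayon crate, with a made-up workload:

    use rayon::prelude::*;

    fn main() {
        let data: Vec<u64> = (0..10_000_000).collect();

        // Sequential version.
        let seq: u64 = data.iter().map(|x| x * x % 1_000_003).sum();

        // Parallel version: one method swap, and rayon fans the work out
        // across however many logical cores the machine has.
        let par: u64 = data.par_iter().map(|x| x * x % 1_000_003).sum();

        assert_eq!(seq, par);
        println!("sum = {par}");
    }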
> Now make that multi-threaded code exhaust a 32 core desktop system
Switching an iter to par_iter does this. So long as there are enough iterations to work through, it'll exhaust 1024 cores or more.
> all the time, not only at peak execution.
What are you doing that keeps a desktop or phone at 100% utilization? That kind of workload exists in datacenters, but end user devices are inherently bursty. Idle when not in use, race to idle while in use.
> As brownie points, keep the GPU busy as well... Even more points if the CPU happens to have a NPU or integrated FPGA
In a recent project I serve a WASM binary from an ESP32 via Wifi / HTTP, which makes use of the GPU via WebGL to draw the GUI, perform CSG, calculate toolpaths, and drip feed motion control commands back to the ESP. This took about 12k lines of Rust including the multithreaded CAD library I wrote for the project, only a couple hundred lines of which are gated behind the "parallel" feature flag. It was way less work than the inferior C++ version I wrote as part of the RepRap project 20 years ago. Hence my stance that software has become increasingly sophisticated.
The point being those are very niche cases that still don't keep the hardware busy as it should 24h around the clock.
Most consumer software even less, which is why you'll hardly see a computer at the shopping mall with more than 16 cores, and on average most shops will stock something between 4 and 8.
Also a reason why systems with built-in FPGAs failed in the consumer market, specialised tools without consumer software to help sell them.
> don't keep the hardware busy as it should 24h around the clock.
If your workload demands 24/7 100% CPU usage, Epyc and Xeon are for you. There you can have multiple sockets with 256 or more cores each.
> Most consumer software even less
And yet, even in consumer gear which is built to a minimum spec budget, core counts, memory capacity, pcie lanes, bus bandwidth, IPC, cache sizes, GPU shaders, NPU TOPS, all increasing year over year.
> systems with built-in FPGAs failed in the consumer market
Talk about niche. I've never met an end user with a use for an FPGA or the willingness to learn what one is. I'd say that has more to do with it. Write a killer app that regular folks want to use that requires one, and they'll become popular. Rooting for you.
AFAIK, this generation has been widely slammed as a failure due to lack of new blockbuster games. Most things that came out were either for PS4, or remasters of said games.
There have been a few decent sized games, but nothing at grand scale I can think of, until GTA6 next year.
The big jump between 4 and 5 was the NVMe SSD and hardware decompression IMO. Load times on a regular PS5 are nonexistent compared to a PS4; that's the big generational jump.
For graphics, I agree it looks like diminishing returns.
"Bloated" might be the wrong word to describe it, but there's some reason to believe that the dominance of Unreal is holding performance back. I've seen several discussions about Unreal's default rendering pipeline being optimized for dynamic realtime photorealistic-ish lighting with complex moving scenes, since that's much of what Epic needs for Fortnite. But most games are not that and don't make remotely effective use of the compute available to them because Unreal hasn't been designed around those goals.
TAA (temporal anti-aliasing) is an example of the kind of postprocessing effect that gamedevs are relying on to recover performance lost in unoptimized rendering pipelines, at the cost of introducing ghosting and loss of visual fidelity.
TAA isn't a crutch being used to hold up poor performance, it's an optimization to give games anti-aliasing that doesn't suck.
Your other options for AA are
* Supersampling. Rendering the game at a higher resolution than the display and downscaling it. This is incredibly expensive.
* MSAA. This samples ~~vertices~~surfaces more than once per pixel, smoothing over jaggies. This worked really well back before we started covering every surface with pixel shaders. Nowadays it just makes pushing triangles more expensive with very little visual benefit, because the pixel shaders are still done at 1x scale and thus still aliased.
* Post-process AA (FXAA,SMAA, etc). These are a post-process shader applied to the whole screen after the scene has been fully rendered. They often just use a cheap edge detection algorithm and try to blur them. I've never seen one that was actually effective at producing a clean image, as they rarely catch all the edges and do almost nothing to alleviate shimmering.
I've seen a lot of "tech" YouTubers try to claim TAA is a product of lazy developers, but not one of them has been able to demonstrate a viable alternative antialiasing solution that solves the same problem set with the same or better performance. Meanwhile TAA and its various derivatives like DLAA have only gotten better in the last 5 years, alleviating many of the problems TAA became notorious for in the latter '10s.
Yeah. Only problem is that overly aggressive TAA implementations blur the whole frame during camera rotation. The thing that is even better than standard TAA is a combination of TAA and temporal upscaling, called TSR in Unreal. Still better is the same system but performed by an ML model, e.g. DLSS. Though this requires special inference hardware inside the GPU.
In the past, MSAA worked reasonably well, but it was relatively expensive, doesn't apply to all forms of high frequency aliasing, and it doesn't work anymore with the modern rendering paradigm anyway.
ThreatInteractive is an anti-TAA developer/YouTuber. They make a compelling argument against TAA and present an alternative they are working on for Unreal.
Erm, your description of MSAA isn't quite correct; it has nothing to do with vertices and doesn't increase vertex processing cost.
It's more similar to supersampling, but without the higher pixel shader cost (the pixel shader still only runs once per "display pixel", not once per "sample" like in supersampling).
A pixel shader's output is written to multiple (typically 2, 4 or 8) samples, with a coverage mask deciding which samples are written (this coverage mask is all 1s inside a triangle and a combo of 1s and 0s along triangle edges). After rendering to the MSAA render target is complete, an MSAA resolve operation is performed which merges samples into pixels (and this gives you the smoothed triangle edges).
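As a toy illustration of that resolve step, here is a CPU-side sketch that just averages each pixel's samples into one color. Real resolves happen in fixed-function hardware and can weight samples differently; this is only meant to show the merge-samples-into-pixels idea:

    // Toy MSAA "resolve": merge N color samples per pixel into a single color
    // by averaging. Samples along triangle edges carry partial coverage, which
    // is what produces the smoothed edges after the merge.
    fn resolve(samples: &[[f32; 3]], samples_per_pixel: usize) -> Vec<[f32; 3]> {
        samples
            .chunks(samples_per_pixel)
            .map(|pixel| {
                let mut acc = [0.0f32; 3];
                for s in pixel {
                    for c in 0..3 {
                        acc[c] += s[c];
                    }
                }
                let n = pixel.len() as f32;
                [acc[0] / n, acc[1] / n, acc[2] / n]
            })
            .collect()
    }

    fn main() {
        // Two pixels at 4x MSAA: one fully covered by a red triangle,
        // one edge pixel only half covered.
        let samples = vec![
            [1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0],
            [1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0],
        ];
        println!("{:?}", resolve(&samples, 4)); // second pixel resolves to half intensity
    }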
> solves the same problem set with the same or better performance
The games industry has spent the last decade adopting techniques that misleadingly inflate the simple, easily-quantified metrics of FPS and resolution, by sacrificing quality in ways that are harder to quantify. Until you have good metrics for quantifying the motion artifacts and blurring introduced by post-processing AA, upscaling, and temporal AA or frame generation, it's dishonest to claim that those techniques solve the same problem with better performance. They're giving you a worse image, and pointing to the FPS numbers as evidence that they're adequate is focusing on entirely the wrong side of the problem.
That's not to say those techniques aren't sometimes the best available tradeoff, but it's wrong to straight-up ignore the downsides because they're hard to measure.
This is a very one-sided perspective on things. Any precomputed solution to lighting comes with enormous drawbacks across the board. The game needs to ship the precomputed data when storage is usually already tight. The iteration cycle for artists and level designers sucks when lighting is precomputed - they almost never see accurate graphics for their work while they are iterating, because rebaking takes time away from their work. Game design becomes restricted to those limitations, too. You can't even think of having the player randomly rearrange big things in a level (e.g. building or tearing down a house) because the engine can't do it. Who knows what clever game mechanics are never thought of because of these types of limitations?
Fully dynamic interactive environments are liberating. Pursuing them is the right thing to do.
In principle, Epic's priorities for Unreal should be aligned to a lot of what we've seen in the PS3/4/5 generation as far as over-the-shoulder 3rd person action adventure games.
I mean, look at Uncharted, Tomb Raider, Spider-Man, God of War, TLOU, HZD, Ghost of Tsushima, Control, Assassins Creed, Jedi Fallen Order / Survivor. Many of those games were not made in Unreal, but they're all stylistically well suited to what Unreal is doing.
I had the pleasure to spend some time with Mark Cerny many years ago. He was honestly one of the most impressive people I have ever met. Down to earth and so, so smart. I also think it speaks volumes for Sony as a company that an American born video game developer (engineer not mba) has such an influential position. They are not insular and respect the craft.
A few days ago there was a similar message from Xbox, saying that AMD will power its future hardware, talking about a strategic alliance and so on.
So, Mark Cerny is contributing to the next Xbox? In the end, today all consoles are basically PCs with different frontends and storefronts (and even that is opening up, starting with Xbox, though PlayStation will probably follow eventually).
If the Playstation contributions are good enough, maybe RDNA4 -> RDNA5 will be just as good as RDNA3 -> RDNA4. As long as they get the pricing right, anyway.
Excited to see how the software support for UDNA1 works out. Very hopeful we'll see some real competition to Nvidia soon in the datacenter. Unfortunately I think the risk is quite high: if AMD burns developers again with poor drivers and poor support, it's hard to see how they'll be able to shake the current stigma.
Take this with a pinch of salt, but the most recent ROCm release installed out of the box on my WSL2 machine and worked first time with llama.cpp. I even compiled llama.cpp from source with 0 issues. That has never happened ever in my 5+ years of having AMD GPUs. Every other time I've tried this it's either failed and required arcane workarounds, or just not worked entirely (including running on 'real' Linux).
I feel like finally they are turning the corner on software and drivers.
We've known this for a while, it's an extension of the upscaling and frame generation AMD already worked on in conjunction with Sony for FSR 3 and to a much greater extent FSR 4. Previous articles also have highlighted their shared focus on BVH optimizations
Sony's low-level APIs for the PS4 and PS5 (the PS5 doesn't use GNM itself) are almost a direct mapping to the hardware, with very little abstraction compared to Vulkan/DX12. Vulkan is still very high level compared to what's going on inside the driver. There's no point paying the cost of Vulkan's abstractions when half the point of a game console is to have a fixed hardware target, hence GNM.
It's basically the same AMD GPU, and you can compile SPIR-V into its machine code optimally without any kind of made-up "I'm closer than close to the hardware" secrets. Compiler optimizations are done all the time (see Valve's ACO and AMD's LLVM-based compiler).
Vulkan is more than adequate to handle things for the hardware on the API side so I don't buy the claim that they are somehow "closer than close" to be better.
Because if they write their own, they get to own the political/bureaucratic portion of the problem. For better or worse, they don't have to deal with the Khronos Group. They get to optimize their APIs directly against their research with AMD.
Why would Vulkan, as opposed to a custom solution designed to target that hardware and games specifically, be a better solution?
If you’re making a PS game you’re already doing tons of bespoke PS stuff. If you don’t want to deal with it there are plenty of pieces of middleware out there to help.
Honestly these “where’s Vulkan” posts on every bit of 3D capable hardware feel like a stupid meme at this point as opposed to a rational question.
Maybe they should just ship DX12. That’s multi-platform too.
> Why would Vulkan, as opposed to a custom solution designed to target that hardware and games specifically, be a better solution?
Vulkan came to be as a low-overhead rendering API targeted at 3D video games made originally, as Mantle, by both AMD and DICE.
If anything, I'd see any console or game people involved with this as a bad signal, considering what a travesty Vulkan came to be and that one originated from such a group.
Vulkan is as much of a mess as OpenGL, to the point that Khronos has been forced to publicly admit it didn't turn out as planned. It is yet again extension spaghetti, and they are now trying to clean house with the Vulkan Roadmap introduced at Vulkanised 2025.
That is how good Khronos "design by committee" APIs end up being.
Vulkan's extension story is a mess. Not surprising from the people that brought you OpenGL.
However please don't undersell what they got right. Because what they got right, they got _very_ right.
Barriers. Vulkan got barriers so absolutely right that every competing API has now adopted a clone of Vulkan's barrier API. The API is cumbersome but brutally unambiguous compared to DirectX's resource states or Metal's hazard tracking. DirectX has a rat's nest of special cases in the resource state API because of how inexpressive it is, and just straight up forgot to consider COPY->COPY barriers.
We also have SPIR-V. Again, D3D12 plans to dump DXIL and adopt SPIR-V.
The fundamentals Vulkan got very right IMO. It's a shame it gets shackled to the extension mess.
SPIR-V only exists as a reaction to Khronos losing to CUDA and PTX. It is the evolution of SPIR, created when they realised the industry didn't really want to write OpenCL in a textual C99 dialect, but rather wanted bytecode formats that multiple compilers could target, including Fortran (which Khronos ignored).
Microsoft adopting SPIR-V as the DXIL replacement is most likely a matter of convenience: the format got messy to maintain, it is tied to an old fork of LLVM, and HLSL has gained the industry's weight, being favoured even over GLSL for Vulkan (on which Khronos acknowledged at Vulkanised 2024 they are doing no work at all, zero, nada). So why redo DXIL from scratch when they could capitalize on existing work to target SPIR-V from HLSL?
DXIL has other issues beyond being an orphaned fork of LLVM-IR. LLVM is ill-equipped to represents shaders, and there have been real bugs in the wild because of how not-a-shader-language LLVM-IR is. See this [0] post under the 'The classic subgroup breakage – maximal convergence gone awry' section. Read the whole series really if you haven't, it's a fantastic dive into how awful this stuff is.
DXIL is also a useless intermediate representation because parsing it is so bloody difficult nobody could actually do anything with it. DXIL was, and still is, functionally opaque bytes. You can't introspect it, you can't modify it or do anything of use with it. SPIR-V is dead-simple to parse and has an array of tools built around it because it's so easy to work with.
I don't really see how the OpenCL history is relevant to Vulkan either. Khronos losing the OpenCL game to Nvidia, certainly no thanks to Nvidia's sabotage either, doesn't change that SPIR-V is a much more successful format.
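To back up the point above about SPIR-V being easy to work with, here is a minimal Rust sketch that walks a module's header and instruction stream. The word layout follows the SPIR-V spec; operand decoding and validation are deliberately left out, and the module in main is a tiny hand-built example rather than real compiler output:

    // SPIR-V is a stream of little-endian 32-bit words: a 5-word header
    // (magic, version, generator, id bound, schema), then instructions whose
    // first word packs (word_count << 16) | opcode.
    fn walk_spirv(words: &[u32]) {
        assert_eq!(words[0], 0x0723_0203, "not a SPIR-V module");
        println!("version: {:#010x}, id bound: {}", words[1], words[3]);
        let mut i = 5;
        while i < words.len() {
            let word_count = (words[i] >> 16) as usize;
            let opcode = words[i] & 0xFFFF;
            println!("opcode {:4} ({} words)", opcode, word_count);
            i += word_count.max(1); // never loop forever on a malformed instruction
        }
    }

    fn main() {
        // Hand-built module: header (magic, version 1.6, generator 0, bound 8,
        // schema 0) followed by a single 1-word OpNop (opcode 0).
        let module: [u32; 6] = [0x0723_0203, 0x0001_0600, 0, 8, 0, 1 << 16];
        walk_spirv(&module);
    }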
> Khronos losing the OpenCL game to Nvidia, certainly no thanks to Nvidia's sabotage either, doesn't change that SPIR-V is a much more successful format.
could you elaborate more on this? sounds interesting
OpenCL history is more than relevant: without SPIR, SPIR-V would not exist.
People love to blame Nvidia for AMD's and Intel's failures in pushing OpenCL into the industry, and for Google completely ignoring it on mobile devices, pushing their own RenderScript dialect instead. Even if Apple had behaved differently regarding their platforms, the other 80% of the market completely ignored it.
This comment is just a bunch of bunk, but regardless, Vulkan is the only thing that's a collaborative effort, whether it's a mess or not. NIH proponents have nothing to offer as an alternative.
And most of the standards we have now started with something similar to NIH. Vulkan itself is an offshoot of Mantle from AMD.
There are valid reasons to have a custom API, especially in a domain like game consoles, with hardware on long release cycles, tight performance requirements, and legacy (PS4) code to support.
So you're right, though I would never have guessed; in the PS5 hype cycle he gave that deep-dive architecture presentation that for all the world looked like he was a Sony spokesperson.
Why not link to the original article here:
https://www.tomsguide.com/gaming/playstation/sonys-mark-cern...
This was published after TFA, how is it the original?
Wdym?
2 vs 1 days
Nevermind, I was confused.
There isn't an RDNA5 on the roadmap, though. It's been confirmed 4 is the last (and was really meant to be 3.5, but grew into what is assumed to be the PS5/XSX mid-gen refresh architecture).
Next is UDNA1, a converged architecture with it's older sibling, CDNA (formerly GCN).
Like, the article actually states this, but runs an RDNA 5 headline anyways.
Maybe read the article before commenting on it, it's not that long.
"Big chunks of RDNA 5, or whatever AMD ends up calling it, are coming out of engineering I am doing on the project"
AMD does do semi-custom work.
Whats to stop sony being like we dont want UDNA 1, we want a iteration of RDNA 4.
For all we know, it IS RDNA 5... it just wont be available to the public.
And their half step/semi-custom work can find their way back to APUs. RDNA 3.5 (the version marketed as such) is in the Zen 5 APUs with Mobile oriented improvements. It wouldn’t surprise me if a future APU gets RDNA 5. GCN had this sort of APU/Console relationship as well.
Also steamdeck before the OLED version and magic leap 2 shared a custom chip, with some vision processing parts fused off for steamdeck.
It's just a name. I'm sure this is all pretty iterative work.
UDNA isn't a name but instead a big shift in strategy.
CDNA was for HPC / Supercomputers and Data center. GCN always was a better architecture than RDNA for that.
RDNA itself was trying to be more NVidia like. Fewer FLOPs but better latency.
Someone is getting the axe. Only one of these architectures will win out in the long run, and the teams will also converge allowing AMD to consolidate engineers to improving the same architecture.
We won't know what the consolidated team will release yet. But it's a big organizational shift that surely will affect AMDs architectural decisions.
My understanding was that CDNA and RDNA shared much if not most of their underlying architecture, and that the fundamental differences had more to do with CDNA supporting a greater variety of numeric representations to aid in scientific computing. Whereas RDNA really only needed fp32 for games.
That's not entirely wrong.
https://gpuopen.com/download/RDNA_Architecture_public.pdf
I've been showing this one to people for a few years as a good introduction on how RDNA diverged from GCN->CDNA.
The main thing they did was change where wavefront steps (essentially, quasi-VLIW packets) execute: instead of being at the head of the pipeline (which owns 4x SIMD16 ALUs = 64 items) and requires executing 64 threads concurrently (thus, 64x registers/LDS/etc space), it issues non-blocking segments of the packet into per-ALU sub-pipelines, requiring far fewer concurrent threads to maintain peak performance (and, in many cases, far less concurrent registers used for intermediates that don't leave the packet).
GCN is optimized for low instruction parallelism but high parallelism workloads. Nvidia since the dawn of their current architecture family tree has been optimized for high instruction parallelism but not simple highly parallel workloads. RDNA is optimized to handle both GCN-optimal and NVidia-optimal cases.
RDNA, since this document has been written, also has been removing all the roadblocks to improve performance on this fundamental difference. RDNA4, the one that just came out, increased the packet processing queue to be able to schedule more packets in parallel and more segments of the packets into their per-ALU slots, is probably the most influential change: in software that performed bad on all GPUs (GCN, previous RDNA, anything Nvidia), a 9070XT can perform like a 7900XTX with 2/3rds the watts and 2/3rds the dollars.
While CDNA has been blow for blow against Nvidia's offerings since it's name change, RDNA has eradicated the gap in gaming performance. Nvidia functionally doesn't have a desktop product below a 5090 now, and early series 60 rumors aren't spicy enough to make me think Nvidia has an answer in the future, either.
Who told ya that??
CDNA is 64 wide per work item. And CDNA1 I believe was even 16 lanes executed over 4 clock ticks repeatedly (ie: minimum latency of all operations, even add or xor, was 4 clock ticks). It looks like CDNA3 might not do that anymore but that's still a lot of differences...
RDNA actually executes 32-at-a-time and per clock tick. It's a grossly different architecture.
That doesn't even get to Infinity Cache, 64-bit support, AI instructions, Raytracing, or any of the other differences....
CDNA is based on the older gcn arch so they share the same as pre RDNA ones and RDNA ones.
PS5 was almost twice as fast as the PS4 pro, yet we did not see the generational leap we saw with the previous major releases.
It seems that we are the stage where incremental improvements in graphics will require exponentially more computing capability.
Or the game engines have become super bloated.
Edit: I stand corrected in previous cycles we had orders of magnitude improvement in FLOPS.
A reason was backwards compatibility, studios were already putting lots of money into PS4 and XBox One, thus PS5 and XBox X|S (two additional SKUs), were already too much.
Don't forget one reason that studios tend to favour consoles has been regular hardware, and that is no longer the case.
When middleware starts to be the option, it is relatively hard to have game features that are hardware specific.
Games budgets ballooned and it was not longer financially viable for single platform games.
Less effort going into optimization also plays a factor. On average games are a lot less optimized than they used to be. The expectation seems to be that hardware advances will fix deficiencies in performance.
This doesn’t affect me too much since my backlog is long and by the time I play games, they’re old enough that current hardware trivializes them, but it’s disappointing nonetheless. It almost makes me wish for a good decade or so of performance stagnation to curb this behavior. Graphical fidelity is well past the point of diminishing returns at this point anyway.
We have had a decade of performance stagnation.
Compare PS1 with PS3 (just over 10 years apart).
PS1: 0.03 GFLOPS (approx given it didn't really do FLOPS per se) PS3: 230 GFLOPS
Nearly 1000x faster.
Now compare PS4 with PS5 pro (also just over 10 years apart):
PS4: ~2TFLOPS PS5 Pro: ~33.5TFLOPS
Bit over 10x faster. So the speed of improvement has fallen dramatically.
Arguably you could say the real drop in optimization happened in that PS1 -> PS3 era - everything went from hand optimized assembly code to running (generally) higher level languages and using abstrated graphics frameworks like DirectX and OpenGL. Just noone noticed because we had 1000x the compute to make up for it :)
Consoles/games got hit hard by first crypto and now AI needing GPUs. I suspect if it wasn't for that we'd have vastly cheaper and vastly faster gaming GPUs, but when you were making boatloads of cash off crypto miners and then AI I suspect the rate of progress fell dramatically for gaming at least (most of the the innovation I suspect went more into high VRAM/memory controllers and datacentre scale interconnects).
It is not just GPU performance, it is that visually things are already very refined. A ten times leap in performance doesn't really show as ten times the visual spectical like it used to.
Like all this path tracing/ray tracing stuff, yes it is very cool and can add to a scene but most people can barely tell it is there unless you show it side by side. And that takes a lot of compute to do.
We are polishing an already very polished rock.
Yes but in the PS1 days we were doing a 1000x compute performance a decade.
I agree that 10x doesn't move much, but that's sort of my point - what could be done with 1000x?
Yeah there’s been a drop off for sure. Clearly it hasn’t been steep enough for game studios to not lean on anyway, though.
One potential forcing factor may be the rise of iGPUs, which have become powerful enough to play many titles well while remaining dramatically more affordable than their discrete counterparts (and sometimes not carrying crippling VRAM limits to boot), as well as the growing sector of PC handhelds like the Steam Deck. It’s not difficult to imagine that iGPUs will come to dominate the PC gaming sphere, and if that happens it’ll be financial suicide to not make sure your game plays reasonably well on such hardware.
I get the perhaps mistaken impression the biggest problem games developers have is making & managing absolutely enormous amounts of art assets at high resolution (textures, models, etc). Each time you increase resolution from 576p, to 720p to 1080p and now 4k+ you need a huge step up in visual fidelity of all your assets, otherwise it looks poor.
And given most of these assets are human made (well, until very recently) this requires more and more artists. So I wonder if games studios are more just art studios with a bit of programming bolted on, vs before with lower res graphics where you maybe had one artist for 10 programmers, now it is more flipped the other way. I feel that at some point over the past ~decade we hit a "organisational" wall with this and very very few studios can successfully manage teams of hundreds (thousands?) of artists effectively?
This hits the nail pretty close to the head. I work on an in-house AAA engine used by a number of different games. It's very expensive to produce art assets at the quality expected now.
Many AAA engine's number one focus isn't "performance at all costs", it's "how do we most efficiently let artists build their vision". And efficiency isn't runtime performance, efficiency is how much time it takes for an artist to create something. Performance is only a goal insofar as to free artists from being limited by it.
> So I wonder if games studios are more just art studios with a bit of programming bolted on.
Not quite, but the ratio is very in favor of artists compared to 'the old days'. Programming is still a huge part of what we do. It's still a deeply technical field, but often "programming workflows" are lower priority than "artist workflows" in AAA engines because art time is more expensive than programmer time from the huge number of artists working on any one project compared to programmers.
Just go look at the credits for any recent AAA game. Look at how many artists positions there are compared to programmer positions and it becomes pretty clear.
Just to add to this, from a former colleague of mine who currently works as a graphics programmer at a UE5 studio: most graphics programmers are essentially tech support for artists nowadays. In an age where much of AAA is about making the biggest, most cinematic, most beautiful game, your artists and game content designers are the center of your production pipeline.
It used to be that the technology tended to drive the art. Nowadays the art drives the tech. We only need to look at all the advertised features of UE5 to see that. Nanite allows artists to spend less time tweaking LODs and optimizing meshes as well as flattening the cost of small triangle rendering. Lumen gives us realtime global illumination everywhere so artists don’t have to spend a million hours baking multiple light maps. Megalights lifts restrictions on the number of dynamic lights and shadows a lighting artist can place in the scene. The new Nanite foliage shown off in the Witcher 4 allows foliage artists to go ham with modeling their trees
That depends a lot on art direction and stylization. Highly stylized games scale up to high resolutions shockingly well even with less detailed, lower resolution models and textures. Breath of the Wild is one good example that looks great by modern standards at high resolutions, and there’s many others that manage to look a lot less dated than they are with similarly cartoony styles.
If “realistic” graphics are the objective though, then yes, better displays pose serious problems. Personally I think it’s probably better to avoid art styles that age like milk, though, or to go for a pseudo-realistic direction that is reasonably true to life while mixing in just enough stylization to scale well and not look dated at record speeds. Japanese studios seem pretty good at this.
Yeah, its flipped. Overall, it has meant studios are more and more dependent on third party software (and thus license fees), it led to game engine consolidation, and serious attrition when attempting to make something those game engines werent built for (non-pbr pipelines come to mind).
It's no wonder nothing comes out in a playable state.
> Arguably you could say the real drop in optimization happened in that PS1 -> PS3 era - everything went from hand optimized assembly code to running (generally) higher level languages and using abstrated graphics frameworks like DirectX and OpenGL. Just noone noticed because we had 1000x the compute to make up for it :)
Maybe / Kind of. Consoles in the PS1/N64 they were not running optimised assembly code. The 8bit and 16 bit machines were.
As for DirectX / OpenGL / Glide actually massively improved performance over running stuff on the CPU. You only ran stuff with software rendering if you had a really low performance GPU. Just look at Quake running in software vs Glide. It easily doubles on a Pentium based system.
> Consoles/games got hit hard by first crypto and now AI needing GPUs. I suspect if it wasn't for that we'd have vastly cheaper and vastly faster gaming GPUs, but when you were making boatloads of cash off crypto miners and then AI I suspect the rate of progress fell dramatically for gaming at least (most of the the innovation I suspect went more into high VRAM/memory controllers and datacentre scale interconnects).
The PC graphics card market got hit hard by those. Console markets were largely unaffected. There are many reasons why performance has stagnated. One of them I would argue is the use of the Unreal 4/5 engine. Every game that runs either of these engines has significant performance issues. Just look at Star wars: Jedi Survivor and the previous game Star wars Jedi: Fallen Order. Both games run poorly even on a well spec'd PC and even runs poorly on my PS5. Doesn't really matter though as Jedi Survivor sold well and I think Fallen Order also sold well.
The PS5 is basically a fixed PS4 (I've owned both). They've put a lot of effort into the PS5 into reducing loading times. Loading times on the PS4 were painful and were far longer than the PS3 (even games loading from Bluray). This was something Sony was focusing on. Every presentation about the PS5 talked about the new NVME drives and the external drive and the requirements for it.
The other reason is that the level of graphical fidelity achieved in the mid-2000s to early-2010s is good enough. A lot of reasons why some games age worse than others is due to the art style, rather than the graphical fidelity. Many of the high earning games don't have state of the art graphics e.g Fortnite prints cash and the graphics are pretty bad IMO.
Performance and Graphics just isn't the focus anymore. It doesn't really sell games like it used to.
You divided 230 by .03 wrong, which would be 10000-ish, but you underestimated the PS1 by a lot anyway. The CPU does 30 MIPS, but also the geometry engine does another 60 MIPS and the GPU fills 30 or 60 million pixels per second with multiple calculations each.
Not to mention that few developers were doing hand-optimized assembly by the time of the PSX. They were certainly hand-optimizing models and the 3D pipeline (with some assembler tuning), but C and SDKs were well in use by that point.
Even Naughty Dog went with their own Lisp-based engine for optimization rather than ASM.
I don’t know about other developers at the time, but we had quite a lot of hand-written assembly code in the Crash games. The background and foreground renderers were all written in assembly by hand, as was the octree-based collision detection system. (Source: me; I wrote them.)
And this thread comes full circle: Mark Cerny actually significantly improved the performance of my original version of the Crash collision detection R3000 code. His work on this code finally made it fast enough, so it's a really good thing he was around to help out. Getting the collision detection code correct and fast enough took over 9 months; it was very difficult on the PS1 hardware, and ended up requiring use of the weird 2K static RAM scratchpad Sony included in place of the (removed) floating point unit.
GOOL was mainly used for creature control logic and other stuff that didn’t have to be optimized so much to be feasible. Being able to use a lisp dialect for a bunch of the code in the game saved us a ton of time. The modern analogue would be writing most of the code in Python but incorporating C extensions when necessary for performance.
Andy made GOAL (the successor lisp to GOOL) much more low-level, and it indeed allowed coding essentially at the assembly level (albeit with lispy syntax). But GOOL wasn’t like this.
I've never seen the Crash source code, so I was making my statements based on secondhand knowledge; thanks for the clarification. I do think it's worth pointing out that Naughty Dog and Insomniac were two companies well known for making highly optimized software for the PSX, so that's probably not a standard most other companies matched.
Additionally, I have written my own PSX software and reviewed plenty of contemporaneous PSX software. While many games have some bits of assembler, it's usually specifically around the graphics pipeline; 90+% of the code is C. This is in line with interviews from developers at the time as well.
The point wasn't that ASM wasn't used at all (in fact, I specifically acknowledged it in my original post); it was that the PSX came after the era when entire codebases were hand-massaged/tuned assembler (e.g. the 16-bit era and before).
Naughty Dog's GOAL was PS2-specific and essentially chock-full of what would be called intrinsics these days, letting you interleave individual assembly instructions, particularly for the crazy coprocessor setup of the Emotion Engine.
My understanding is that the mental model of programming in the PS2 era was originally still very assembly-like outside of a few places (like Naughty Dog), and that GTA3 on PS2 possibly made its biggest impact by showing that this wasn't necessary.
If by "mental model" you mean "low-level" programming, sure. But you might as well conflate "religion" with "Southern Baptist Protestantism" then. You're working with the same building blocks, but the programming style is drastically different.
The vast majority of PSX games were done completely in C, period. Some had small bits of asm here and there, but so do the occasional modern C/C++ apps.
To your last point, before there was GOAL there was GOOL (from the horse's mouth itself):
https://all-things-andy-gavin.com/tag/lisp-programming/
And it was used in all of Naughty Dog's PSX library.
The quote I recall reading about long ago summarized the semi-official guidance as "write C like you write ASM".
Because outside of ports from PC, a large share of console game developers at the time had a lot of experience programming earlier consoles, which involved much more assembly-level coding. GTA3 proved that a "PC-style" engine was good enough despite the Emotion Engine's design.
It didn't help that the PS2 was very much oriented towards fairly low-level assembly coding, because getting the most out of the hardware involved writing code for the multiple coprocessors to work somewhat in sync. At least for GOAL, that was handled by special support for writing the assembly code inline with the rest of the code (because IIRC not all of the assembly involved was executed from the same instruction stream).
As for GOOL, it was the much more classic approach (used by ND on the PS3 and newer consoles too) of a core engine in C with a "scripting" language on top to drive gameplay.
> The quote I recall reading about long ago summarized the semi-official guidance as "write C like you write ASM".
You could read that in pretty much any book about C until the mid-00s. C was called a "portable assembler" for the longest time because it went against the grain of ALGOL, Fortran, Pascal, etc. by encouraging the use of pointers and staying close to the machine, which is partly why its main remaining stronghold these days is embedded development.
I've written C on the PSX, using contemporaneous SDKs and tooling, and I've reviewed source code from games at the time. There's nothing assembler about it, at least not more so than any systems development done then or today. If you don't believe me, there are plenty of retail PSX games that accidentally released their own source code that you can review yourself:
https://www.retroreversing.com/source-code/retail-console-so...
You're just arguing for the sake of arguing at this point and, I feel, being intellectually dishonest. Believe what you'd like to believe, or massage the facts how you like; I'm not interested in chasing goal (heh) posts.
>I suspect if it wasn't for that we'd have vastly cheaper and vastly faster gaming GPUs
This feels very out of touch since AMD's latest GPU series is specialized in gaming only, to the point where they sell variants with 8GB, which is becoming a bit tight if you want to play modern games.
Yes, but AMD also has an enterprise line of AI cards to protect. And regardless, if Nvidia weren't also making bank selling AI GPUs, we'd have seen them add more gaming performance, which would have forced AMD to as well, etc.
By what metric can you say this with any confidence when game scope and fidelity have ballooned?
Because optimized games aren't completely extinct, and there are titles with similar levels of size, fidelity, and feature utilization with dramatically differing performance profiles.
Given the N64-PS1 era is filled with first party games that run at like 20 fps, I'm having a hard time saying things are worse now.
I'm a bit uncomfortable with the performance-versus-quality framing people have set up, but I personally feel that the quality floor for a given level of performance is way higher than it used to be. Though there seem to be fewer people parking themselves at "60 fps locked", which felt like a thing for a while.
The current generation has a massive leap in storage speed, but games need to be architected to stream that much data into RAM.
Cyberpunk is a good example of a game that straddled the in-between; many of its performance problems on the PS4 were due to constrained serialization speed.
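As a rough illustration of what "architected to stream" means, here's a minimal Rust sketch (not any engine's actual API, and the chunk sizes are made up): asset chunks come in over a bounded queue on a background I/O thread while the main loop keeps running, instead of stalling on one big load.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;
use std::time::Duration;

fn main() {
    // Bounded channel: backpressure stands in for a fixed streaming budget.
    let (tx, rx) = sync_channel::<Vec<u8>>(4);

    // I/O thread: pretend each chunk is a compressed asset block read off the SSD.
    thread::spawn(move || {
        for chunk_id in 0..16u8 {
            let chunk = vec![chunk_id; 1024 * 1024]; // fake 1 MiB asset chunk
            if tx.send(chunk).is_err() {
                break; // the consumer went away
            }
        }
    });

    // Game thread: consume chunks as they arrive, never blocking on a full level load.
    for chunk in rx {
        // Decompression / GPU upload would happen here.
        thread::sleep(Duration::from_millis(5)); // stand-in for per-frame work
        let _ = chunk.len();
    }
}
```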
Nanite, and games like FF16 and Death Stranding 2, do a good job of drawing complex geometry and textures that wouldn't be possible on the previous generation.
Nanite is actively hurting performance
Nanite has a performance overhead for simple scenes but will render large, complex scenes with high-quality models much more efficiently, providing a faster and more stable framerate.
It’s also completely optional in Unreal 5. You use it if it’s better. Many published UE5 games don’t use it.
Well yeah, features that push the graphics card typically incur performance hits.
A lot of the difference went into FPS rather than improved graphics.
And loading times. I think people have already forgotten how long you had to wait on loading screens, or how much faked loading (squeezing through brush while the next area loads) there was on the PS4.
The PS4 wasn't too terrible, but jumping back to the PS3... wow, I completely forgot how memory-starved that machine was. Working on it, we knew at the time, but in retrospect it was just horrible.
With a small RAM pool and the hard CPU/GPU split (so no reallocation), fed off a slow HDD which was in turn fed by an even slower Blu-ray disc, you were sitting around for a while.
PS3 loading times IME were better than on the PS4.
Bloodborne, when it came out, had around a minute of loading between deaths.
Did you forget that on the N64, load times were near instantaneous?
The N64 was cartridge based.
If only we could just ship a 256GB NVMe SSD with every game and memory map the entire drive like you could with cartridges back then. Never have loading times again.
Also: I think it got less common on the N64, but games on the SNES, NES, and other old home consoles routinely accessed static game data, like graphics tiles, directly from the cartridge ROM, without loading it into system RAM at all.
So there literally were no "loading" times for these assets. This might not even be realistically possible with NAND flash based SSDs, e.g. because of considerations like latency.
Though directly accessing ROM memory would also prevent things like texture block compression I believe.
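For what it's worth, you can approximate the cartridge model on a PC today by memory-mapping an asset archive and letting the OS fault pages in on demand, with NAND latency standing in for ROM latency. A rough Rust sketch, assuming the memmap2 crate and a hypothetical assets.pak file:

```rust
use std::fs::File;
use memmap2::Mmap;

fn main() -> std::io::Result<()> {
    // Map the whole asset archive into the address space; nothing is read yet.
    let file = File::open("assets.pak")?; // hypothetical packed asset file
    let map = unsafe { Mmap::map(&file)? };

    // "Loading" an asset is now just taking a slice; the OS pages the bytes in
    // from the SSD on first touch, much like addressing cartridge ROM directly.
    let texture_bytes = &map[0..4096];
    println!("first byte of 'texture': {}", texture_bytes[0]);
    Ok(())
}
```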
This is correct. Also, it speaks to what players actually value.
I have played through CP2077 at 40, 30, and 25 fps. A child doesn't care if Zelda runs at low FPS.
The only thing I value is a consistent stream of frames on a console.
When given a choice, most users prefer performance over higher fidelity
I would like to see the stats for that.
> "When asked to decide on a mode, players typically choose performance mode about three-quarters of the time,
From PS5 Pro reveal https://youtu.be/X24BzyzQQ-8?t=172
Seems like an overgeneralization. I get it when FPS players want the best performance: they have FOMO about reaction time, and those games are built more for fast action than contemplative scenery-watching.
I wonder if players of single player action/adventure games make the same choice. Those games are played less (can be finished in 10-30 hours instead of endlessly) so the statistics might be skewed to favor performance mode.
> I wonder if players of single player action/adventure games make the same choice.
Anecdotally, I do. Because modern displays are horrible blurry messes at lower framerates. I don't care about my input latency; I care about my image not being a smear every time the camera viewport moves.
Yeah. Case in point: "Zelda: Ocarina of Time" was, at the time and for several years afterward, often labeled one of the best games ever made, despite the fact that it ran at 20 FPS on NTSC consoles and at 16.7 FPS on PAL machines (every third frame of the 60 Hz or 50 Hz output).
I'm sure it would have been even more successful at a modern 60 FPS, but the difference couldn't have been very large, because other 60 FPS games did exist back then, mostly without being nearly as popular.
Children eat dirt. I'm not sure "children don't care" is a good benchmark.
Also FPS just requires throwing more compute at it.
Excessively high detail models require extra artist time too.
Yes, the PS5 can output 120 Hz over HDMI: a perfect linear target to direct that extra compute at.
This is the result of an industry-wide problem: technology just is not moving forward as quickly as it used to. Dennard scaling is dead. Moore's law is also dead for SRAM and I/O logic. It is barely clinging to life for compute logic, but the cost skyrockets with each die shrink. The result is anemic improvements. This issue is visible in Nvidia's graphics offerings too. They are not improving from generation to generation like they did in the past, despite Nvidia turning every knob they could to higher values to keep the party going (e.g. power, die area, price, etcetera).
Jim Keller disagrees: https://www.youtube.com/watch?v=oIG9ztQw2Gc
That talk predates the death of SRAM scaling. I will not bother wasting my time watching a video that is out of date.
That said, note that I did not say Moore's Law was entirely dead. It is dead for SRAM and I/O logic but still around for compute logic. However, pricing is rising with each die shrink far faster than it did in the past.
Hardware improvements only matter to the extent software is actually able to make use of them.
And? Software is getting more sophisticated and capable too. First time I switched an iter to a par_iter in Rust and saw the loop spawn as many threads as I have logical cores felt like magic. Writing multi-threaded code used to be challenging.
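For anyone who hasn't seen it, the switch really is that small (a minimal sketch assuming the rayon crate as a dependency; the arithmetic is just a stand-in workload):

```rust
use rayon::prelude::*;

fn main() {
    let inputs: Vec<u64> = (0..10_000_000).collect();

    // The sequential version would use .iter(); changing it to .par_iter()
    // is the whole migration. Rayon's work-stealing pool spreads the closure
    // across all logical cores automatically.
    let total: u64 = inputs
        .par_iter()
        .map(|&x| x.wrapping_mul(x) % 1_000_003) // stand-in for real per-item work
        .sum();

    println!("checksum: {total}");
}
```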
Now make that multi-threaded code exhaust a 32-core desktop system, all the time, not only at peak execution.
For brownie points, keep the GPU busy as well, beyond twiddling its thumbs while keeping the desktop GUI going.
Even more points if the CPU happens to have an NPU or integrated FPGA and you manage to keep those going too, alongside the 32 cores and the GPU.
> Now make that multi-threaded code exhaust a 32-core desktop system
Switching an iter to par_iter does this. So long as there are enough iterations to work through, it'll exhaust 1024 cores or more.
> all the time, not only at peak execution.
What are you doing that keeps a desktop or phone at 100% utilization? That kind of workload exists in datacenters, but end user devices are inherently bursty. Idle when not in use, race to idle while in use.
> For brownie points, keep the GPU busy as well... Even more points if the CPU happens to have an NPU or integrated FPGA
In a recent project I serve a WASM binary from an ESP32 via Wifi / HTTP, which makes use of the GPU via WebGL to draw the GUI, perform CSG, calculate toolpaths, and drip feed motion control commands back to the ESP. This took about 12k lines of Rust including the multithreaded CAD library I wrote for the project, only a couple hundred lines of which are gated behind the "parallel" feature flag. It was way less work than the inferior C++ version I wrote as part of the RepRap project 20 years ago. Hence my stance that software has become increasingly sophisticated.
https://github.com/timschmidt/alumina-firmware
https://github.com/timschmidt/alumina-ui
https://github.com/timschmidt/csgrs
What's your point?
The point being that those are very niche cases that still don't keep the hardware busy as it should, 24h around the clock.
Most consumer software does even less, hence why you'll hardly see a computer in a shopping mall with more than 16 cores, and on average most shops will stock something between 4 and 8.
It's also a reason why systems with built-in FPGAs failed in the consumer market: specialised hardware without consumer software to help sell it.
> don't keep the hardware busy as it should 24h around the clock.
If your workload demands 24/7 100% CPU usage, Epyc and Xeon are for you. There you can have multiple sockets with 256 or more cores each.
> Most consumer software even less
And yet, even in consumer gear, which is built to a minimum-spec budget, core counts, memory capacity, PCIe lanes, bus bandwidth, IPC, cache sizes, GPU shaders, and NPU TOPS are all increasing year over year.
> systems with built-in FPGAs failed in the consumer market
Talk about niche. I've never met an end user with a use for an FPGA or the willingness to learn what one is. I'd say that has more to do with it. Write a killer app that regular folks want to use that requires one, and they'll become popular. Rooting for you.
You'd have to root for those hardware designers to get software devs in quantity actually using what they produce, at scale.
twice as fast, but asked to render 4x the pixels. Do the math
Well you see... I got nothing.
The path nowadays is to use all kinds of upscaling and temporal detail junk that is actively recreating late 90s LCD blur. Cool. :(
AFAIK, this generation has been widely slammed as a failure due to lack of new blockbuster games. Most things that came out were either for PS4, or remasters of said games.
There have been a few decent sized games, but nothing at grand scale I can think of, until GTA6 next year.
There were the little details of a global pandemic and interest rates tearing through timelines and budgets.
This article shows how great a leap there was between previous console generations.
https://www.gamespot.com/gallery/console-gpu-power-compared-...
GTA VI is going to be a showcase on these consoles.
The big jump between 4 and 5 was the NVMe SSD and hardware decompression, IMO. Load times on a regular PS5 are nonexistent compared to a PS4; that's the big generational jump.
For graphics, I agree it looks like diminishing returns.
> Or the game engines have become super bloated.
"Bloated" might be the wrong word to describe it, but there's some reason to believe that the dominance of Unreal is holding performance back. I've seen several discussions about Unreal's default rendering pipeline being optimized for dynamic realtime photorealistic-ish lighting with complex moving scenes, since that's much of what Epic needs for Fortnite. But most games are not that and don't make remotely effective use of the compute available to them because Unreal hasn't been designed around those goals.
TAA (temporal anti-aliasing) is an example of the kind of postprocessing effect that gamedevs are relying on to recover performance lost in unoptimized rendering pipelines, at the cost of introducing ghosting and loss of visual fidelity.
TAA isn't a crutch used to prop up poor performance; it's an optimization to give games anti-aliasing that doesn't suck.
Your other options for AA are:
* Supersampling. Rendering the game at a higher resolution than the display and downscaling it. This is incredibly expensive.
* MSAA. This samples ~~vertices~~ surfaces more than once per pixel, smoothing over jaggies. This worked really well back before we started covering every surface with pixel shaders. Nowadays it just makes pushing triangles more expensive with very little visual benefit, because the pixel shaders are still done at 1x scale and thus still aliased.
* Post-process AA (FXAA, SMAA, etc.). These are post-process shaders applied to the whole screen after the scene has been fully rendered. They often just use a cheap edge-detection algorithm and try to blur the edges they find. I've never seen one that was actually effective at producing a clean image, as they rarely catch all the edges and do almost nothing to alleviate shimmering.
I've seen a lot of "tech" YouTubers try to claim TAA is a product of lazy developers, but not one of them has been able to demonstrate a viable alternative antialiasing solution that solves the same problem set with the same or better performance. Meanwhile, TAA and its various derivatives like DLAA have only gotten better in the last 5 years, alleviating many of the problems TAA became notorious for in the late 2010s.
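For the curious, the core accumulation step of TAA is tiny; the hard parts are reprojection and history clamping, which this toy Rust sketch (hypothetical buffers, no motion vectors) deliberately leaves out:

```rust
/// Blend the newly rendered (jittered) frame into a running history buffer.
/// `alpha` is how much of the new frame to take each frame: a small alpha means
/// better anti-aliasing but more ghosting when things move.
fn taa_accumulate(history: &mut [f32], current: &[f32], alpha: f32) {
    for (h, &c) in history.iter_mut().zip(current) {
        // Exponential moving average; real TAA first reprojects `history`
        // using motion vectors and clamps it to the local color neighborhood.
        *h = *h * (1.0 - alpha) + c * alpha;
    }
}

fn main() {
    // One grayscale "pixel row": an aliased edge that jitters between frames.
    let mut history = vec![0.0f32; 8];
    let frame_a = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0];
    let frame_b = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0];
    for i in 0..32 {
        let cur = if i % 2 == 0 { &frame_a } else { &frame_b };
        taa_accumulate(&mut history, cur, 0.1);
    }
    println!("{history:?}"); // the hard edge settles into a smoother ramp
}
```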
Yeah. The only problem is that overly aggressive TAA implementations blur the whole frame during camera rotation. The thing that is even better than standard TAA is a combination of TAA and temporal upscaling, called TSR in Unreal. Better still is the same system performed by an ML model, e.g. DLSS, though this requires special inference hardware inside the GPU.
In the past, MSAA worked reasonably well, but it was relatively expensive, doesn't address all forms of high-frequency aliasing, and doesn't really work with the modern rendering paradigm anyway.
ThreatInteractive is an anti-TAA developer/YouTuber. They make a compelling argument against TAA and present an alternative they are working on for Unreal.
Erm, your description of MSAA isn't quite correct; it has nothing to do with vertices and doesn't increase vertex processing cost.
It's more similar to supersampling, but without the higher pixel shader cost (the pixel shader still only runs once per "display pixel", not once per "sample" like in supersampling).
A pixel shader's output is written to multiple (typically 2, 4 or 8) samples, with a coverage mask deciding which samples are written (this coverage mask is all 1s inside a triangle and a combo of 1s and 0s along triangle edges). After rendering to the MSAA render target is complete, an MSAA resolve operation is performed which merges samples into pixels (and this gives you the smoothed triangle edges).
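To make that resolve step concrete, here's a toy box-filter resolve in Rust (real hardware does this with proper formats and sample positions; this sketch is just the averaging):

```rust
/// Resolve an MSAA color buffer: each pixel stores `samples` color values
/// (written according to the coverage mask during rasterization); the resolve
/// averages them, which is what smooths triangle edges.
fn msaa_resolve(msaa_buffer: &[[f32; 3]], samples: usize) -> Vec<[f32; 3]> {
    msaa_buffer
        .chunks_exact(samples)
        .map(|pixel_samples| {
            let mut out = [0.0f32; 3];
            for s in pixel_samples {
                for ch in 0..3 {
                    out[ch] += s[ch];
                }
            }
            out.map(|v| v / samples as f32)
        })
        .collect()
}

fn main() {
    // One pixel on a triangle edge, 4x MSAA: the coverage mask said 3 of 4
    // samples are inside a white triangle, 1 sample kept the black background.
    let edge_pixel = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]];
    let resolved = msaa_resolve(&edge_pixel, 4);
    println!("{:?}", resolved[0]); // ~[0.75, 0.75, 0.75]: a smoothed edge pixel
}
```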
> solves the same problem set with the same or better performance
The games industry has spent the last decade adopting techniques that misleadingly inflate the simple, easily-quantified metrics of FPS and resolution, by sacrificing quality in ways that are harder to quantify. Until you have good metrics for quantifying the motion artifacts and blurring introduced by post-processing AA, upscaling, and temporal AA or frame generation, it's dishonest to claim that those techniques solve the same problem with better performance. They're giving you a worse image, and pointing to the FPS numbers as evidence that they're adequate is focusing on entirely the wrong side of the problem.
That's not to say those techniques aren't sometimes the best available tradeoff, but it's wrong to straight-up ignore the downsides because they're hard to measure.
This is a very one-sided perspective. Any precomputed solution to lighting comes with enormous drawbacks across the board. The game needs to ship the precomputed data when storage is usually already tight. The iteration cycle for artists and level designers sucks when lighting is precomputed: they almost never see accurate graphics for their work while they are iterating, because rebaking takes time away from their work. Game design becomes restricted by those limitations, too. You can't even think of having the player randomly rearrange big things in a level (e.g. building or tearing down a house) because the engine can't handle it. Who knows what clever game mechanics are never thought of because of these types of limitations?
Fully dynamic interactive environments are liberating. Pursuing them is the right thing to do.
https://www.youtube.com/watch?v=Ed4vNNQwCDU
In principle, Epic's priorities for Unreal should be aligned to a lot of what we've seen in the PS3/4/5 generation as far as over-the-shoulder 3rd person action adventure games.
I mean, look at Uncharted, Tomb Raider, Spider-Man, God of War, TLOU, HZD, Ghost of Tsushima, Control, Assassins Creed, Jedi Fallen Order / Survivor. Many of those games were not made in Unreal, but they're all stylistically well suited to what Unreal is doing.
I agree. UE3 was made for Gears of War (pretty much) and as a result the components were there to make Mass Effect.
I had the pleasure to spend some time with Mark Cerny many years ago. He was honestly one of the most impressive people I have ever met. Down to earth and so, so smart. I also think it speaks volumes for Sony as a company that an American born video game developer (engineer not mba) has such an influential position. They are not insular and respect the craft.
There are colleagues of mine who have called me the smartest person they ever met, and I feel so stupid. How do you make the most of what you are given?
A few days ago there was a similar message from Xbox, saying that AMD will power its future hardware projects, talking about a strategic alliance and so on.
So, is Mark Cerny contributing to the next Xbox? In the end, today all consoles are basically PCs with different frontends and storefronts (and even that is opening up, starting with Xbox, though PlayStation will probably follow eventually).
If the Playstation contributions are good enough, maybe RDNA4 -> RDNA5 will be just as good as RDNA3 -> RDNA4. As long as they get the pricing right, anyway.
Excited to see how the software support for UDNA1 works out. Very hopeful we'll see some real competition to Nvidia soon in the datacenter. Unfortunately I think the risk is quite high: if AMD burns developers again with poor drivers and poor support, it's hard to see how they'll be able to shake the current stigma.
Take this with a pinch of salt, but the most recent ROCm release installed out of the box on my WSL2 machine and worked first time with llama.cpp. I even compiled llama.cpp from source with 0 issues. That has never happened ever in my 5+ years of having AMD GPUs. Every other time I've tried this it's either failed and required arcane workarounds, or just not worked entirely (including running on 'real' Linux).
I feel like finally they are turning the corner on software and drivers.
Llama.cpp also has a Vulkan backend that is portable and performant, you don't need to mess with ROCm at all.
Oh yes, I know, but "can I compile llama.cpp with ROCm?" has been my yardstick for how good AMD drivers are for some time.
We've known this for a while; it's an extension of the upscaling and frame generation work AMD already did in conjunction with Sony for FSR 3 and, to a much greater extent, FSR 4. Previous articles have also highlighted their shared focus on BVH optimizations.
Yes, but what will use it when there are so few games on the platform in the current PS generation?
Who is a better computer architect, Mark Cerny or Anand Shimpi?
Did we ever hear what Anand does at Apple?
Probably Vision Pro?
When will Sony support Vulkan on PS?
Why would they? They have their own (two actually) proprietary graphics APIs: GNM and GNMX.
I'd ask why wouldn't they. Not a fan of NIH and wheel reinvention proponents.
Sony's low-level APIs for the PS4 and PS5 (the PS5 doesn't use GNM) are almost a direct mapping to the hardware with very little abstraction compared to Vulkan/DX12. Vulkan is still very high level compared to what's going on inside the driver. There's no point paying the cost of Vulkan's abstractions when half the point of a game console is having a fixed hardware target, hence GNM.
It's basically the same AMD GPU, and you can compile SPIR-V into its machine code optimally without any kind of made-up "I'm closer than close to the hardware" secrets. Compiler optimizations are done all the time (see Valve's ACO and AMD's LLVM-based compiler).
Vulkan is more than adequate to handle things for the hardware on the API side, so I don't buy the claim that being "closer than close" is somehow better.
Because if they write their own, they get to own the political/bureaucratic portion of the problem. For better or worse, they don't have to deal with the Khronos Group. They get to optimize their APIs directly against their research with AMD.
That still doesn't make NIH a better approach. NIH is really a dinosaur idea when it comes to technology like this.
Why would Vulkan, as opposed to a custom solution designed to target that hardware and games specifically, be a better solution?
If you’re making a PS game you’re already doing tons of bespoke PS stuff. If you don’t want to deal with it there are plenty of pieces of middleware out there to help.
Honestly these “where’s Vulkan” posts on every bit of 3D capable hardware feel like a stupid meme at this point as opposed to a rational question.
Maybe they should just ship DX12. That’s multi-platform too.
> Why would Vulkan, as opposed to a custom solution designed to target that hardware and games specifically, be a better solution?
Vulkan came to be as a low-overhead rendering API targeted at 3D video games, originally created as Mantle by AMD and DICE.
If anything, I'd see console or game people being involved as a bad signal, considering what a travesty Vulkan came to be, and that it originated from exactly such a group.
Because it won't tax developers with the need to learn yet another NIH API. Same reason any standard exists and makes things easier for those who use it.
Honestly, any argument that defends NIH like this belongs with the dinosaurs. NIH is the stupid meme here, not the opposite of it.
Vulkan is a mess as bad as OpenGL, to the point that Khronos has been forced to publicly admit it didn't turn out as planned. It is yet again extension spaghetti, and they are now trying to clean house with the Vulkan Roadmap introduced at Vulkanised 2025.
That is how good Khronos "design by committee" APIs end up being.
Vulkan's extension story is a mess. Not surprising from the people that brought you OpenGL.
However please don't undersell what they got right. Because what they got right, they got _very_ right.
Barriers. Vulkan got barriers so absolutely right that every competing API has now adopted a clone of Vulkan's barrier API. The API is cumbersome but brutally unambiguous compared to DirectX's resource states or Metal's hazard tracking. DirectX has a rat's nest of special cases in the resource state API because of how inexpressive it is, and just straight up forgot to consider COPY->COPY barriers.
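To make that concrete, here's roughly what one of those barriers looks like through the synchronization2 interface (a sketch in Rust assuming the ash crate's raw Vulkan bindings; the image handle and the copy-then-sample scenario are made up for illustration):

```rust
use ash::vk;

/// Transition a texture that was just copied into so it can be sampled:
/// the barrier spells out exactly what wrote it, what will read it, and
/// which image layouts are involved. No hidden driver-side guessing.
fn record_sampled_after_copy_barrier(image: vk::Image) {
    let barrier = vk::ImageMemoryBarrier2 {
        src_stage_mask: vk::PipelineStageFlags2::COPY,
        src_access_mask: vk::AccessFlags2::TRANSFER_WRITE,
        dst_stage_mask: vk::PipelineStageFlags2::FRAGMENT_SHADER,
        dst_access_mask: vk::AccessFlags2::SHADER_SAMPLED_READ,
        old_layout: vk::ImageLayout::TRANSFER_DST_OPTIMAL,
        new_layout: vk::ImageLayout::SHADER_READ_ONLY_OPTIMAL,
        image,
        subresource_range: vk::ImageSubresourceRange {
            aspect_mask: vk::ImageAspectFlags::COLOR,
            base_mip_level: 0,
            level_count: 1,
            base_array_layer: 0,
            layer_count: 1,
        },
        ..Default::default()
    };
    // In a real renderer this would go into a vk::DependencyInfo and be
    // recorded with vkCmdPipelineBarrier2 on a command buffer.
    let _ = barrier;
}
```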
We also have SPIR-V. Again, D3D12 plans to dump DXIL and adopt SPIR-V.
The fundamentals Vulkan got very right IMO. It's a shame it gets shackled to the extension mess.
SPIR-V only exists as a reaction to Khronos losing to CUDA and PTX. It is the evolution of SPIR, created when they realised the industry didn't really want to write OpenCL in a textual C99 dialect, but rather wanted a bytecode format that multiple compiler backends could target, including languages like Fortran that Khronos ignored.
Microsoft adopting SPIR-V as a DXIL replacement is most likely a matter of convenience: the format got messy to maintain, is tied to an old fork of LLVM, and HLSL has the industry's weight behind it, even favoured over GLSL for Vulkan (Khronos acknowledged at Vulkanised 2024 that it is doing no work on GLSL at all, zero, nada). So why redo DXIL from scratch when they could capitalize on existing work to target SPIR-V from HLSL?
DXIL has other issues beyond being an orphaned fork of LLVM-IR. LLVM is ill-equipped to represent shaders, and there have been real bugs in the wild because of how much not-a-shader-language LLVM-IR is. See this [0] post under the 'The classic subgroup breakage – maximal convergence gone awry' section. Read the whole series, really, if you haven't; it's a fantastic dive into how awful this stuff is.
DXIL is also a useless intermediate representation because parsing it is so bloody difficult nobody could actually do anything with it. DXIL was, and still is, functionally opaque bytes. You can't introspect it, you can't modify it or do anything of use with it. SPIR-V is dead-simple to parse and has an array of tools built around it because it's so easy to work with.
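To back up the "dead-simple to parse" claim, this is roughly all the framing there is (a plain-Rust sketch; the magic number and header layout come from the public SPIR-V spec, and the module in main is a made-up header-only blob):

```rust
/// SPIR-V is a flat stream of 32-bit words: a 5-word header, then instructions
/// whose first word packs (word_count << 16) | opcode. That's the whole framing.
fn dump_spirv_header(words: &[u32]) -> Result<(), &'static str> {
    const SPIRV_MAGIC: u32 = 0x0723_0203;
    if words.len() < 5 || words[0] != SPIRV_MAGIC {
        return Err("not a SPIR-V module");
    }
    let version = words[1];
    println!(
        "SPIR-V {}.{}, generator 0x{:08x}, id bound {}",
        (version >> 16) & 0xff, // major version
        (version >> 8) & 0xff,  // minor version
        words[2],               // tool that produced the module
        words[3],               // upper bound on result IDs
    );

    // Walk the instruction stream just to count instructions.
    let mut offset = 5;
    let mut count = 0usize;
    while offset < words.len() {
        let word_count = (words[offset] >> 16) as usize;
        if word_count == 0 {
            return Err("malformed instruction");
        }
        offset += word_count;
        count += 1;
    }
    println!("{count} instructions");
    Ok(())
}

fn main() {
    // A header-only module (version 1.6, no instructions) to exercise the parser.
    let module = [0x0723_0203u32, 0x0001_0600, 0, 1, 0];
    dump_spirv_header(&module).unwrap();
}
```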
I don't really see how the OpenCL history is relevant to Vulkan either. Khronos losing the OpenCL game to Nvidia, certainly no thanks to Nvidia's sabotage either, doesn't change that SPIR-V is a much more successful format.
[0] https://themaister.net/blog/2022/04/24/my-personal-hell-of-t...
> Khronos losing the OpenCL game to Nvidia, certainly no thanks to Nvidia's sabotage either, doesn't change that SPIR-V is a much more successful format.
could you elaborate more on this? sounds interesting
DXIL has the DirectX SDK tooling.
OpenCL history is more than relevant: without SPIR, SPIR-V would not exist.
People love to blame Nvidia for AMD's and Intel's failures to push OpenCL into the industry, and for Google completely ignoring it on mobile and pushing their own RenderScript dialect instead. Even if Apple had behaved differently on its platforms, the other 80% of the market completely ignored it.
This comment is just a bunch of bunk, but regardless, Vulkan is the only thing that's a collaborative effort, whether it's a mess or not. NIH proponents have nothing to offer as an alternative.
Market and tooling. The industry has usually ignored Khronos APIs when multiple APIs are available to choose from, e.g. on the Nintendo Switch.
And most of the standards we have now started as something similar to NIH. Vulkan itself is an offshoot of AMD's Mantle. There are valid reasons to have a custom API, especially in a domain like game consoles, with long hardware release cycles, tight performance requirements, and legacy (PS4) code to support.
Good, but if it's actually a standard, that's the benefit. If it's not, it's just NIH as lock-in.
AMD’s next-gen GPUs may have some PlayStation tech inside.
I don't think he is employed by Sony; he works as a consultant for them. So both Sony's PS4/PS5 and AMD's GPUs have his tech inside.
So you're right, though I would never have guessed: during the PS5 hype cycle he gave that deep-dive architecture presentation looking for all the world like a Sony spokesperson.
It looks like all of your comments are low-effort summaries like this. What’s going on here? Or is this a bot…
They're summarizing the submissions they're making. All of the summary comments are on their own submissions.