AMD's Freshly-Baked MI350: An Interview with the Chief Architect

chipsandcheese.com

・

128 points

・

pella

・

12 days ago

77 comments

pella ・ 12 days ago

FP6:

  "Alan: Sure, yep, so one of the things that we felt like on MI350 in this timeframe, that it's going into the market and the current state of AI... we felt like that FP6 is a format that has potential to not only be used for inferencing, but potentially for training. And so we wanted to make sure that the capabilities for FP6 were class-leading relative to... what others maybe would have been implementing, or have implemented. And so, as you know, it's a long lead time to design hardware, so we were thinking about this years ago and wanted to make sure that MI350 had leadership in FP6 performance. So we made a decision to implement the FP6 data path at the same throughput as the FP4 data path. Of course, we had to take on a little bit more hardware in order to do that. FP6 has a few more bits, obviously, that's why it's called FP6. But we were able to do that within the area of constraints that we had in the matrix engine, and do that in a very power- and area-efficient way."

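For readers wondering what "a few more bits" buys, here is a minimal sketch that decodes 4-bit and 6-bit floats. It assumes the element encodings from the OCP Microscaling (MX) specification (E2M1 for FP4, E2M3 and E3M2 for FP6, no Inf/NaN codes); the interview does not say which FP6 layout MI350 uses, and the function names are made up for illustration.

```python
def decode_minifloat(bits: int, exp_bits: int, man_bits: int, bias: int) -> float:
    """Decode a sign/exponent/mantissa minifloat that has no Inf/NaN encodings."""
    sign = -1.0 if (bits >> (exp_bits + man_bits)) & 1 else 1.0
    exp = (bits >> man_bits) & ((1 << exp_bits) - 1)
    man = bits & ((1 << man_bits) - 1)
    frac = man / (1 << man_bits)
    if exp == 0:
        # Subnormal: no implicit leading 1, exponent fixed at 1 - bias.
        return sign * frac * 2.0 ** (1 - bias)
    return sign * (1.0 + frac) * 2.0 ** (exp - bias)


# (exponent bits, mantissa bits, bias); assumed to follow the OCP MX element formats.
FORMATS = {
    "FP4 E2M1": (2, 1, 1),
    "FP6 E2M3": (2, 3, 1),
    "FP6 E3M2": (3, 2, 3),
}


def largest_finite(exp_bits: int, man_bits: int, bias: int) -> float:
    """Largest positive value: all exponent and mantissa bits set, sign clear."""
    all_ones = (1 << (exp_bits + man_bits)) - 1
    return decode_minifloat(all_ones, exp_bits, man_bits, bias)


def distinct_values(exp_bits: int, man_bits: int, bias: int) -> int:
    """Count distinct representable values (+0 and -0 collapse to one)."""
    total_bits = 1 + exp_bits + man_bits
    return len({decode_minifloat(b, exp_bits, man_bits, bias)
                for b in range(1 << total_bits)})


for name, fmt in FORMATS.items():
    print(name, "max:", largest_finite(*fmt), "distinct values:", distinct_values(*fmt))
```

Under these assumptions FP6 has roughly four times as many representable values as FP4, spent either on finer steps (E2M3) or on a wider range (E3M2).
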
treesciencebot ・ 12 days ago

the main question is going to be the software stack. NVIDIA is already shipping NVFP4 kernels and perf is looking good. After the MI300X launched, it took a really long time for the FP8 kernels to become OK (not even good, compared to the almost perfect FP8 support on the NVIDIA side of things).
I doubt that they will be able to reach 60-70% of the peak FLOPs in the majority of workloads (unless they hand-craft and tune a specific GEMM kernel for their benchmark shape). But I would be happy to be proven wrong, and go buy a bunch of them.
- pella ・ 12 days ago
  
  (related)
  Tinygrad:
  "We've been negotiating a $2M contract to get AMD on MLPerf, but one of the sticking points has been confidentiality. Perhaps posting the deliverables on X will help legal to get in the spirit of open source!" "Contract is signed! No confidentiality, AMD has leadership that's capable of acting. Let's make this training run happen, we work in public on our Discord."
  https://x.com/__tinygrad__/status/1935364905949110532
  
  LeonM ・ 12 days ago
  
  It still amazes me that George/Tinycorp somehow seems to get AMD on board every time, while being blissfully unaware that they are a very small player. See for example the top comment here [0].
  Don't get me wrong, I think it's impressive what he achieved so far, and I hope tiny can stay competitive in this market.
  [0] https://news.ycombinator.com/item?id=36193625
  
  roenxi ・ 12 days ago
  
  That top comment doesn't seem to have engaged completely with the context here. AMD fumbled trillions of dollars of value creation by mis-identifying what their hardware was for. Or perhaps it is more correct to say by being too dogmatic about what their hardware is for. They weren't in a position to be picky. They had a choice - they could continue making trillion-dollar mistakes until their board got sacked and the exec team replaced. Or they could maybe listen to some of the people who were technically correct regardless of their size in the market.
  George is just some dude and I doubt AMD paid him much attention anywhere through this saga, but AMD had screwed up to the point where he could give some precise commentary about how they'd managed to duck and weave to avoid the overwhelming torrent of money trying to rush in and buy graphics hardware. They should make some time in their busy schedules to talk with people like that.
  
  imtringued ・ 12 days ago
  
  People get on board with George Hotz because they share the frustration of using ROCm on consumer GPUs, where the experience has been insultingly dreadful to the point where I decided to postpone buying new AMD GPUs for at least a decade.
  I'm not quite sure why he decided to pivot to datacenter GPUs where AMD has shown at least some commitment to ROCm. The intersection between users of tinygrad and people who use MI350s should essentially be George himself and no one else.
  
  ryao ・ 12 days ago
  
  Most of those willing to work with AMD are very small players (with some notable exceptions). They are likely hopeful that the small players will grow.
- lhl ・ 12 days ago
  
  For anyone interested in tracking max achievable matmul FLOPS for hardware and unaware, I highly recommend tracking Stas Bekman's mamf-finder results: https://github.com/stas00/ml-engineering/tree/master/compute...
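  As a rough illustration of what "achievable matmul FLOPS" means, here is a minimal sketch of the arithmetic, not the mamf-finder tool itself. It assumes the usual 2*M*N*K operation count for a dense GEMM; the timing and peak figures in the example are hypothetical placeholders, not measurements of any GPU.

```python
def gemm_flops(m: int, n: int, k: int) -> int:
    """Floating-point operations in an (m x k) @ (k x n) matmul: one multiply and one add per term."""
    return 2 * m * n * k


def achieved_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Throughput actually delivered by one matmul that took `seconds` to run."""
    return gemm_flops(m, n, k) / seconds / 1e12


def utilization(achieved: float, peak: float) -> float:
    """Fraction of the datasheet peak that was achieved (both in the same unit)."""
    return achieved / peak


# Hypothetical example: an 8192^3 GEMM measured at 1 ms on a part
# whose datasheet peak is assumed to be 2000 TFLOPS.
measured = achieved_tflops(8192, 8192, 8192, seconds=1e-3)
print(f"achieved: {measured:.1f} TFLOPS")
print(f"utilization: {utilization(measured, peak=2000.0):.1%}")
```

  Sweeping this over many shapes and keeping the best result gives a "maximum achievable" figure, which is the number worth comparing against the 60-70% mentioned upthread.
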
kristianp ・ 12 days ago

Will 1.58 bits be in the MI400? Or is it not established as a widely useful technology yet?
See https://arxiv.org/abs/2402.17764
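For context, "1.58 bits" is log2(3): the linked paper uses ternary weights in {-1, 0, +1}. A minimal sketch of where the number comes from, plus one simple way to store such weights (five per byte, since 3^5 = 243 fits in 256). The packing scheme and function names are assumptions for illustration, not taken from the paper or from any hardware.

```python
import math


def bits_per_ternary_weight() -> float:
    """Information content of one symbol drawn from three equally likely values."""
    return math.log2(3)


def pack5(trits: list[int]) -> int:
    """Pack five weights from {-1, 0, 1} into one byte using base-3 digits."""
    assert len(trits) == 5 and all(t in (-1, 0, 1) for t in trits)
    value = 0
    for t in reversed(trits):
        value = value * 3 + (t + 1)  # map -1/0/1 to digits 0/1/2
    return value  # always in 0..242


def unpack5(byte: int) -> list[int]:
    """Inverse of pack5: recover the five ternary weights in their original order."""
    out = []
    for _ in range(5):
        byte, digit = divmod(byte, 3)
        out.append(digit - 1)
    return out


print(f"{bits_per_ternary_weight():.3f} bits per weight in theory")
print(f"{8 / 5:.1f} bits per weight with five weights per byte")
```
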

behnamoh ・ 12 days ago

Does this also ship only in x8 batches? I really liked MI300 and could afford one of them for my research, but they only come in batches of x8 in a server rack, so I decided to buy an RTX Pro 6000.

jiggawatts ・ 12 days ago

Of course not.
AMD stubbornly refuses to recognise the huge numbers of low- or medium- budget researchers, hobbyists, and open source developers.
This ignorance of how software development is done has resulted in them losing out on a multi-trillion-dollar market.
It's incredible to me how obstinate certain segments of the industry (such as hardware design) can be.
- rfv6723 ・ 12 days ago
  
  These people are very loud online, but they don't make decisions for hyperscalers, which are the biggest spenders on AI chips.
  AMD is doing just fine. Oracle just announced an AI cluster with up to 131,072 of AMD's new MI355X GPUs.
  AMD needs to focus on bringing rack-scale MI400 to market as quickly as possible, rather than on hobbyists who always find something to complain about instead of spending money.
  
  behnamoh ・ 12 days ago
  
  > these people
  we're talking about the majority of open source developers (I'm one of them). if researchers don't get access to hardware X, they write their paper using hardware Y (Nvidia). AMD isn't doing fine because most low level research on AI is done purely on CUDA.
  
  Certhas ・ 12 days ago
  
  So nVidia has a huge software lead because of open source developers like you? Or because people employed by nVidia write closed source high performance drivers and kernels? Or because the people employed by Meta and Google that wrote Torch and Tensorflow built it on nVidia?
  I am really sympathetic to the complaints. It would just be incredibly useful to have competition and options further down the food chain. But the argument that this is a core strategic mistake makes no sense to me.
  
  imtringued ・ 11 days ago
  
  You don't have to beg for good software quality from Nvidia. Everything else is built on top of that foundation. That's all there is to it.
  
  regularfry ・ 11 days ago
  
  Nit: just writing Torch/TF isn't what made the difference. Having them adopted by a huge audience outside those orgs is, and that's bottlenecked on the hardware platform choice.
  AMD has demonstrably not even acknowledged that they needed to play catch-up for a significant chunk of the last 20 years. The mistake isn't a recent one.
  
  wisty ・ 11 days ago
  
  Also theano, keras ... nvidia made it easy to develop on so that's what people use.
  
  motorest ・ 12 days ago
  
  > if researchers don't get access to hardware X, they write their paper using hardware Y (Nvidia).
  There are plenty of research institutions that can easily spend >$250k on computational resources. Many routinely spend multiples of that volume.
  They'll be fine.
  
  impossiblefork ・ 11 days ago
  
  Many institutions don't.
  Look at China. A couple of years ago people thought people in China weren't doing good AI research, but the thing is, there's good AI research from basically everywhere-- even South America. You can't assume that institutions can spend >$250k on computational resources.
  Many can, but many can't.
  
  qualifiedeephd ・ 12 days ago
  
  Serious researchers use HPC clusters not desktop workstations. Currently the biggest HPC cluster in North America has AMD GPUs. I think it'll be fine.
  
  pjmlp ・ 12 days ago
  
  Before they became serious researchers they were once upon a time students learning with what their laptops were capable of.
  
  uniclaude ・ 12 days ago
  
  Neither their revenue nor their market share in the space looks like just fine. What exactly about trailing the market for years is “just fine”?
  AMD is very far behind, and their earnings are so low that even with a nonsensical P/E ratio they’re still valued at less than a tenth of Nvidia. No, they are not doing anywhere near fine.
  Are hobbyists the reason for this? I’m not sure. However, what AMD is doing is clearly failing.
  
  regularfry ・ 11 days ago
  
  A big chunk of NVidia's current price is a reflection of lacking meaningful competition. So straight comparison isn't quite fair: if AMD started to do better, the gap would shrink from both ends.
  
  creato ・ 12 days ago
  
  When you design software for N customers, where N is very small, and you expect to hold each customer's hand individually, the software is basically guaranteed to be hot garbage that doesn't generalize or actually work except in exactly the use cases you supported (there are exceptions to this, but it requires having exceptional software engineers and leaders that care about doing things correctly and not just closing the next ticket, and in my experience, they are extremely rare).
  If you design software for N00000 customers, it can't be shit, because you can't hold the hands of that many people, it's just not possible. By intending to design software for a wide variety of users, it forces you to make your software not suck, or you'll drown in support requests that you cannot possibly handle.
  
  wmf ・ 12 days ago
  
  Now imagine you don't have the resources to satisfy N00000 customers. What do you do?
  
  exceptione ・ 11 days ago
  
  Start selling ice creams. :)
  Honestly, if they "don't have the resources to satisfy N00000 customers", they had better get them. That will teach them the hard way to work differently.
  
  almostgotcaught ・ 12 days ago
  
  > These ppl are very loud online, but they don't make decisions for hyperscalers which are biggest spenders on AI chips.
  this guy gets it - absolutely no one cares about the hobby market because it's absolutely not how software development is done (nor is it how software is paid for).
  
  pstuart ・ 12 days ago
  
  The hobby market should be considered as a pipeline to future customers. It doesn't mean AMD should drop everything and cater specifically to them, but they'd be foolish to ignore them altogether.
  
  justahuman74 ・ 12 days ago
  
  The hobby market is how you get 'market default' N years later
  
  wmf ・ 12 days ago
  
  This probably does work for the first mover. It's not clear that it can work for the underdog.
  
  almostgotcaught ・ 12 days ago
  
  citation please
  
  orbital-decay ・ 12 days ago
  
  Nvidia's success.
  
  almostgotcaught ・ 12 days ago
  
  You think the company that supplies every hyperscaler is the market leader because of the hobbyist segment? Lol
  
  undefined ・ 11 days ago
  
  [deleted]
  
  reliabilityguy ・ 12 days ago
  
  So, what is the reason only Nvidia supplies the hyperscalers?
  
  almostgotcaught ・ 11 days ago
  
  they have lots of money and they use that money to pay software engineers (their employees...) to write quality software.
  
  orbital-decay ・ 11 days ago
  
  Water is wet. Please check the history of their software stack and why it was always superior to the alternatives, even when they didn't have a lot more money than ATI/AMD. The reason they power hyperscalers is that they catered to enthusiasts and academic researchers attempting to use their GPUs for general-purpose computation in the early 2000s, since the GeForce 3. Then they used that experience to build CUDA, which simply worked and quickly gained mindshare. People have used their software for all imaginable purposes, which was a major factor behind their improvements and their eventually becoming market leaders once killer applications for GPGPU were found (simulation and then AI). This experience is not replicable even with dogfooding, which AMD also doesn't seem to do.
  
  reliabilityguy ・ 11 days ago
  
  so, all these years AMD just failed to hire good engineers and refused to pay well?
- gdiamos ・ 12 days ago
  
  startups and researchers are broke, just like Geoff Hinton in 2006 - https://blog.waqasrana.me/assets/papers/hinton2006.pdf
  
  behnamoh ・ 12 days ago
  
  no we're not broke! we constantly write grants and receive funding from various sources. guess what hardware we recommend the University to purchase? it's 99.9% Nvidia, and sometimes Mac Studio just to play with MLX.
  
  gdiamos ・ 12 days ago
  
  I mean broke compared to Meta or OpenAI.
  AI research used to be fringe and not well funded.
  Back in those days, 99.9% of hardware was Xeon.
  
  ryao ・ 12 days ago
  
  It has gone through many boom and bust cycles. If you go far back enough, it was very well funded. In particular, I recall reading about the US government investing 1 to 2 billion dollars during the Cold War into AI research to translate Russian into English. It had some very impressive demos on preselected Russian texts that had justified the investments. However, it failed to yield results on arbitrary texts. The translation problem has only been mostly solved in recent years.
- naveen99 ・ 12 days ago
  
  There is no mass middle market… it’s volume or luxury… middle management is for taxes.
latchkey ・ 11 days ago

It isn't 8x batches. It is 8 OAMs on a UBB. The UBB is what enables the 8x to communicate with each other over infinity fabric.
If you don't need 8, then that's exactly why we offer 1x MI300X VMs.
- ryao ・ 10 days ago
  
  A number of people want to purchase their own hardware, not rent cloud hardware. I recently purchased an RTX PRO 6000 for the same reason, despite having the option of renting a B200 VM for $1.49 an hour from DeepInfra until the end of June.
  
  latchkey ・ 10 days ago
  
  True, but as time goes on, it will become a wider and wider gap between what is deployed in DC’s and what you can run at home.
  We see it now with 8x UBB and it will get worse with direct liquid cooling and larger power requirements. Mi300x is 700w. Mi355 is 1200w. Mi450 will be even more.
  Certainly amd should make some consumer grade stuff, but they won’t stop on the enterprise side either. Your only option to get super computer level compute, will be to rent it.
  
  ryao ・ 10 days ago
  
  The higher power requirements of datacenter GPUs are mostly from running hardware well past a reasonable point on the efficiency curve to eke slightly higher generational increases in performance. Nvidia for example has three versions of the RTX PRO 6000. One runs at 600W while the other two run at 300W. The main differences are the power target and the cooling solutions. If I recall correctly, the performance difference between them is less than 20%, despite a 50% reduction in power consumption, for what is effectively the same hardware. This can be confirmed by changing the power target of the 600W version to 300W and benchmarking the before and after. Plenty of people have made similar observations of AMD’s hardware.
  That said, I am confident that Nvidia will continue to serve those of us who want our own hardware.
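  The efficiency argument above can be made concrete with a small calculation. This is only a sketch using the comment's recalled figures (less than 20% performance lost for a 50% power reduction); the function name is made up for illustration and the inputs are not benchmark results.

```python
def perf_per_watt_gain(relative_perf: float, relative_power: float) -> float:
    """Ratio of performance-per-watt at a reduced power target versus the baseline.

    Both arguments are fractions of the baseline, e.g. 0.8 means 80% of
    baseline performance and 0.5 means 50% of baseline power.
    """
    return relative_perf / relative_power


# Figures as recalled in the comment: at most a 20% performance drop at half the power.
gain = perf_per_watt_gain(relative_perf=0.8, relative_power=0.5)
print(f"perf/W improves by at least {gain:.2f}x at the lower power target")
```

  In other words, if the recalled numbers hold, the last 300W buys less than a quarter more performance on top of the 300W configuration.
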
  
  latchkey ・ 10 days ago
  
  They all bin their chips after testing. 355x is 35% faster than 300x and uses almost 2x more power.
  But 355 has fp4/6 added in, which, until UDNA comes out, likely won’t get emulated.
  It is fine if you dont need the features of newer hardware, but if you do… then desktop won’t help you.
  
  ryao ・ 10 days ago
  
  Your comment does not make much sense to me. The 355x and 300x are two different chips, not binned versions of one another. The RTX PRO 6000 has fp4/fp6 support too, so there is no need to use datacenter exclusive hardware to get that.
  
  latchkey ・ 10 days ago
  
  I wasn’t referring to them being binned to each other. Oh nice, I didn’t know that about the 6k.
  
  ryao ・ 10 days ago
  
  The consumer 50 series GPUs also have FP4/FP6 support. Nvidia talks about it a little here:
  https://images.nvidia.com/aem-dam/Solutions/geforce/blackwel...
  FP4 is given a full page advertising it while FP6 support in “RTX Blackwell” is a footnote.

teleforce ・ 12 days ago

This 8-combo MI350 is a beauty with 2304 GB of HBM3E memory on each UBB [1].

[1] This is the AMD Instinct MI350:

https://www.servethehome.com/this-is-the-amd-instinct-mi350/

latchkey ・ 12 days ago

I've got the MI300x and I can't wait to deploy a bunch of the MI355's.

jonfromsf ・ 12 days ago

NVDA's advantage is software, not just hardware. It would be amazing to have a competitive market, but better hardware won't be enough to make it happen.

tedunangst ・ 12 days ago

A solid 40% of George's questions were deemed great. (Not counting some fluff like what's your job.)

AbuAssar ・ 12 days ago

If MI350 employs CDNA, which is based on the VEGA (GCN) architecture, does that imply that MI400, when introduced next year, will skip the 2020 GCN and directly transition to RDNA 5 equivalent?

adrian_b ・ 12 days ago

There will be no RDNA 5, but a unified UDNA, replacing both CDNA and RDNA.
AMD has not disclosed how they will achieve the unification, but it is far more likely that the unified architecture will be an evolution of CDNA 4, i.e. an evolution of the old GCN, than an evolution of RDNA, because basing the unified architecture on CDNA/GCN will create fewer problems in software porting than basing it on RDNA 4 or 3. The unified architecture will probably take some features from RDNA only when they are hard to emulate on CDNA.
While the first generation of RDNA has been acclaimed for having a good performance increase in games over the previous GCN-based Vega, it is not clear how much of that performance increase was due to RDNA being better for games and how much to the fact that the first RDNA GPUs happened to have double-width vector pipelines in comparison with the previous GCN GPUs, thus double throughput per clock cycle and per CU (32 FP32 operations/cycle vs. 16 FP32 operations/cycle).
It is possible that RDNA was not really a better architecture, but omitting some of the hardware that was rarely used in games from GCN allowed the implementation of the wider pipelines that were more useful for games. So RDNA was a better compromise for the technology available at that time, not necessarily better in other circumstances.
- trynumber9 ・ 12 days ago
  
  I heard the opposite: the next is gfx13, and it is more like RDNA with more bolted on. Which makes sense given the version numbers. MI350 is still gfx943 or gfx950. RX 9070 XT is gfx1201.
pella ・ 12 days ago

2026 - MI400X - CDNA 5 - UALink/IF - Helios - HBM Bandwidth: 1,400 TB/s
https://www.tomshardware.com/pc-components/gpus/amd-says-ins...
rfv6723 ・ 12 days ago

RDNA is a dead-end.
AMD went down the wrong path by focusing on traditional rendering instead of machine learning.
I think future AMD consumer GPUs would go back to GCN.
almostgotcaught ・ 12 days ago

it's all just called gcn now
https://llvm.org/docs/AMDGPUUsage.html#id38
- adrian_b ・ 12 days ago
  
  The identification of the AMD GPU architectures has always been extremely confusing, with tons of different names meaning the same thing and with some names, like GCN, used for several very different things.
  The table linked by you is good for revealing the meaning of a part of the many AMD code names.

deadbabe ・ 12 days ago

Will AMD catch up to Nvidia?

mobilio ・ 12 days ago

If they improve software quality and provide some low-budget versions, then yes.
AzzyHN ・ 12 days ago

On the consumer side, almost certainly not. Nvidia is a HUGE brand name, it doesn't matter how good and cheap AMD makes their consumer GPUs, people will buy Nvidia GPUs for the brand and prebuilts will stick with Nvidia for the name.
For AI chips... also probably not, unless AMD can compete with CUDA (or CUDA becomes irrelevant)
- jillesvangurp ・ 12 days ago
  
  Actually both Xbox and Playstation use AMD GPUs; and so does the Steam Deck. So there's that. For the narrow niche of gaming PCs, I think there are a lot of kids buying what they can afford and getting creative about what works. AMD isn't doing horrible in that market either.
  And for AI, CUDA is already becoming less relevant. Most of the big players use chips of their own design: Google has its TPUs, Amazon has some in-house designs, Apple has its own CPU/GPU line and doesn't even support anything Nvidia at this point, MS do their own thing for Azure, etc.
  You are basically making the "Intel will stay big because Intel is big" argument for Nvidia. Except of course that stopped being true for Intel. They are still largish. But a lot of data centers are transitioning to ARM CPUs. They lost Apple as a customer. And there are now some decent Windows laptops using ARM CPUs as well.
  
  pjmlp ・ 11 days ago
  
  People learning on their laptops, on their way to becoming future researchers, care about what software they can get, regardless of closed-system proprietary game consoles and hyperscalers' server farms.
- frje1400 ・ 12 days ago
  
  > On the consumer side, almost certainly not. Nvidia is a HUGE brand name, it doesn't matter how good and cheap AMD makes their consumer GPUs, people will buy Nvidia GPUs for the brand and prebuilts will stick with Nvidia for the name.
  I think that AMD could do it, but they choose not to. If you look at their most recent lineup of cards (various SKUs of 9070 and 9060), they are not so much better than Nvidia at each price point that they are a must buy. They even released an outright bad card a few weeks ago (9060 8 GB). I assume that the rationale is that even if they could somehow dominate the gamer market, that is peanuts compared to the potential in AI.
pjmlp ・ 12 days ago

Not for me, I was burned twice buying laptops with AMD only to battle with their software, and even the FOSS drivers on GNU/Linux weren't that great versus the Windows experience.
While on Windows it has been hit and miss with their SDKs and shader tooling. Anyone remember RenderMonkey?
So NVidia it is.
- gavinray ・ 12 days ago
  
  Yeah, sorry, I'm in the same boat.
  I'm team AMD for CPU (currently waiting for consumer X3D laptops to become reasonably priced).
  But for GPU, if only for the "It Just Works" factor, I'm wedded to NVIDIA for the foreseeable future.
z3ratul163071 ・ 12 days ago

As of late, I really think AMD exists only so that NVidia does not get slapped with antitrust lawsuits.
They played that part beautifully in the past decades for Intel.
- naveen99 ・ 12 days ago
  
  Beating intel is not just existing for cya against antitrust lawsuits.
  
  happycube ・ 12 days ago
  
  Ten years ago nobody would have believed that AMD would have over double Intel's market cap in 2025. And they would have been at least somewhat surprised that nVidia would be about 10x that.
  
  z3ratul163071 ・ 12 days ago
  
  [flagged]
wmf ・ 12 days ago

Yes, if they can ship on time.
z3ratul163071 ・ 12 days ago

the only way for them to have any chance at catching up is to fire all the software VPs, all SW middle management, and 90% of the engineers, and build the software team from the ground up.
cause the team they have the last decade is clearly retarded.
zombiwoof ・ 12 days ago

They don’t care to catch up.