Tom’s Hardware
Mark Tyson

AMD's struggle with unbootable Epyc Naples and Rome prototype CPUs revealed in its new Advanced Insights series

AMD Advanced Insights Ep1.

AMD has published the first episode of a new YouTube video series dubbed Advanced Insights, with AMD CTO Mark Papermaster serving as the host. This episode centered on AMD’s disruptive entry into, and seemingly inexorable growth in, the data center. However, it hasn’t all been plain sailing: Papermaster and his guest publicly discussed some early teething troubles with booting Epyc Naples and Rome prototype chips for the first time.

During the series' first outing, Mark Papermaster chatted with Forrest Norrod, EVP & GM of AMD's Data Center Solutions Business Unit. As the first guest, Norrod was there to talk about disrupting markets, and AMD’s last nine or so years in the server business served as an illustration of the power of disruptive technology.

Unbootable Epyc CPUs

AMD didn’t get to where it is today in the server market without experiencing a few speed bumps. Norrod recalled that when the first Epyc Naples chip samples arrived, they couldn’t boot up. It isn’t clear from the interview what part of the Epyc chip caused this immediate and drastic hurdle in the labs. However, the initially flustered engineers managed to get things ticking along after applying some “ingenuity and perseverance.”

When the first Epyc Rome chips arrived in AMD’s test labs, the engineers were again faced with a no-boot problem. This time, a fault in the early sample’s memory access stopped it from booting. Again, frantic and highly technical work got the chip into a bootable, testable state.

When drastic bugs like this are discovered, Norrod says he applies a 72-hour rule to avoid knee-jerk reactions. People still get busy firefighting, but a few days later things typically look better and a clearer strategy can be devised, reckons the exec, who has 35 years of industry experience. With luck, the AMD engineers will sometimes have fully solved the issue within those 72 hours.

(Image credit: AMD)

Zen differentiation through the generations

Throughout the Advanced Insights video, the discussion highlights the disruptions Zen brought to the server market over successive generations. About nine years ago, when the Zen family was new, AMD dearly wanted a competitive CPU core, plus some differentiation. Epyc Naples delivered the first Zen-based CPUs for data centers, and with them a boost in memory channels, I/O, and core counts. Epyc Naples lived up to HPC expectations, which was hugely important, asserted Papermaster.

Subsequently, Epyc Rome was disruptive as AMD’s first chiplet-based server processor. The key to its success, the discussion implied, was choosing the right semiconductor process for each chiplet. On the topic of chiplets, Norrod was keen to praise Sam Naffziger as “the godfather of chiplets.”

Chiplets allowed AMD to keep scaling, embrace new technology more quickly, and resolve non-uniform memory access (NUMA) issues, confounding AMD’s critics. Memory and I/O have to scale with core count to “feed the beast,” stressed Papermaster, in agreement with Norrod. Infinity Fabric also played an important part here.

Epyc Milan was characterized as yet another inflection point. With this generation, AMD says it put poor single-thread comparisons behind it, and Norrod is confident that AMD secured “leadership across the board.” AMD could now present customers with evidence of a predictable, trustworthy roadmap from a company that would stay in the business for the long haul.

Other disruptions that Papermaster and Norrod reckon have been behind the continued success of AMD Epyc processors and servers include:

  • Security – confidential computing for servers. Norrod says he first heard this differentiator highlighted in discussions about AMD’s game console chips. While introducing the then-new PlayStation and Xbox chips, engineers talked about compelling security features that came without performance impacts and without software modifications, prompting Norrod to declare “holy cow.”
  • TCO – Norrod asserts that 80% of customers “should be on a single socket [servers] today.” Judging by their workloads, this is correct, but changing ingrained habits through education is quite a grind. The AMD execs reckon it is only a matter of time, though, as the Epyc CPUs are claimed to offer big TCO benefits.

Last but not least, the episode couldn’t be complete without some mention of artificial intelligence. AI is the next disruption, reckon the AMD execs. Papermaster said that he has “never been so excited than by the opportunity I see right now.” His excitement apparently stems from AMD’s depth and breadth of talent and experience.

Norrod interjected to remind viewers that AI processing presents a massive math problem, requiring enormous matrix and vector calculations, and that being competitive depends on the best GPU, memory, I/O, network, CPU, and storage technologies available. “There is no better company than AMD to take all that on,” concluded Norrod.
