It’s the dawn of a brand new era of competition. Today, Intel revealed its debut Arc GPUs, heralding its long-teased entry into discrete consumer graphics cards. Watch out, Nvidia and AMD. Chipzilla’s in the fray now, fueled by its new Xe HPG (high-performance gaming) GPU architecture.
Intel took an unusual (but strategically smart) approach to Arc’s debut, rolling out Arc 3 graphics for modestly priced portable laptops. It lets the company leverage its substantial strengths in notebooks and software support rather than going blow-for-blow in gaming frame rates on the desktop, where Nvidia and AMD stand firm. We’ve covered the Arc 3 laptop GPU reveal and Intel’s killer features in a separate piece that explains what everyday people should expect from this new breed of laptop. There’s some pretty compelling stuff, including key “Deep Link” features that add eye-opening capabilities when you pair an Intel Arc GPU with an Intel Core processor.
That’s not the point of this article though. As part of the reveal, Intel Fellow Tom Petersen also provided the press with a high-level overview of the Xe HPG architecture underpinning these Arc “Alchemist” graphics cards. It’s our first glimpse at the nuts and bolts powering Intel’s discrete graphics ambitions.
So, as we did with Nvidia’s Ampere and AMD’s RDNA 2 architectures, here’s a brief technical explainer on the innards of Intel Arc’s Xe HPG chips. Much the way Nvidia and AMD use different technologies and terminologies for their designs, Intel’s Arc chips rely on some proprietary concepts (including a new take on clock speeds that needs some explaining). That makes it difficult to compare Arc against rival GPU architectures (Intel doesn’t even use common terms like ROPs and TMUs), but by the time we’re done here, you’ll have a solid high-level understanding of what makes Xe HPG tick. Let’s dig in.
Meet Xe HPG
For Intel, Xe HPG “render slices” form the backbone of every Arc GPU. Intel’s laptop and desktop Arc offerings can be scaled up or down as needed to fit different market needs, but these render slices are at their heart, containing dedicated ray tracing units, rasterizers, geometry blocks, and the fundamental building block for Arc, the Xe cores themselves. Xe HPG can scale all the way up to eight render slices in Arc mobile GPUs, represented by the flagship Arc A770M GPU in laptop form.
Each render slice contains four Xe cores and four ray tracing units, along with all the other bits necessary for running a modern GPU. These render slices are fully DirectX 12 Ultimate compliant, meaning Intel’s Arc GPUs can handle ray tracing, Variable Rate Shading, Mesh Shading, and all the other features associated with that standard.

Let’s go deeper and take a peek at the Xe cores themselves. Each Xe core (again, there are four per render slice) comprises three key bits: 16 256-bit “XVE” vector engines that handle more traditional rasterization tasks, 16 1024-bit “XMX” matrix engines that handle machine learning tasks (similar to the tensor cores in Nvidia’s rival RTX GPUs), and 192KB of shared L1/SLM cache. That cache can be used to hold tasks during compute workloads, or shaders and textures while gaming.
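To see how those building blocks add up, here is a minimal Python sketch that multiplies out the per-slice and per-core counts quoted above for a given number of render slices. The names `GpuTally` and `tally` are made up for illustration and are not Intel terminology; the sketch says nothing about clocks, memory, or real-world performance.

```python
from dataclasses import dataclass

# Per-unit counts quoted in the article.
XE_CORES_PER_RENDER_SLICE = 4
RT_UNITS_PER_RENDER_SLICE = 4
VECTOR_ENGINES_PER_XE_CORE = 16  # 256-bit XVE
MATRIX_ENGINES_PER_XE_CORE = 16  # 1024-bit XMX
L1_SLM_KB_PER_XE_CORE = 192


@dataclass(frozen=True)
class GpuTally:  # hypothetical name, for illustration only
    render_slices: int
    xe_cores: int
    rt_units: int
    vector_engines: int
    matrix_engines: int
    l1_slm_kb: int


def tally(render_slices: int) -> GpuTally:
    """Multiply the per-slice and per-core counts out for a whole GPU."""
    if render_slices < 1:
        raise ValueError("a GPU needs at least one render slice")
    xe_cores = render_slices * XE_CORES_PER_RENDER_SLICE
    return GpuTally(
        render_slices=render_slices,
        xe_cores=xe_cores,
        rt_units=render_slices * RT_UNITS_PER_RENDER_SLICE,
        vector_engines=xe_cores * VECTOR_ENGINES_PER_XE_CORE,
        matrix_engines=xe_cores * MATRIX_ENGINES_PER_XE_CORE,
        l1_slm_kb=xe_cores * L1_SLM_KB_PER_XE_CORE,
    )


if __name__ == "__main__":
    # Eight render slices is the largest mobile configuration mentioned.
    print(tally(8))
```

By this arithmetic, an eight-slice part works out to 32 Xe cores, 32 ray tracing units, and 512 each of the vector and matrix engines.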

The biggest companies in PC gaming may be betting big on ray tracing being the future of graphics, but traditional rendering remains king for now. Each Xe Vector Engine includes a dedicated floating point (FP) execution port to handle traditional shading tasks, along with a shared INT/EM port that can tackle integer-based tasks at the same time.
Nvidia introduced concurrent FP/INT pipelines with its RTX 20-series “Turing” architecture to keep integer tasks from clogging up the FP32 pipeline, and it’s become the norm since. “When Nvidia examined how real-world games behaved, it found that for every 100 floating point instructions performed, an average of 36 and as many as 50 non-floating point instructions were also processed, jamming things up,” we wrote in 2018. “The new integer pipeline handles these extra instructions separately from and concurrently with the FP32 pipeline. Executing the two tasks at the same time results in a big speed boost.”
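A back-of-the-envelope Python sketch shows why that instruction mix matters. It is a deliberately idealized model, not how any real GPU scheduler works: it assumes one instruction per issue slot, no dependencies between instructions, and no memory stalls. The function names are hypothetical.

```python
def issue_slots(fp_ops: int, int_ops: int, concurrent: bool) -> int:
    """Issue slots needed for a mix of FP and integer instructions.

    Idealized: a single shared pipeline handles everything back to back,
    while concurrent FP and INT pipelines overlap perfectly.
    """
    if fp_ops <= 0 or int_ops < 0:
        raise ValueError("need a positive FP count and a non-negative INT count")
    if concurrent:
        return max(fp_ops, int_ops)
    return fp_ops + int_ops


def ideal_speedup(fp_ops: int, int_ops: int) -> float:
    """Upper bound on the gain from running the INT work alongside FP."""
    shared = issue_slots(fp_ops, int_ops, concurrent=False)
    split = issue_slots(fp_ops, int_ops, concurrent=True)
    return shared / split


if __name__ == "__main__":
    # Instruction mixes quoted above: 36 (average) and 50 (peak) per 100 FP.
    for int_ops in (36, 50):
        print(int_ops, round(ideal_speedup(100, int_ops), 2))
```

Under those assumptions the ceiling is 1.36x for the average mix and 1.5x for the heaviest one; real gains depend on how well the workload actually overlaps.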

Intel’s dedicated “XMX” matrix engines hook into the vector engines in each Xe core. They’re broadly similar to Nvidia’s RTX tensor cores, designed to greatly accelerate machine learning tasks. These are the bits that unlock the potential of XeSS, Intel’s rival to Nvidia’s vaunted DLSS upsampling, as well as other special sauce features like Hyper Compute and the virtual camera feature in Intel’s new Arc Control command center. (Again, read our Arc laptop GPU reveal coverage for deeper insight into those consumer-level features.)

When tapped by compatible software (such as a game with XeSS or an app that supports Hyper Compute), the XMX core’s 4-deep systolic array can calculate up to 256 multiply-accumulate (MAC) operations per clock for INT8 inferencing, a massive boost over the 64 ops/clock offered by modern GPUs with DP4a hardware on board, and the 16 ops/clock supported by older GPUs.
Intel’s XeSS supports a fallback mode to run on rival Nvidia and AMD graphics cards that lack XMX cores, defaulting to DP4a hardware instead. Those per-clock figures illustrate very well why Intel expects XeSS to run much, much faster on Arc GPUs with XMX hardware inside.
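For readers unfamiliar with DP4a, here is a small Python sketch of what a single DP4a-style operation computes: a four-element INT8 dot product added to a running accumulator. It models the signed variant only and ignores 32-bit wraparound, and the helper names are hypothetical. The ratios at the end simply divide the peak per-clock figures quoted above; they are not performance measurements.

```python
INT8_MIN, INT8_MAX = -128, 127


def dp4a(a, b, acc=0):
    """Four-element signed INT8 dot product, accumulated into acc."""
    if len(a) != 4 or len(b) != 4:
        raise ValueError("DP4a works on exactly four values per operand")
    for value in (*a, *b):
        if not INT8_MIN <= value <= INT8_MAX:
            raise ValueError(f"{value} does not fit in a signed 8-bit integer")
    return acc + sum(x * y for x, y in zip(a, b))


# Peak INT8 rates quoted in the article, in operations per clock.
OPS_PER_CLOCK = {"xmx": 256, "dp4a": 64, "older": 16}


def peak_ratio(fast: str, slow: str) -> float:
    """Ratio between two of the quoted peak per-clock rates."""
    return OPS_PER_CLOCK[fast] / OPS_PER_CLOCK[slow]


if __name__ == "__main__":
    print(dp4a((1, 2, 3, 4), (5, 6, 7, 8), acc=10))
    print(peak_ratio("xmx", "dp4a"), peak_ratio("xmx", "older"))
```

On the quoted numbers, XMX has a 4x peak advantage over DP4a hardware and a 16x advantage over older GPUs.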

Each Xe core features 16 Vector engines and 16 Matrix engines, with pairs of each running in lockstep, able to run FP, INT, and XMX tasks all at the same time. Arc GPUs can be kept very, very busy indeed.

Intel has always been proud of its media engines, spearheaded by the lightning-fast QuickSync technology, and the Xe HPG’s media engine is no different. It includes all the modern capabilities you’d expect in a graphics chip (various 8K HDR encode and decode support, HEVC, VP9, you name it) but also one big inclusion that no other chip (CPU or GPU) offers: hardware-accelerated AV1 encoding.
The highly efficient next-generation video standard was created by a consortium of industry giants and is rapidly moving towards becoming the norm. Modern desktop GPUs support AV1 decoding that can help you watch 8K videos without your system setting itself on fire, but until now you needed to use software alone to actually create AV1 videos. Intel says that the hardware-accelerated AV1 encoding unlocked by Arc is 50 times faster than software encodes, or that it’s capable of delivering much clearer streaming visuals at the same bitrate as other encoders.
Paired with the Hyper Encode feature offered in all-Intel laptops as part of the company’s Deep Link suite, which leverages the media engines in both the CPU and GPU rather than one or the other, Arc-based systems could prove extraordinarily compelling for video creators (if gaming performance is up to snuff, of course).
Xe HPG display engine

The Xe HPG display engine remains consistent across the Arc GPU stack, meaning every Arc graphics card offers the same video output capabilities (though the exact port configuration will vary by model). Don’t expect good frame rates if you actually try gaming on a pair of 8K screens, but it’s nice to know Arc can support it if you want all the pixels for your productivity tasks!
Real-world Arc A-series laptop GPUs

Let’s take a moment to bring all this technical talk back to the practical realm. Intel cobbled together a bunch of Xe cores and render slices into a pair of dedicated Arc “Alchemist” GPUs for the mobile market: the higher-end ACM-G10, and the more modest ACM-G11, which will appear in the debut Arc 3 laptops launching today.

From there, these GPUs can be sliced and diced to meet different market needs. Here’s how the first generation of Arc graphics for laptops shakes out: Arc 3 laptops launch today, with Arc 5 and 7 laptops expected to launch sometime early this summer.
Xe HPG graphics clock speeds
Something might have jumped out at you in those laptop GPU spec charts: their ultra-low clock speeds. In an era where Nvidia’s GPUs push 2GHz and some AMD GPUs clear 2.5GHz, seeing Intel’s Arc topping out at 1650MHz and going as low as 900MHz is a tad eyebrow-raising. Clock speeds between rival graphics brands aren’t as clear-cut as they seem, however.

AMD’s “Game Clock” for Radeon GPUs isn’t the same as Nvidia’s “Boost Clock,” as I’ve explained before. Intel is using yet another metric for its Arc GPUs, dubbed “Graphics Clock.” Petersen defined Intel’s Graphics Clock as the average clock speed for a typical workload that particular GPU was intended for (so gaming for Xe HPG and likely compute tasks for workstation cards, for example). If you look at the laptop GPU charts above, you’ll also see a range of TDPs defined for each; the Graphics Clock is based on the lowest available TDP. In other words, Intel’s Graphics Clock essentially represents almost a worst-case scenario for Arc GPUs.
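Intel did not publish a formula for Graphics Clock, so the Python sketch below is just one plausible reading of “average clock speed for a typical workload”: a time-weighted mean of clock readings logged while a workload runs at the lowest TDP. The function name is hypothetical and the sample readings are invented purely to show the arithmetic; they are not measurements of any Arc GPU.

```python
def time_weighted_clock_mhz(samples):
    """Average clock over a run, weighting each reading by how long it lasted.

    samples: sequence of (clock_mhz, duration_seconds) pairs.
    """
    samples = list(samples)
    total_time = sum(duration for _, duration in samples)
    if total_time <= 0:
        raise ValueError("need at least one sample with a positive duration")
    weighted = sum(clock * duration for clock, duration in samples)
    return weighted / total_time


if __name__ == "__main__":
    # Invented readings: heavy scenes hold the clock low for longer,
    # lighter scenes let it climb briefly.
    made_up_run = [(900, 2.0), (1200, 1.0), (1500, 1.0)]
    print(time_weighted_clock_mhz(made_up_run))
```

The point of the sketch is that an average like this sits well below the peaks a GPU can hit in lighter moments, which is why a single quoted clock figure understates what the chip can do.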

All that said, graphics cores can run at different speeds depending on how hard they’re being pushed: they’ll hit much higher speeds in 2D retro games and much lower speeds in complex modern games that hit every part of the Xe core and render slice, for example. And wattage can make a massive difference to performance as well; as we’ve seen with Nvidia’s mobile GeForce offerings, pumping more juice into a GPU can help propel a lower-tier GPU past a low-watt version of an ostensibly stronger sibling.
It’s also worth noting that clock speed isn’t everything. Within the same company’s architecture, faster is generally better: a 2GHz GeForce GPU will be faster than a 1.5GHz one, say. But AMD’s desktop Radeon RX 6500 XT lags behind its siblings despite packing a ludicrously fast 2.8GHz clock speed. Raw clock speed gains are far from the only way to drive faster performance, as AMD’s Robert Hallock recently explained on our Full Nerd podcast. That company’s Ryzen 7 5800X3D processor actually saw big gaming performance gains by dropping clock speeds and plopping a big slab of cache atop the chip.
It’s complicated, is what I’m saying. Don’t read too deeply into the clock speeds for Intel’s Arc GPUs until laptops and desktop graphics cards wind up in the hands of reviewers.
But wait, there’s more!

And that about does it for our tour of Intel’s Xe HPG architecture. The company kept things pretty high level for today’s mobile-centric reveal, but we’d expect to see a whitepaper with more details released the closer we get to the arrival of Arc 5 and 7 laptops in early summer, and Arc desktop graphics cards sometime in the second quarter.
If all this talk about matrix engines and media encoders got you hot and bothered, be sure to check out our separate coverage of the Arc 3 laptop GPU launch for a more practical look at what Intel is actually doing with all these hardware features. Those Deep Link capabilities could be some mighty delicious special sauce indeed.
Now, all that’s left to do is wait for reviews.