It’s the dawn of a brand new era of competition. Today, Intel’s debut Arc A770 and A750 GPUs had their curtain pulled fully back, heralding the company’s long-teased entry into discrete consumer graphics cards. Watch out, Nvidia and AMD. Chipzilla’s in the fray now, fueled by its new Xe HPG (high-performance gaming) GPU architecture.
Intel took an unusual (but strategically sensible) approach to Arc’s debut, initially rolling out Arc 3 graphics for modestly priced portable laptops, before introducing a similarly modest Arc A380 desktop GPU in China this summer. Doing so allowed Intel to leverage its substantial strengths in notebooks and software support rather than going blow-for-blow with Nvidia and AMD on the desktop, and let the company spend months providing some much-needed driver polish.
We’ve covered the Arc 3 laptop GPU reveal and Intel’s killer features in a separate piece that explains what everyday folks should expect from this new breed of laptop. And now, we know how Arc 7 desktop graphics cards perform as well. (Spoiler alert: Sometimes it smashes, and sometimes it stutters, literally, if you don’t have PCIe Resizable BAR enabled.)
That’s not the point of this article though. As part of the various reveals, Intel Fellow Tom Petersen gave the press a high-level overview of the Xe HPG architecture underpinning these Arc “Alchemist” graphics cards, providing a glimpse at the nuts and bolts powering Intel’s discrete graphics ambitions.
So, as we did with Nvidia’s Ampere and AMD’s RDNA 2 architectures, here’s a brief technical explainer on the innards of Intel Arc’s Xe HPG chips. Much the way Nvidia and AMD use different technologies and terminologies for their designs, Intel’s Arc chips rely on some proprietary concepts (including a new take on clock speeds that needs some explaining). That makes it difficult to compare Arc against rival GPU architectures (Intel doesn’t even use common terms like ROPs and TMUs), but by the time we’re done here, you’ll have a solid high-level understanding of what makes Xe HPG tick. Let’s dig in.
Meet Xe HPG
For Intel, Xe HPG “render slices” form the backbone of every Arc GPU. Intel’s laptop and desktop Arc offerings can be scaled up or down as needed to fit different market needs, but these render slices are at their heart, containing dedicated ray tracing units, rasterizers, geometry blocks, and the fundamental building block for Arc, the Xe cores themselves. Xe HPG can scale all the way up to eight render slices in the flagship Arc A770.
Each render slice contains four Xe cores and four ray tracing units, along with all the other bits necessary for running a modern GPU. These render slices are fully DirectX 12 Ultimate compliant, meaning Intel’s Arc GPUs can handle ray tracing, Variable Rate Shading, Mesh Shading, and all the other features associated with that standard.

Let’s go deeper and take a peek at the Xe cores themselves. Each Xe core (again, there are four per render slice) comprises three key bits: 16 256-bit “XVE” vector engines that handle more traditional rasterization tasks, 16 1024-bit “XMX” matrix engines that handle machine learning tasks (like the tensor cores in Nvidia’s rival RTX GPUs), and 192KB of shared L1/SLM cache. That cache can be used to hold tasks during compute workloads, or shaders and textures while gaming.
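To see how those building blocks add up, here is a minimal Python sketch that multiplies out the per-slice and per-core counts quoted above. The `ArcConfig` class and its field names are illustrative inventions, not anything from Intel's tooling, and the totals are simple arithmetic on the article's figures rather than official spec-sheet values.

```python
from dataclasses import dataclass

# Per-unit counts as described in the article.
XE_CORES_PER_SLICE = 4
RT_UNITS_PER_SLICE = 4
XVE_PER_XE_CORE = 16        # 256-bit vector engines
XMX_PER_XE_CORE = 16        # 1024-bit matrix engines
L1_SLM_KB_PER_XE_CORE = 192


@dataclass(frozen=True)
class ArcConfig:
    """Hypothetical helper: an Xe HPG GPU described by its render slice count."""
    render_slices: int

    @property
    def xe_cores(self) -> int:
        return self.render_slices * XE_CORES_PER_SLICE

    @property
    def rt_units(self) -> int:
        return self.render_slices * RT_UNITS_PER_SLICE

    @property
    def vector_engines(self) -> int:
        return self.xe_cores * XVE_PER_XE_CORE

    @property
    def matrix_engines(self) -> int:
        return self.xe_cores * XMX_PER_XE_CORE

    @property
    def l1_slm_kb(self) -> int:
        return self.xe_cores * L1_SLM_KB_PER_XE_CORE


if __name__ == "__main__":
    a770 = ArcConfig(render_slices=8)  # flagship configuration per the article
    print(a770.xe_cores, a770.rt_units, a770.vector_engines,
          a770.matrix_engines, a770.l1_slm_kb)
```

For the eight-slice flagship this works out to 32 Xe cores, 32 ray tracing units, 512 vector engines, 512 matrix engines, and 6,144KB of combined L1/SLM cache.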

The biggest companies in PC gaming may be betting big on ray tracing being the future of graphics (each Xe core includes a specialized Thread Sorting Unit designed to help shaders process willy-nilly bouncing ray tracing data more efficiently, for example), but traditional rendering remains king for now. Each Xe Vector Engine includes a dedicated floating point (FP) execution port to handle traditional shading tasks, along with a shared INT/EM port that can tackle integer-based tasks at the same time.
Nvidia introduced concurrent FP/INT pipelines with its RTX 20-series “Turing” architecture to keep integer tasks from clogging up the FP32 pipeline, and it’s become the norm since. “When Nvidia examined how real-world games behaved, it found that for every 100 floating point instructions performed, an average of 36 and as many as 50 non-floating point instructions were also processed, jamming things up,” we wrote in 2018. “The new integer pipeline handles these extra instructions separately from and concurrently with the FP32 pipeline. Executing the two tasks at the same time results in a huge speed boost.”
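A back-of-the-envelope model shows why splitting the two instruction streams helps. The sketch below is a deliberately idealized toy, assuming one instruction per issue slot, no dependencies, and perfect overlap; real GPUs schedule far less cleanly, and the function names are made up for illustration.

```python
def issue_slots(fp_instructions: int, non_fp_instructions: int,
                concurrent: bool) -> int:
    """Issue slots needed under a toy one-instruction-per-slot model.

    Without a separate integer pipe, both streams share one pipe and run
    back to back. With concurrent pipes, the longer stream sets the pace.
    """
    if concurrent:
        return max(fp_instructions, non_fp_instructions)
    return fp_instructions + non_fp_instructions


def ideal_speedup(fp_instructions: int, non_fp_instructions: int) -> float:
    """Best-case speedup from running FP and non-FP work side by side."""
    serial = issue_slots(fp_instructions, non_fp_instructions, concurrent=False)
    overlapped = issue_slots(fp_instructions, non_fp_instructions, concurrent=True)
    return serial / overlapped


if __name__ == "__main__":
    # Instruction mixes quoted above: 36 (average) and 50 (peak) non-FP
    # instructions for every 100 FP instructions.
    for non_fp in (36, 50):
        print(f"100 FP + {non_fp} non-FP: "
              f"{ideal_speedup(100, non_fp):.2f}x ideal speedup")
```

Under those assumptions the average mix tops out at a 1.36x gain and the worst-case mix at 1.5x. Treat these as upper bounds from a simplified model, not measured results.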

Intel’s dedicated “XMX” matrix engines hook into the vector engines in each Xe core. They’re broadly similar to Nvidia’s RTX tensor cores, designed to greatly accelerate machine learning tasks. These are the bits that unlock the potential of XeSS, Intel’s rival to Nvidia’s vaunted DLSS upsampling, as well as other secret sauce features like Hyper Compute and the virtual camera feature in Intel’s new Arc Control command center. (Again, read our Arc laptop GPU coverage for deeper insight into these consumer-level features.)

When tapped by compatible software (such as a game with XeSS or an app that supports Hyper Compute), the XMX core’s 4-deep systolic array can calculate up to 256 multiply-accumulate (MAC) operations per clock for INT8 inferencing, a massive increase over the 64 ops/clock offered by modern GPUs with DP4a hardware on board, and the 16 ops/clock supported by older GPUs.
Intel’s XeSS supports a fallback mode to run on rival Nvidia and AMD graphics cards that lack XMX cores, defaulting to DP4a hardware instead. The gap in per-clock throughput illustrates very well why Intel says XeSS runs much, much faster on Arc GPUs with XMX hardware inside.
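For a rough feel of what those numbers mean, the Python sketch below models a DP4a-style operation (a four-element INT8 dot product added to an accumulator) and counts how many clocks each class of hardware would need for a fixed amount of INT8 work at the peak rates quoted above. The function names are illustrative, the model ignores 32-bit overflow, and real throughput depends on much more than peak ops per clock.

```python
from typing import Sequence

# Peak INT8 ops per clock, as quoted in the article.
OPS_PER_CLOCK = {"XMX": 256, "DP4a": 64, "older GPUs": 16}


def dp4a(a: Sequence[int], b: Sequence[int], acc: int) -> int:
    """DP4a-style step: dot product of four INT8 pairs plus an accumulator.

    Simplified model: checks the INT8 range but does not emulate
    32-bit accumulator wraparound.
    """
    if len(a) != 4 or len(b) != 4:
        raise ValueError("dp4a takes exactly four values per operand")
    if any(not -128 <= v <= 127 for v in (*a, *b)):
        raise ValueError("operands must fit in signed 8 bits")
    return acc + sum(x * y for x, y in zip(a, b))


def clocks_needed(total_ops: int, ops_per_clock: int) -> int:
    """Clocks to finish total_ops at a given peak rate (ceiling division)."""
    return -(-total_ops // ops_per_clock)


if __name__ == "__main__":
    print(dp4a([1, 2, 3, 4], [5, 6, 7, 8], acc=10))
    for name, rate in OPS_PER_CLOCK.items():
        print(f"{name}: {clocks_needed(1024, rate)} clocks for 1,024 ops")
```

At peak rates, 1,024 operations take 4 clocks on XMX hardware, 16 on DP4a hardware, and 64 on older GPUs, which is the 4x and 16x gap implied by the quoted figures.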

Each Xe core features 16 total Vector and Matrix engines, with pairs of each working in lockstep, able to run FP, INT, and XMX tasks all at the same time. Arc GPUs can be kept very, very busy indeed. The full extent of that busyness, and a deeper dive into how Xe HPG handles complex ray tracing tasks, can be found in Intel’s explainer video.
Intel has always been proud of its media engines, spearheaded by the lightning-fast QuickSync technology, and the Xe HPG’s media engine is no different. It includes all the modern capabilities you’d expect in a graphics chip (plenty of 8K HDR encode and decode support, HEVC, VP9, you name it) but also one big inclusion that no other chip (CPU or GPU) offered when Arc was announced: hardware-accelerated AV1 encoding. (Nvidia’s GeForce RTX 40-series will also support AV1 encoding, however.)

The highly efficient next-generation video standard was created by a consortium of industry giants and is rapidly moving towards becoming the norm. Modern desktop GPUs support AV1 decoding that can let you watch 8K videos without your system setting itself on fire, but until now you needed to use software alone to actually create AV1 videos.
Intel says that the hardware-accelerated AV1 creation unlocked by Arc is 50 times faster than software encodes, and capable of delivering much clearer streaming visuals at the same bitrate as other encoders. We’ve tested Arc’s AV1 chops and found it indeed puts Nvidia and AMD’s traditional encoders to shame. (Yes, even NVENC.)
Paired with the Hyper Encode feature offered in all-Intel laptops and desktops as part of the company’s Deep Link suite, which leverages the media engines in both the CPU and GPU rather than one or the other, Arc-based systems could prove extraordinarily compelling for video creators.
Xe HPG display engine

The Xe HPG display engine stays consistent across the Arc GPU stack, meaning every Arc graphics card offers the same video output capabilities (though the actual port configuration will vary by model). Don’t expect good frame rates if you actually try gaming on a pair of 8K screens, but it’s nice to know Arc will support it if you want all the pixels for your productivity tasks!
Meet the Intel Arc A-series GPU lineup

Let’s take a moment to bring all this technical talk back to the practical realm. Intel cobbled together a bunch of Xe cores and render slices into a pair of dedicated Arc “Alchemist” GPUs: the higher-end ACM-G10, which powers the flagship Arc 7 graphics offerings, and the more modest ACM-G11, which appears in Arc 3 laptops and desktop GPUs.


From there, these GPUs can be sliced and diced to meet different market needs. Intel’s spec charts show how the first generation of Arc graphics for laptops shook out.
Xe HPG graphics clock speeds
Something might have jumped out at you in Intel’s laptop GPU spec charts: their ultra-low clock speeds. (The desktop GPUs run much faster, and much more normally.) In an era where Nvidia’s GPUs push 2GHz and some AMD GPUs clear 2.5GHz, seeing Intel’s Arc mobile GPUs topping out at 1650MHz and going as low as 900MHz is a tad eyebrow-raising. Clock speeds between rival graphics brands aren’t as clear cut as they seem, however.

AMD’s “Game Clock” for Radeon GPUs isn’t the same as Nvidia’s “Boost Clock,” as I’ve explained before. Intel is using yet another metric for its Arc GPUs, dubbed “Graphics Clock.” Petersen defined Intel’s Graphics Clock as the average clock speed across the typical light and heavy workloads that a particular GPU was intended for (so gaming for Xe HPG and likely compute tasks for workstation cards, for example). If you look at the laptop GPU spec charts, you’ll also see a range of TDPs defined for each; the Graphics Clock is based off the lowest available TDP. In other words, Intel’s Graphics Clock for laptop graphics essentially represents almost a worst case scenario for Arc GPUs. (Desktop GPUs use a fixed power budget and behave much more normally, of course.)
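Read literally, that definition amounts to "average the clocks observed across typical workloads at the lowest TDP." The Python sketch below illustrates that reading only. Intel has not published how it actually weights or measures workloads, the function name is made up, and the sample clock values are invented purely to show the mechanics.

```python
from statistics import mean
from typing import Mapping, Sequence


def graphics_clock_mhz(samples_by_tdp: Mapping[int, Sequence[float]]) -> float:
    """Average clock (MHz) over workload samples at the lowest TDP.

    samples_by_tdp maps a TDP in watts to clock speeds observed across a
    mix of light and heavy workloads at that power level. This is one
    plain reading of the definition, not Intel's actual methodology.
    """
    if not samples_by_tdp:
        raise ValueError("need samples for at least one TDP")
    lowest_tdp = min(samples_by_tdp)
    return mean(samples_by_tdp[lowest_tdp])


if __name__ == "__main__":
    # Made-up numbers for illustration; not measurements of any real GPU.
    hypothetical_samples = {
        25: [1000, 800],    # light and heavy workload at the lowest TDP
        35: [1300, 1100],   # same workloads with more power headroom
    }
    print(graphics_clock_mhz(hypothetical_samples))
```

Only the lowest-TDP samples feed the result here, which is why the rated figure can sit well below what the same chip reaches with more power available.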

All that said, graphics cores can run at different speeds depending on how hard they’re being pushed. They’ll hit much higher speeds in 2D retro games and much lower speeds in complex modern games that hit every part of the Xe core and render slice, for example. And wattage can make a massive difference to performance as well; as we’ve seen with Nvidia’s mobile GeForce offerings, pumping more juice into a GPU can help propel a lower-tier GPU past a low-watt version of an ostensibly stronger sibling.
It’s also worth noting that clock speed isn’t everything. Within the same company’s architecture, faster is generally better: a 2GHz GeForce GPU will be faster than a 1.5GHz one, say. But AMD’s desktop Radeon RX 6500 XT lags behind its siblings despite packing a ludicrously fast 2.8GHz clock speed. Raw clock speed gains are far from the only way to drive faster performance, as AMD’s Robert Hallock once explained on our Full Nerd podcast. That company’s Ryzen 7 5800X3D processor actually saw big gaming performance gains by dropping clock speeds and plopping a big slab of cache atop the chip.
It’s complicated, is what I’m saying.
But wait, there’s more!

And that about does it for our tour of Intel’s Xe HPG architecture. If all this talk about matrix engines and media encoders got you hot and bothered, be sure to check out our Intel Arc A770 and A750 graphics card review for a deep dive into how all these technical tidbits manifest in reality.
Arc performs very differently than its rivals, for better and sometimes for worse, and Xe HPG is the engine driving it all. Intel’s Arc A750 and A770 Limited Edition hit store shelves on October 12.