By Lisa Pearce, Vice President and General Manager for the Visual Compute Group

Excited that launch day is here! Hope everyone is tuning in for our launch event today to check out our overview of the Intel® Arc™ graphics mobile product capabilities and key demos.

Just like last time, I want to address three questions I’ve been asked most recently that would be interesting for our community:

Question #1: How does Intel’s new Matrix Engine compare to your Vector Engine in Xe HPG?

To really understand the matrix engine, it helps to understand how data flows through each of our engines. The MAC instruction (multiply accumulate) is the basic SIMD vector instruction used in graphics and is at the heart of our vector engine. The Xe HPG Vector Engine does 8 parallel elementwise multiplications followed by 8 parallel additions (16 Ops per clock total).

DP4a is an optimization targeting AI workloads when 32-bit floating point precision is not required. It works by dividing all 32-bit inputs into 8-bit chunks and then multiplying the chunks independently. This is a total of 32 parallel multiplications (shown by the purple squares in the accompanying diagram). This is followed by 32 additions for the accumulation, for a total of 64 Ops per cycle, a 4x improvement over the standard SIMD MAC.

The matrix engine accelerates further by pipelining the multiply accumulate 4-deep. Like DP4a, each operand is sliced into 4 chunks which are multiplied and accumulated independently, 64 Ops per stage, shown again by the purple tiles. With 4 stages this yields 256 Ops per clock, a 16x increase over the traditional 32-bit SIMD MAC. This type of computational structure is sometimes called a systolic array, which can help accelerate AI applications for gamers and creators.

Question #2: How is Intel improving performance and compatibility for games?

As we start the journey for Intel® Arc™, we have been optimizing our software stack for a wide range of real-world workloads, especially top games and content creation workloads, to give the best possible consumer experience. Most games will run well, but we’ve found a few compatibility issues on our Intel Arc products that show up in applications developed before the products existed.

We are improving our process over time and have developed a collection of techniques that don’t require game updates when the title is no longer in development, but instead can be done solely in the driver. This includes identifying specific application processes and then passing flags to our compiler for efficient memory utilization, optimizations of what type of SIMD instruction is preferred, and even wholesale shader replacements to ensure optimal hardware performance.

Shaders are small programs that games execute on the GPU and are written in a specialized programming language (for example HLSL) that Intel’s compilers convert into GPU code. The Intel Arc graphics products are a new entrant to the discrete GPU landscape, and a lot of shader programs assumed certain characteristics about the underlying GPU architecture that simply don’t match well with the Xe HPG architecture.
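For readers who want to check the Question #1 arithmetic themselves, here is a minimal Python sketch of the DP4a idea and the Ops-per-clock figures quoted above. It is only an illustration, not how the hardware or driver is implemented: the helper names (`split_bytes`, `dp4a`) are made up for this example, the 8-bit chunks are treated as unsigned for simplicity, and the lane count of 8 is taken from the vector engine description in this post.

```python
def split_bytes(word):
    """Split a 32-bit unsigned word into four 8-bit chunks, low byte first."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]


def dp4a(a, b, acc):
    """Emulate one DP4a lane: multiply four 8-bit chunk pairs and
    accumulate the products into acc (chunks assumed unsigned here)."""
    return acc + sum(x * y for x, y in zip(split_bytes(a), split_bytes(b)))


# Ops-per-clock figures as described in the post.
LANES = 8           # parallel lanes in the vector engine
CHUNKS = 4          # 8-bit chunks per 32-bit operand
STAGES = 4          # pipeline depth of the matrix engine

mac_ops = LANES * 2               # 8 multiplies + 8 adds
dp4a_ops = LANES * CHUNKS * 2     # 32 multiplies + 32 adds
matrix_ops = dp4a_ops * STAGES    # 64 Ops per stage over 4 stages

if __name__ == "__main__":
    # Chunks of a: 4, 3, 2, 1 and of b: 8, 7, 6, 5 (low byte first).
    print(dp4a(0x01020304, 0x05060708, 10))
    print(mac_ops, dp4a_ops, matrix_ops)
    print(dp4a_ops // mac_ops, matrix_ops // mac_ops)
```

The last line prints the speedup ratios relative to the plain SIMD MAC, which should match the 4x and 16x figures given in the answer to Question #1.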