At Intel Architecture Day, the company unveiled what it is calling its biggest architectural shifts in a generation.
Raja Koduri, senior vice president and general manager of the Accelerated Computing Systems and Graphics Group, discusses highlights of the announcements.
Efficient-core
A highly scalable x86 microarchitecture designed to address compute requirements across the entire spectrum of customers’ needs, from low-power mobile applications to many-core microservices.
Compared with Skylake, Intel’s most prolific CPU microarchitecture, the Efficient-core delivers 40% more single-threaded performance at the same power, or the same performance while consuming less than 40% of the power.
For throughput performance, four Efficient-cores deliver 80% more performance while still consuming less power than two Skylake cores running four threads, or the same throughput performance while consuming 80% less power.
Performance-core
This x86 core is not only the highest performing CPU core Intel has ever built, but it also delivers a step function in CPU architecture performance that will drive the next decade of compute.
It was designed as a wider, deeper and smarter architecture that exposes more instruction-level parallelism, increases execution parallelism, reduces latency and improves general-purpose performance.
It also helps support applications with large data and large code footprints. Performance-core provides a geomean improvement of about 19% across a wide range of workloads over the current 11th Gen Intel Core architecture (Cypress Cove core) at the same frequency.
Targeted for data center processors and for evolving trends in machine learning, Performance-core brings dedicated hardware, including Intel’s new Advanced Matrix Extensions (AMX), to perform matrix multiplication operations for an order-of-magnitude performance gain: a nearly 8x increase in artificial intelligence acceleration. It is architected for software ease of use, leveraging the x86 programming model.
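To illustrate how AMX surfaces through the x86 programming model, the sketch below performs a single tiled INT8 matrix multiply using the AMX intrinsics from Intel’s public documentation (immintrin.h, built with -mamx-tile -mamx-int8 on a recent GCC or Clang). The tile-configuration layout and the Linux arch_prctl permission request are documented behaviour rather than anything stated in this article, so treat this as a minimal sketch for AMX-capable hardware, not production code.

```cpp
// Minimal AMX INT8 matrix-multiply sketch. Assumes a CPU with AMX, a recent Linux
// kernel, and GCC/Clang with -mamx-tile -mamx-int8. Intrinsic names and the 64-byte
// tile-config layout follow Intel's public documentation; this is illustrative only.
#include <immintrin.h>
#include <cstdint>
#include <cstring>
#include <sys/syscall.h>
#include <unistd.h>

constexpr int ROWS  = 16;   // maximum tile rows
constexpr int COLSB = 64;   // maximum tile row width in bytes

// 64-byte tile configuration: palette 1, three tiles (C = int32 accumulators, A and B = int8).
struct alignas(64) TileConfig {
    uint8_t  palette_id = 1;
    uint8_t  start_row  = 0;
    uint8_t  reserved[14] = {};
    uint16_t colsb[16] = {};   // bytes per row for each tile
    uint8_t  rows[16]  = {};   // rows for each tile
};

int main() {
    // Ask the kernel for permission to use the AMX tile-data state (Linux-specific).
    constexpr int ARCH_REQ_XCOMP_PERM = 0x1023;
    constexpr int XFEATURE_XTILEDATA  = 18;
    if (syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_PERM, XFEATURE_XTILEDATA) != 0)
        return 1;  // AMX not available or not permitted

    TileConfig cfg;
    cfg.rows[0] = ROWS; cfg.colsb[0] = COLSB;  // tile 0: C, 16x16 int32 accumulators
    cfg.rows[1] = ROWS; cfg.colsb[1] = COLSB;  // tile 1: A, 16x64 int8
    cfg.rows[2] = ROWS; cfg.colsb[2] = COLSB;  // tile 2: B, 16x64 int8 (VNNI layout)
    _tile_loadconfig(&cfg);

    alignas(64) int8_t  A[ROWS][COLSB];
    alignas(64) int8_t  B[ROWS][COLSB];
    alignas(64) int32_t C[ROWS][COLSB / 4] = {};
    memset(A, 1, sizeof A);
    memset(B, 2, sizeof B);

    _tile_loadd(1, A, COLSB);   // load A into tile register 1
    _tile_loadd(2, B, COLSB);   // load B into tile register 2
    _tile_zero(0);              // clear the accumulator tile
    _tile_dpbssd(0, 1, 2);      // C += A * B (signed int8 dot products -> int32)
    _tile_stored(0, C, COLSB);  // write the 16x16 int32 result back to memory

    _tile_release();            // free tile state before exiting
    return (C[0][0] == 128) ? 0 : 1;  // each element is 1*2 summed over 64 byte pairs
}
```

Each tile register holds up to 16 rows of 64 bytes, and a single TDPBSSD instruction accumulates an entire 16x16 block of INT32 dot products, which gives a sense of the dense per-instruction work behind the quoted AI acceleration figures.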
Intel Thread Director
Intel’s unique approach to scheduling was developed to ensure Efficient-cores and Performance-cores work seamlessly together, dynamically and intelligently assigning workloads from the start and optimising the system for maximum real-world performance and efficiency.
With intelligence built directly into the core, Intel Thread Director works seamlessly with the operating system to place the right thread on the right core at the right time.
Alder Lake
Re-inventing the multicore architecture, Alder Lake will be Intel’s first performance hybrid architecture, integrating the new Intel Thread Director.
This is Intel’s most intelligent client system-on-chip (SoC) architecture, featuring a combination of Efficient-cores and Performance-cores, scaling from ultra-mobile to desktop, and leading the industry transition with multiple industry-leading I/O and memory technologies. Products based on Alder Lake will begin shipping this year.
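For software that wants to confirm which flavour of core a thread has landed on in such a hybrid design, the hedged sketch below queries the CPUID hybrid and core-type leaves documented in Intel’s Software Developer’s Manual (leaf 07H, EDX bit 15, and leaf 1AH, EAX bits 31:24). These leaves are public architecture facts rather than details from this article, and a real tool would normally pin the thread to a specific logical CPU before asking.

```cpp
// Minimal sketch: report whether the current logical CPU is a Performance-core or an
// Efficient-core on a hybrid (Alder Lake-style) part. GCC/Clang on x86-64.
#include <cpuid.h>
#include <cstdio>

int main() {
    unsigned eax, ebx, ecx, edx;

    // CPUID.07H:EDX[15] reports whether the processor is a hybrid part at all.
    if (!__get_cpuid_count(0x07, 0, &eax, &ebx, &ecx, &edx) || !((edx >> 15) & 1)) {
        std::puts("not a hybrid processor");
        return 0;
    }

    // CPUID.1AH:EAX[31:24] identifies the core type of the logical CPU issuing CPUID.
    if (!__get_cpuid_count(0x1A, 0, &eax, &ebx, &ecx, &edx))
        return 1;
    unsigned core_type = eax >> 24;
    if (core_type == 0x40)
        std::puts("running on a Performance-core");
    else if (core_type == 0x20)
        std::puts("running on an Efficient-core");
    else
        std::printf("unknown core type 0x%x\n", core_type);
    return 0;
}
```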
Xe HPG and Alchemist SoC
Xe HPG is a new discrete graphics microarchitecture designed to scale to enthusiast-class performance for gaming and creation workloads.
The Xe HPG microarchitecture features a new Xe-core, a compute-focused programmable and scalable element, and full support for DirectX 12 Ultimate.
New matrix engines inside the Xe-cores (referred to as Xe Matrix eXtensions, XMX) accelerate artificial intelligence workloads such as XeSS, a novel upscaling technology that enables high-performance and high-fidelity gaming. Xe HPG-based Alchemist SoCs (formerly code-named DG2) will be coming to market in the first quarter of 2022 under the new Intel Arc brand.
Sapphire Rapids
Combining Intel’s Performance-cores with new accelerator engines, Sapphire Rapids sets the standard for next-generation data center processors. At the heart of Sapphire Rapids is a tiled, modular SoC architecture that delivers significant scalability while still maintaining the benefits of a monolithic CPU interface thanks to Intel’s EMIB multi-die interconnect packaging technology and advanced mesh architecture.
Infrastructure Processing Unit
Intel announced Mount Evans, its first dedicated ASIC-based IPU, alongside Oak Springs Canyon, a new FPGA-based IPU reference platform. With an Intel IPU-based architecture, cloud service providers (CSPs) can maximise data center revenue by offloading infrastructure tasks from CPUs to IPUs; doing so frees CSPs to rent 100% of their server CPUs to customers.
Xe HPC, Ponte Vecchio
The most complex SoC Intel has ever built and a great example of our IDM 2.0 strategy come to life, Ponte Vecchio takes advantage of several advanced semiconductor processes, our revolutionary EMIB technology, and our Foveros 3D packaging.
With this product, we are bringing our moonshot project to life: a 100-billion-transistor device that delivers industry-leading FLOPS and compute density to accelerate artificial intelligence, high performance computing and advanced analytics workloads.
At Architecture Day, we showed that our early Ponte Vecchio silicon is already demonstrating leadership performance, setting an industry record in both inference and training throughput on a popular AI benchmark. Our A0 silicon is already providing greater than 45 TFLOPS of FP32 throughput, greater than 5 TB/s of memory fabric bandwidth and greater than 2 TB/s of connectivity bandwidth. Ponte Vecchio, like our other Xe architectures, will be enabled by oneAPI, our open, standards-based, cross-architecture and cross-vendor unified software stack.
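To give a flavour of what being enabled by oneAPI means in practice, here is a minimal SYCL/DPC++ sketch using standard SYCL 2020 constructs (sycl::queue, unified shared memory, parallel_for); the same source can be retargeted across CPUs and Xe GPUs by the runtime. The API shown is ordinary oneAPI usage rather than anything specific to Ponte Vecchio in this article, and it would typically be built with the oneAPI compiler, e.g. `icpx -fsycl`.

```cpp
// Minimal oneAPI/SYCL sketch: a vector addition that runs on whichever device the
// SYCL runtime selects (an Xe GPU if present, otherwise the host CPU).
#include <sycl/sycl.hpp>
#include <cstdio>

int main() {
    sycl::queue q;  // default selector: prefers a GPU, falls back to the CPU
    std::printf("running on: %s\n",
                q.get_device().get_info<sycl::info::device::name>().c_str());

    constexpr size_t N = 1 << 20;
    float *a = sycl::malloc_shared<float>(N, q);  // unified shared memory
    float *b = sycl::malloc_shared<float>(N, q);
    float *c = sycl::malloc_shared<float>(N, q);
    for (size_t i = 0; i < N; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // Device kernel: one work-item per element.
    q.parallel_for(sycl::range<1>{N}, [=](sycl::id<1> i) {
        size_t k = i;
        c[k] = a[k] + b[k];
    }).wait();

    std::printf("c[0] = %g\n", c[0]);  // expect 3
    sycl::free(a, q);
    sycl::free(b, q);
    sycl::free(c, q);
    return 0;
}
```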