Enhancing the value of future Intel hardware with new Intel oneAPI 2023 tools
Intel today announced the 2023 release of the Intel oneAPI tools, which are available in the Intel® Developer Cloud and are rolling out through established distribution channels. The new oneAPI 2023 tools support the upcoming 4th Gen Intel® Xeon® Scalable processors, the Intel® Xeon® CPU Max Series, and Intel® Data Center GPUs, including the Flex Series and the new Max Series. The tools deliver performance and productivity improvements and add support for new Codeplay plug-ins that make it easier than ever for developers to write SYCL code for non-Intel GPU architectures. These standards-based tools make it simple to build high-performance applications that run on multiarchitecture systems, giving developers a choice of hardware.
“On our development systems with Intel Max Series GPU accelerators, we are seeing promising early application performance results for programs built with Intel’s oneAPI compilers and libraries. For leadership-class computational science, we value the code portability that comes from multivendor, multiarchitecture programming standards such as SYCL, and from Python AI frameworks such as PyTorch accelerated by Intel libraries. We anticipate the first exascale scientific breakthroughs from these technologies on the Aurora system next year.”
What the oneAPI 2023 Tools Deliver
Intel’s 2023 developer tools include a comprehensive set of the latest compilers and libraries, analysis and porting tools, and optimized artificial intelligence (AI) and machine learning frameworks for building high-performance, multiarchitecture applications for CPUs, GPUs, and FPGAs, powered by oneAPI. With a single codebase and these tools, developers can meet performance targets sooner and save time, freeing more time for innovation.
The new oneAPI tools release lets developers take advantage of the latest features of Intel hardware:
- Intel’s 4th Gen Xeon Scalable and Xeon CPU Max Series processors, featuring bfloat16, Intel® AVX-512, Intel® QuickAssist Technology (Intel® QAT), and Intel® Advanced Matrix Extensions (Intel® AMX).
- Intel® Data Center GPUs, including the Flex Series with a hardware-based AV1 encoder and the Max Series with data-type flexibility, Intel® Xe Matrix Extensions (Intel® XMX), vector engines, Intel® Xe Link, and other capabilities.
Performance benchmarks
- In MLPerf™ DeepCAM deep learning inference and training, Intel® AMX, enabled by the Intel® oneAPI Deep Neural Network Library (oneDNN), delivered a 3.6x performance gain, compared with 2.4x for Nvidia, with AMD as the baseline.
- LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator) workloads running on a Xeon Max CPU, with kernels offloaded to six Max Series GPUs and optimized with oneAPI tools, achieved performance gains of up to 16x over 3rd Gen Intel Xeon or AMD Milan alone.
Advanced software performance
- The Intel® Fortran Compiler’s expanded OpenMP GPU offload capability and full Fortran language standard support through Fortran 2018 speed up the development of standards-based applications.
- The Intel® oneAPI Math Kernel Library’s (oneMKL) expanded OpenMP offload capabilities enhance portability.
- The Intel® oneAPI Deep Neural Network Library (oneDNN) enables the advanced deep learning features of 4th Gen Intel Xeon and Max Series CPUs, including Intel AMX, Intel AVX-512, VNNI, and bfloat16.
Richer SYCL support, robust code migration and analysis tools, and productivity improvements make it easier to write code for multiarchitecture systems:
- The Intel® oneAPI DPC++/C++ Compiler now supports new Codeplay Software plug-ins for Nvidia and AMD GPUs, making it easier to write SYCL code and improving code portability across these processor architectures. This provides a unified build environment with integrated tools for cross-platform productivity. As part of this solution, Intel and Codeplay will offer commercial priority support, starting with the oneAPI plug-in for Nvidia GPUs.
- The Intel® DPC++ Compatibility Tool, based on the open source SYCLomatic, now simplifies CUDA-to-SYCL code migration with support for more than 100 additional CUDA APIs.
- The Intel® VTune™ Profiler lets users identify MPI imbalance issues at scale.
- For the Intel Data Center GPU Max Series, Intel® Advisor now provides automated roofline analysis to help identify and prioritize memory, cache, or compute bottlenecks and their causes, and offers actionable insights for reducing the cost of CPU-to-GPU offloading through data-transfer reuse.
What’s Important:
More efficient multiarchitecture programming is needed to handle the growing scope and scale of real-world workloads: 48% of developers target heterogeneous systems that use more than one kind of processor. Intel’s standards-based multiarchitecture tools, built on oneAPI’s open, unified programming model, offer freedom of choice in hardware along with performance, productivity, and code portability for CPUs and accelerators. By contrast, CUDA-specific code is not portable to other hardware, creating a siloed development process that locks organizations into a closed ecosystem.
Adoption of the oneAPI Ecosystem:
Ecosystem adoption of oneAPI continues to grow, with new Centers of Excellence being formed. One is the University of Cambridge’s Open Zettascale Lab, which focuses on porting key exascale-candidate codes, such as CASTEP, FEniCS, and AREPO, to oneAPI. Through the center’s courses and workshops, experts teach oneAPI techniques and tools for compiling and porting code and for performance optimization. There are now 30 oneAPI Centers of Excellence.