
GPUs and CPUs? Understanding How to Choose the Right Processor for Your Workload

Post Date: 2021-07-16, NVE Corp/Isolation Products

Enterprises developing systems to run intense AI workloads have two choices for the heavy lifting: a traditional CPU (central processing unit) architecture or a specialized GPU (graphics processing unit) pipeline. It is widely understood that, in the modern enterprise context, running numerous deep learning workloads in parallel is the bread and butter of GPUs. However, it is important to understand the full range of AI/deep learning applications and how to choose the right type of processor for a given workload.

CPU and GPU: How They Work

CPUs are designed to prioritize speed on sequential calculations, and they can execute a diverse set of instructions. When assigned the right task, a CPU performs very quickly, with its speed commonly characterized by clock speed. Today, the CPU is still the core of any computing device: it handles basic instructions and allocates more complicated tasks to other, specialized chips on the motherboard. Note that a GPU is not a substitute for a CPU.

A GPU is designed to quickly render high-resolution 2D/3D images, video, visualizations and displays. GPUs began as graphics pipelines, but they have evolved into one of the core components of deep learning/AI. GPUs are designed to efficiently handle bulk operations performed in parallel. They are engineered with far more computing cores than CPUs, an architecture that is ideal for repetitive, highly parallel, specialized computing tasks.

The main difference between CPU and GPU architecture is that a CPU is designed to quickly handle a wide range of complex computations, with performance measured by CPU clock speed, whereas a GPU is designed to quickly handle many concurrent, simple and specific tasks. CPUs have fewer cores that run at high speeds. GPUs have thousands of cores that each run at comparatively slower speeds. Because GPUs have more cores and can perform parallel operations on multiple sets of data, they more than make up the difference on non-graphical tasks that need this kind of throughput, such as machine learning and scientific computation.
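To make the trade-off concrete, here is a rough back-of-the-envelope model in Python. The core counts, clock speeds and cycles-per-task value are illustrative assumptions, not measurements of any real chip. The model also ignores memory bandwidth, scheduling overhead and the fact that a CPU core typically does more work per cycle than a GPU core.

```python
import math


def batch_time_seconds(num_tasks, cores, clock_hz, cycles_per_task=1000):
    """Estimate the time to finish independent, identical tasks.

    Tasks run in "waves" of up to `cores` tasks at a time, and each task
    is assumed to cost `cycles_per_task` clock cycles (hypothetical value).
    """
    waves = math.ceil(num_tasks / cores)
    return waves * cycles_per_task / clock_hz


# Hypothetical processors: few fast cores vs. many slower cores.
cpu = {"cores": 16, "clock_hz": 3.5e9}
gpu = {"cores": 4096, "clock_hz": 1.4e9}

for num_tasks in (1, 1_000_000):
    t_cpu = batch_time_seconds(num_tasks, **cpu)
    t_gpu = batch_time_seconds(num_tasks, **gpu)
    faster = "CPU" if t_cpu < t_gpu else "GPU"
    print(f"{num_tasks:>9} tasks: CPU {t_cpu:.2e} s, GPU {t_gpu:.2e} s -> {faster} finishes first")
```

Under these assumptions the CPU finishes a single task first because of its higher clock speed, while the GPU finishes a million independent tasks far sooner because it works on thousands of them at once.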

When Are GPUs Used?

GPUs are ideal for parallel processing and have become the preferred choice for training AI models: they match the needs of a process in which largely identical operations are performed simultaneously on all data samples. Data set sizes are growing almost exponentially, and the massive parallelism provided by GPUs results in faster performance of these tasks.
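The property that makes such work a good fit for parallel hardware is that each sample can be processed independently of the others. The sketch below illustrates only that independence, in plain Python: the `scale_and_shift` function is a hypothetical stand-in for one step of a model, and the thread pool is a stand-in for parallel hardware (Python threads will not deliver a real speedup on this kind of work).

```python
from concurrent.futures import ThreadPoolExecutor


def scale_and_shift(sample, weight=2.0, bias=1.0):
    """The single operation applied to every sample (hypothetical example)."""
    return weight * sample + bias


def process_sequential(samples):
    """Apply the operation to each sample, one after another."""
    return [scale_and_shift(x) for x in samples]


def process_in_chunks(samples, workers=4):
    """Split the samples into chunks and process each chunk independently."""
    chunk_size = max(1, -(-len(samples) // workers))  # ceiling division
    chunks = [samples[i:i + chunk_size] for i in range(0, len(samples), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(process_sequential, chunks)  # map preserves chunk order
        return [y for chunk in results for y in chunk]


samples = [float(i) for i in range(10)]
# Because no sample depends on another, both strategies give the same answer.
print(process_in_chunks(samples) == process_sequential(samples))
```

Because the result does not depend on how the samples are divided up, the same work can be spread across as many cores as the hardware provides.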

GPUs are designed to excel in applications that require processing numerous calculations in parallel, which describes the overwhelming share of enterprise AI applications:

Accelerated deep learning and AI operations with massive parallel data inputs
Traditional AI training and inference algorithms
Classical neural networks

In short, when raw computational power for processing unstructured or largely identical data is required, GPUs are the preferred solution.

The A100 is built on NVIDIA's first 7nm GPU, the GA100, which is equipped with 6912 CUDA cores and 40GB of HBM2 memory. It is also the first card to offer a PCIe 4.0 interface in addition to the SXM4 form factor, and it is rated for the same peak performance. Because the PCIe version has a lower power limit (250W vs. 400W for its SXM4 counterpart), expect about a 10% hit to sustained performance. However, that is quickly earned back through lower power and cooling expenses. The GA100 graphics processor is a large chip with a die area of 826 mm² and 54.2 billion transistors. Alongside its 6912 CUDA cores (shading units), it has 432 texture mapping units and 160 ROPs. Also included are 432 tensor cores, which help improve the speed of machine learning applications. The 40GB of memory on the A100 SXM4 is connected through a 5120-bit memory interface. The GPU operates at a frequency of 1410 MHz, and memory runs at 1215 MHz. The A100 is optimized for tensor operations, including the new TF32 format and higher-precision FP64, as well as lower-precision 8-bit computations for inference.
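The figures quoted above can be combined into rough peak numbers. The short Python calculation below is a sketch under two assumptions that the text does not state: each CUDA core completes one fused multiply-add (counted as 2 floating-point operations) per clock cycle, and the memory transfers data twice per memory clock cycle (double data rate). It also compares performance per watt for the two form factors, taking the 10% performance hit and the power figures from the text at face value.

```python
# Figures quoted in the text.
cuda_cores = 6912
gpu_clock_hz = 1410e6
memory_bus_bits = 5120
memory_clock_hz = 1215e6

# Assumptions (not stated in the text).
flops_per_core_per_cycle = 2  # one fused multiply-add counted as 2 FLOPs
transfers_per_clock = 2       # double data rate memory

# Peak single-precision throughput from the CUDA cores alone.
peak_fp32_tflops = cuda_cores * flops_per_core_per_cycle * gpu_clock_hz / 1e12

# Peak memory bandwidth: bytes per transfer times transfers per second.
memory_bandwidth_gbs = (memory_bus_bits / 8) * memory_clock_hz * transfers_per_clock / 1e9

# Relative performance per watt, SXM4 vs. PCIe.
sxm4 = {"relative_perf": 1.00, "watts": 400}
pcie = {"relative_perf": 0.90, "watts": 250}


def perf_per_watt(card):
    return card["relative_perf"] / card["watts"]


efficiency_gain = perf_per_watt(pcie) / perf_per_watt(sxm4)

print(f"Peak FP32 throughput: {peak_fp32_tflops:.2f} TFLOPS")
print(f"Peak memory bandwidth: {memory_bandwidth_gbs:.1f} GB/s")
print(f"PCIe performance per watt relative to SXM4: {efficiency_gain:.2f}x")
```

Under these assumptions the calculation gives roughly 19.5 TFLOPS of peak FP32 throughput and about 1,555 GB/s of memory bandwidth, and the PCIe card delivers about 1.44 times the performance per watt of the SXM4 version.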

The A100 delivers roughly 2.5X the peak double-precision floating-point performance of its predecessor, the 12nm Volta GPU. In HPC workloads, realized speedups over that predecessor ranged between 1.5X and 2.1X.
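One simple way to think about the gap between the peak figure and the realized speedups is Amdahl's law: if only part of a workload benefits from the faster hardware, the overall speedup is smaller than the peak. The text does not say why the realized speedups are lower, so the Python sketch below is only an illustrative model. It takes the peak advantage as 2.5X and asks what fraction of a workload would need to benefit from it to reach the quoted speedups.

```python
def overall_speedup(accelerated_fraction, component_speedup):
    """Amdahl's law: overall speedup when only part of the work gets faster."""
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / component_speedup)


def fraction_needed(target_speedup, component_speedup):
    """Invert Amdahl's law to find the accelerated fraction for a target speedup."""
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / component_speedup)


peak_advantage = 2.5  # assumed reading of the peak double-precision figure

for realized in (1.5, 2.1):
    fraction = fraction_needed(realized, peak_advantage)
    check = overall_speedup(fraction, peak_advantage)
    print(f"{realized}X overall needs about {fraction:.0%} of the work accelerated (check: {check:.2f}X)")
```

In this model, a 1.5X overall speedup corresponds to a bit more than half of the work running 2.5X faster, and 2.1X corresponds to close to 90%. Real HPC workloads are also limited by memory bandwidth, communication and other factors that this model leaves out.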

GPUs have evolved significantly since their beginnings 30 years ago, when they were used primarily in personal computers. As performance and density increased, they moved into professional workstations, then into servers, and now into data center rack pods. As more applications run in the cloud and in data centers, expect GPUs to be an essential element of architectures and systems. This growing capability is demonstrated by the NVIDIA A100 GPU, which can be sliced into 7 separate instances (multi-instance GPU) to improve utilization of GPU processing and the provisioning of various workloads in the

