FPGA vs. GPU vs. CPU – hardware options for AI applications
Field programmable gate arrays (FPGAs) deliver many advantages to artificial intelligence (AI) applications. How do graphics processing units (GPUs) and traditional central processing units (CPUs) compare?
The term artificial intelligence (AI) refers to non-human machine intelligence capable of making decisions in a manner similar to human beings. This includes the faculties for judgement, contemplation, adaptability and intention.
Research firm Statista forecasts the global market for AI to reach $126 billion USD by 2025. By 2030, AI will make up an estimated 26.1% of China’s GDP, 14.5% of North America’s GDP and 13.6% of the United Arab Emirates’ GDP.
The overall AI market includes a wide array of applications, including natural language processing (NLP), robotic process automation, machine learning, and machine vision. AI is quickly gaining adoption in many industry verticals and is creating the next great technological shift, much like the advent of the personal computer and smartphone.
The beginnings of artificial intelligence (AI) and its terminology can be credited to the Logic Theorist program created by researchers Allen Newell, Cliff Shaw and Herbert Simon in 1956. The Logic Theorist program was designed to emulate the problem-solving skills of human beings and was funded by the Research and Development (RAND) Corporation. Logic Theorist is considered to be the first AI program and was presented at the Dartmouth Summer Research Project on Artificial Intelligence (DSRPAI), Dartmouth College, New Hampshire, in 1956.
While AI relies primarily on programming algorithms that emulate human thinking, hardware is an equally important part of the equation. The three main hardware solutions for AI operations are field programmable gate arrays (FPGAs), graphics processing units (GPUs) and central processing units (CPUs).
Each option delivers its own strengths and has some limitations that we’ll explore further.
Field programmable gate arrays (FPGAs) are a type of integrated circuit with a programmable hardware fabric. They differ from graphics processing units (GPUs) and central processing units (CPUs) in that the functional circuitry inside an FPGA is not hard etched. This enables an FPGA to be programmed and updated as needed. It also gives designers the ability to build a neural network from scratch and structure the FPGA to best meet their needs.
The reprogrammable, reconfigurable architecture of FPGAs delivers key benefits in the ever-changing AI landscape, allowing designers to quickly test new and updated algorithms. This delivers strong competitive advantages in time to market and in cost savings, because new hardware does not have to be developed and released.
FPGAs deliver a combination of speed, programmability and flexibility that translates into performance efficiencies by reducing the cost and complexities inherent in the development of application-specific integrated circuits (ASICs).
Key advantages FPGAs deliver include:
- Excellent performance with low latency: FPGAs provide low latency as well as deterministic latency (DL). A deterministic system produces the same output, in the same amount of time, from a given starting condition. DL therefore provides a known response time, which is critical for many applications with hard deadlines. This enables faster and more predictable execution of real-time applications like speech recognition, video streaming and motion recognition.
- Cost effectiveness: FPGAs can be reprogrammed after manufacturing for different data types and capabilities, delivering real value over having to replace the application with new hardware. By integrating additional capabilities — like an image processing pipeline — onto the same chip, designers can reduce costs and save board space by using the FPGA for more than just AI. The long product lifecycle of FPGAs can deliver increased utility for an application that can be measured in years or even decades. This characteristic makes them ideal for use in industrial, aerospace, defense, medical and transportation markets.
- Energy efficiency: FPGAs give designers the ability to fine-tune the hardware to match application needs. Techniques like INT8 quantization are a proven way to optimize models built in machine learning frameworks like TensorFlow and PyTorch, and they are supported by hardware toolchains like NVIDIA® TensorRT and Xilinx® DNNDK. INT8 uses 8-bit integers instead of floating-point numbers and integer math instead of floating-point math. Because an 8-bit integer takes one quarter of the space of a 32-bit floating-point value, proper use of INT8 can shrink memory and bandwidth usage by as much as 75% while also reducing computing requirements. This can prove critical in meeting power efficiency requirements in demanding applications.
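The INT8 point above can be made concrete with a minimal sketch. It assumes simple symmetric, per-tensor quantization and uses made-up weight values. It is not how TensorRT, DNNDK or any particular framework implements quantization; the function names are hypothetical and only the Python standard library is used.

```python
from array import array


def quantize_int8(values):
    """Symmetric quantization (an assumed scheme): map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = array("b", (max(-127, min(127, round(v / scale))) for v in values))
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floating-point values from the 8-bit integers."""
    return [q * scale for q in quantized]


# Illustrative weights only, stored as 32-bit floats (4 bytes each).
weights = array("f", [0.82, -1.27, 0.05, 0.4, -0.63, 1.1, -0.01, 0.97])
q_weights, scale = quantize_int8(weights)
restored = dequantize(q_weights, scale)

fp32_bytes = len(weights) * weights.itemsize
int8_bytes = len(q_weights) * q_weights.itemsize
saving = 1 - int8_bytes / fp32_bytes
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(f"FP32: {fp32_bytes} bytes, INT8: {int8_bytes} bytes, saving: {saving:.0%}")
print(f"largest rounding error: {max_error:.5f} (scale {scale:.5f})")
```

The storage saving comes directly from the narrower data type, while the rounding error is the price paid for it. In practice, whether that error is acceptable depends on the model and has to be validated against accuracy targets.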
FPGAs can host multiple functions in parallel and can even assign parts of the chip to specific functions, which greatly enhances operational and energy efficiency. The architecture of FPGAs places small amounts of distributed memory in the fabric, bringing it closer to the processing. This reduces latency and, more importantly, can reduce power consumption compared to a GPU design.
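To illustrate why a known response time matters for hard deadlines, here is a small sketch that compares two sets of latency measurements. All numbers are hypothetical and chosen only for illustration; they are not measurements of any FPGA, GPU or CPU.

```python
from statistics import mean


def meets_hard_deadline(latencies_us, deadline_us):
    """A hard deadline is met only if the worst-case latency fits inside it."""
    return max(latencies_us) <= deadline_us


def jitter_us(latencies_us):
    """Spread between the slowest and the fastest observed response."""
    return max(latencies_us) - min(latencies_us)


# Hypothetical response times in microseconds.
deterministic = [120, 120, 120, 120, 120]   # same latency on every run
variable = [60, 75, 58, 310, 64]            # faster on average, with one outlier
deadline_us = 150

for name, samples in (("deterministic", deterministic), ("variable", variable)):
    print(
        f"{name}: mean {mean(samples):.1f} us, jitter {jitter_us(samples)} us, "
        f"meets deadline: {meets_hard_deadline(samples, deadline_us)}"
    )
```

In this sketch the variable system has the lower average latency but still fails the deadline, because a hard real-time requirement is judged on the worst case rather than the mean.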
Graphics processing units (GPUs) were originally developed for generating computer graphics, virtual reality training environments and video, all of which rely on advanced computations and floating-point capabilities for drawing geometric objects, lighting and color depth. For artificial intelligence to be successful, it needs a lot of data to analyze and learn from. This requires substantial computing power to execute the AI algorithms and move large amounts of data. GPUs can perform these operations because they are specifically designed to quickly process the large amounts of data used in rendering video and graphics. Their strong computational abilities have helped to make them popular in machine learning and artificial intelligence applications.
GPUs are well suited to parallel processing, which is the computation of very large numbers of arithmetic operations at the same time. This delivers respectable acceleration in applications with repetitive workloads that are performed in rapid succession. GPUs can also be priced below competing solutions, with the average graphics card having a five-year lifecycle.
AI on GPUs does have its limitations. GPUs don’t generally deliver as much performance as ASIC designs where the microchip is specifically designed for an AI application. GPUs deliver a lot of computational power at the expense of energy efficiency and heat. Heat can create durability issues for the application, impair performance, and limit types of operational environments. The ability to update AI algorithms and add new capabilities is also not comparable to FPGA processors.
The central processing unit (CPU) is the standard processor used in many devices. Compared to FPGAs and GPUs, the architecture of CPUs has a limited number of cores optimized for sequential (serial) processing. Arm® processors can be an exception to this because of their robust implementation of Single Instruction Multiple Data (SIMD) architecture, which allows for simultaneous operation on multiple data points, but their performance is still not comparable to GPUs or FPGAs.
The limited number of cores diminishes the ability of a CPU to process, in parallel, the large amounts of data needed to properly run an AI algorithm. The architecture of FPGAs and GPUs is designed with the intensive parallel processing capabilities required for handling multiple tasks quickly and simultaneously. FPGAs and GPUs can execute an AI algorithm much more quickly than a CPU. This means that an AI application or neural network will learn and react several times faster on an FPGA or GPU than on a CPU.
CPUs do offer some initial pricing advantages. When training small neural networks with a limited dataset, a CPU can be used, but the trade-off will be time: the CPU-based system will run much more slowly than an FPGA- or GPU-based system. Another benefit of a CPU-based application is power consumption. Compared to a GPU configuration, the CPU will deliver better energy efficiency.
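The effect of a limited number of cores can be sketched with simple arithmetic. The example below counts how many sequential steps are needed to finish a batch of independent operations for a given number of parallel lanes. The lane counts are hypothetical round numbers, not specifications of any real device, and the model deliberately ignores clock speed, memory bandwidth and scheduling overhead.

```python
import math


def sequential_steps(num_operations, parallel_lanes):
    """Steps needed when each step completes up to `parallel_lanes` independent operations."""
    return math.ceil(num_operations / parallel_lanes)


# One million independent multiply operations, e.g. one layer of a neural network.
operations = 1_000_000

# Hypothetical lane counts chosen only to show the trend.
few_lanes = 8        # a processor with a handful of cores
many_lanes = 4096    # a massively parallel device

steps_few = sequential_steps(operations, few_lanes)
steps_many = sequential_steps(operations, many_lanes)

print(f"{few_lanes} lanes: {steps_few} steps")
print(f"{many_lanes} lanes: {steps_many} steps")
print(f"ratio: about {steps_few / steps_many:.0f}x fewer steps")
```

Real speedups are smaller than this idealized ratio because not every part of a workload can run in parallel, but the sketch shows why wide parallel hardware suits the repetitive arithmetic of neural networks.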
Tiny machine learning (TinyML)
Seen as the next evolutionary phase of AI development, TinyML is experiencing strong growth. AI applications running on FPGAs, GPUs and CPUs are very powerful, but they cannot be used in every context, such as cellphones, drones and wearable devices.
With the widespread adoption of connected devices, the need exists for local data analysis that reduces dependency on the cloud for complete functionality. TinyML enables low-latency, low power and low bandwidth inference models on edge devices operating on microcontrollers.
The average consumer CPU draws between 65 and 85 watts of power, while the average GPU consumes anywhere between 200 and 500 watts. In comparison, a typical microcontroller draws power on the order of milliwatts or microwatts, which is at least a thousand times less. This energy efficiency enables TinyML devices to run on battery power for weeks, months and even years while running ML applications at the edge.
TinyML, with its support for frameworks that include TensorFlow Lite, uTensor and Arm's CMSIS-NN, brings together AI and small connected devices.
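The battery-life claim follows from simple arithmetic, sketched below. The battery capacity and voltage are assumptions (a small 220 mAh, 3 V coin-cell-class battery), and the calculation ignores self-discharge, conversion losses and duty cycling, so the results are rough orders of magnitude only.

```python
def battery_life_hours(capacity_mah, voltage_v, power_draw_w):
    """Idealized runtime: stored energy (Wh) divided by a constant power draw (W)."""
    energy_wh = capacity_mah / 1000 * voltage_v
    return energy_wh / power_draw_w


# Assumed battery, for illustration only.
CAPACITY_MAH = 220
VOLTAGE_V = 3.0

# Power figures taken from the ranges quoted in the text.
loads_w = {
    "CPU at 65 W": 65.0,
    "GPU at 200 W": 200.0,
    "microcontroller at 1 mW": 0.001,
    "microcontroller at 100 uW": 0.0001,
}

for name, watts in loads_w.items():
    hours = battery_life_hours(CAPACITY_MAH, VOLTAGE_V, watts)
    print(f"{name}: {hours:.4f} hours ({hours / 24:.2f} days)")
```

Under these assumptions the same battery that would power a CPU for well under a minute could keep a milliwatt-class microcontroller running for several weeks, and a microwatt-class one for most of a year.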
Benefits of TinyML include:
- Energy efficiency: Microcontrollers consume very little power, which delivers benefits in remote installations and mobile devices.
- Low latency: By processing data locally at the edge, data doesn't need to be transmitted to the cloud for inference. This greatly reduces device latency.
- Privacy: Data can be stored locally, not on cloud servers.
- Reduced bandwidth: With decreased dependency on the cloud for inference, bandwidth concerns are minimized.
The future of TinyML using MCUs is promising for small edge devices and modest applications where an FPGA, GPU or CPU is not a viable option.
The three main hardware choices for AI are FPGAs, GPUs and CPUs. In AI applications where speed and reaction times are critical, FPGAs and GPUs deliver benefits in learning and reaction time. While GPUs can process the large amounts of data needed for AI and neural networks, their drawbacks include lower energy efficiency, heat, reduced durability and a limited ability to update the application with new capabilities and AI algorithms. FPGAs deliver key advantages in AI applications and neural networks. These include energy efficiency, utility, durability and the ability to easily update the AI algorithm.
Significant progress has also been made in development software for FPGAs that makes them easier to program and compile. Exploring your hardware options is critical for the success of your AI application. Study your options carefully before making a final decision.
Choosing the right technology partners for your next innovation optimizes efficiency, mitigates potential risks and maximizes profit potential. To achieve your goals, Avnet can connect you with our trusted global technology partners in artificial intelligence. This enables you to focus valuable resources on intellectual property innovation and other areas that deliver a strong competitive edge. Together, we provide the support needed to successfully differentiate your product offerings, accelerate your time to market and improve business outcomes.
With a century of innovation at our foundation, Avnet can guide you through the challenges of developing and delivering new technology at any — or every — stage of your journey. We have the expertise to support your innovation, turn your challenges into opportunities and build the right solutions for your success. Make your vision a reality and reach further with Avnet as your trusted global technology partner.