A heterogeneous hardware accelerator for image classification in embedded systems
Keywords: Convolutional neural network; Field-programmable gate array; MobileNet V2; Power consumption
Abstract
Convolutional neural networks (CNNs) have been extensively employed for image classification due to their high accuracy. However, inference is a computationally intensive process that often requires hardware acceleration to operate in real time. For mobile devices, the power consumption of graphics processors (GPUs) is frequently prohibitive, and field-programmable gate arrays (FPGAs) become a solution to perform inference at high speed. Although previous works have implemented CNN inference on FPGAs, their high utilization of on-chip memory and arithmetic resources complicates their application on resource-constrained edge devices. In this paper, we present a scalable, low-power, low-resource-utilization accelerator architecture for inference on the MobileNet V2 CNN. The architecture uses a heterogeneous system with an embedded processor as the main controller, external memory to store network data, and dedicated hardware implemented on reconfigurable logic with a scalable number of processing elements (PEs). Implemented on a XCZU7EV FPGA running at 200 MHz and using four PEs, the accelerator achieves 87% top-5 accuracy and processes a 224 × 224 pixel image in 220 ms. It consumes 7.35 W of power and uses less than 30% of the logic and arithmetic resources used by other MobileNet FPGA accelerators.
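From the figures reported in the abstract (220 ms per image, 7.35 W), the implied throughput and energy per inference can be derived directly; the short sketch below performs that arithmetic (the numbers come from the abstract, the derivation is ours):

```python
# Derived accelerator metrics from the abstract's reported figures.
latency_s = 0.220   # 220 ms per 224x224 image (reported)
power_w = 7.35      # total power consumption (reported)

throughput_fps = 1.0 / latency_s        # images per second
energy_per_inference_j = power_w * latency_s  # joules per image

print(f"Throughput: {throughput_fps:.2f} fps")          # ~4.55 fps
print(f"Energy/inference: {energy_per_inference_j:.3f} J")  # ~1.617 J
```

These derived values (roughly 4.5 fps and 1.6 J per image) are useful when comparing against GPU-based edge inference, where per-inference energy is typically higher.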
More information
| Title according to SCOPUS: | A heterogeneous hardware accelerator for image classification in embedded systems |
| Journal title: | Sensors |
| Volume: | 21 |
| Issue: | 8 |
| Publisher: | MDPI AG |
| Publication date: | 2021 |
| Language: | English |
| DOI: | 10.3390/s21082637 |
| Notes: | SCOPUS |