An End-to-End Workflow to Efficiently Compress and Deploy DNN Classifiers on SoC/FPGA
Abstract
Machine learning (ML) models have demonstrated discriminative and representative learning capabilities over a wide range of applications, albeit at the cost of high computational complexity. Due to their parallel processing capabilities, reconfigurability, and low power consumption, systems on chip based on a field-programmable gate array (SoC/FPGA) have been used to address this challenge. Nevertheless, SoC/FPGA devices are resource-constrained, which implies the need for optimal use of the technology for the computation and storage operations involved in ML-based inference. Consequently, mapping a deep neural network (DNN) architecture to an SoC/FPGA requires compression strategies to obtain a hardware design with a good compromise between effectiveness, memory footprint, and inference time. This letter presents an efficient end-to-end workflow for deploying DNNs on an SoC/FPGA by integrating hyperparameter tuning through Bayesian optimization (BO) with an ensemble of compression techniques.
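As an illustration only (not the authors' implementation), the sketch below shows how Bayesian optimization can drive a compression-aware hyperparameter search of the kind the abstract describes: a Gaussian-process surrogate with an expected-improvement acquisition selects hypothetical compression hyperparameters (weight bit-width and pruning sparsity) that trade accuracy loss against resource cost. The objective function is a placeholder; a real workflow would train the compressed DNN and measure accuracy and SoC/FPGA resource usage.

```python
# Hedged sketch: BO over two assumed compression hyperparameters
# (bit-width, pruning sparsity) with a placeholder objective.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(x):
    """Placeholder cost: lower bit-width / higher sparsity hurt accuracy but
    save resources. A real workflow would evaluate the compressed DNN here."""
    bits, sparsity = x
    accuracy_loss = 0.02 * (8 - bits) ** 2 + 0.5 * sparsity ** 2
    resource_cost = 0.1 * bits * (1.0 - sparsity)
    return accuracy_loss + resource_cost

# Candidate design space: bit-width in [2, 8], sparsity in [0, 0.9].
rng = np.random.default_rng(0)
candidates = np.stack([rng.uniform(2, 8, 500),
                       rng.uniform(0.0, 0.9, 500)], axis=1)

# A few initial random evaluations to seed the surrogate.
X = candidates[rng.choice(len(candidates), 5, replace=False)]
y = np.array([objective(x) for x in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
for _ in range(20):
    gp.fit(X, y)
    mu, sigma = gp.predict(candidates, return_std=True)
    # Expected improvement (minimization form) over the candidate pool.
    imp = y.min() - mu
    z = imp / np.maximum(sigma, 1e-9)
    ei = imp * norm.cdf(z) + sigma * norm.pdf(z)
    x_next = candidates[np.argmax(ei)]
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next))

bits, sparsity = X[np.argmin(y)]
print(f"Selected bit-width ~ {bits:.1f}, sparsity ~ {sparsity:.2f}")
```

In a deployment flow, the selected hyperparameters would then be passed to the compression and hardware-generation steps targeting the SoC/FPGA.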
More information
Title according to WOS: | ID WOS:001302493400014 |
Journal Title: | IEEE EMBEDDED SYSTEMS LETTERS |
Volume: | 16 |
Issue: | 3 |
Publisher: | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC |
Publication date: | 2024 |
Start page: | 255 |
End page: | 258 |
DOI: | 10.1109/LES.2023.3343030 |
Notes: | ISI |