An End-to-End Workflow to Efficiently Compress and Deploy DNN Classifiers on SoC/FPGA

Molina, Romina Soledad; Morales, Ivan Rene; Crespo, Maria Liz; Costa, Veronica Gil; Carrato, Sergio; Ramponi, Giovanni

Abstract

Machine learning (ML) models have demonstrated discriminative and representative learning capabilities over a wide range of applications, albeit at the cost of high computational complexity. Due to their parallel processing capabilities, reconfigurability, and low power consumption, systems on chip based on a field-programmable gate array (SoC/FPGA) have been used to address this challenge. Nevertheless, SoC/FPGA devices are resource-constrained, so the computation and storage operations involved in ML-based inference must use the available resources optimally. Consequently, mapping a deep neural network (DNN) architecture to an SoC/FPGA requires compression strategies to obtain a hardware design with a good compromise between effectiveness, memory footprint, and inference time. This letter presents an efficient end-to-end workflow for deploying DNNs on an SoC/FPGA by integrating hyperparameter tuning through Bayesian optimization (BO) with an ensemble of compression techniques.
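The abstract outlines the core idea: let a Bayesian optimizer search jointly over network hyperparameters and compression settings, scoring each candidate by a compromise between accuracy and hardware cost. The sketch below illustrates that pattern only; it is not the authors' implementation. It uses Optuna's TPE sampler as a stand-in BO engine, a tiny Keras MLP on synthetic data, post-training magnitude pruning, and a nonzero-weights-times-bit-width proxy for memory footprint, all of which are illustrative assumptions.

    # Hypothetical sketch: joint BO search over hyperparameters and
    # compression knobs (pruning sparsity, weight bit-width). Optuna's
    # TPE sampler stands in for the letter's BO engine; the model, data,
    # and objective weighting are illustrative assumptions.
    import numpy as np
    import optuna
    import tensorflow as tf
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, n_features=32, n_classes=4,
                               n_informative=16, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    def objective(trial):
        units = trial.suggest_int("units", 8, 128, log=True)      # layer width
        sparsity = trial.suggest_float("sparsity", 0.0, 0.9)      # pruning ratio
        bits = trial.suggest_int("bits", 2, 16)                   # weight precision

        model = tf.keras.Sequential([
            tf.keras.Input(shape=(32,)),
            tf.keras.layers.Dense(units, activation="relu"),
            tf.keras.layers.Dense(4, activation="softmax"),
        ])
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        model.fit(X_tr, y_tr, epochs=5, batch_size=64, verbose=0)

        # Post-training magnitude pruning: zero the smallest |w| fraction.
        for layer in model.layers:
            weights = layer.get_weights()
            if len(weights) != 2:
                continue
            w, b = weights
            thr = np.quantile(np.abs(w), sparsity)
            layer.set_weights([np.where(np.abs(w) < thr, 0.0, w).astype(w.dtype), b])

        acc = model.evaluate(X_te, y_te, verbose=0)[1]
        # Footprint proxy: surviving weights times assumed bit-width.
        nonzero = sum(int(np.count_nonzero(l.get_weights()[0]))
                      for l in model.layers if l.get_weights())
        footprint = nonzero * bits
        # Scalarized accuracy/footprint trade-off; the 1e-6 weight is arbitrary.
        return acc - 1e-6 * footprint

    study = optuna.create_study(direction="maximize",
                                sampler=optuna.samplers.TPESampler(seed=0))
    study.optimize(objective, n_trials=30)
    print(study.best_params)

In a real SoC/FPGA flow the pruning and quantization would be applied with hardware-aware training tools and the footprint term replaced by post-synthesis resource estimates; the scalarized objective merely mirrors the effectiveness/memory/inference-time compromise the abstract mentions.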

More information

WOS ID: WOS:001302493400014
Journal: IEEE EMBEDDED SYSTEMS LETTERS
Volume: 16
Issue: 3
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Publication date: 2024
Start page: 255
End page: 258
DOI: 10.1109/LES.2023.3343030

Notes: ISI