Principled deep neural network training through linear programming
Abstract
Deep learning has received much attention lately due to the impressive empirical performance achieved by training algorithms. Consequently, a need for a better theoretical understanding of these problems has become more evident and multiple works in recent years have focused on this task. In this work, using a unified framework, we show that there exists a polyhedron that simultaneously encodes, in its facial structure, all possible deep neural network training problems that can arise from a given architecture, activation functions, loss function, and sample size. Notably, the size of the polyhedral representation depends only linearly on the sample size, and a better dependency on several other network parameters is unlikely. Using this general result, we compute the size of the polyhedral encoding for commonly used neural network architectures. Our results provide a new perspective on training problems through the lens of polyhedral theory and reveal strong structure arising from these problems. © 2023 Elsevier B.V. All rights reserved.
More information
| Title according to WOS: | Principled deep neural network training through linear programming |
| Journal Title: | DISCRETE OPTIMIZATION |
| Volume: | 49 |
| Publisher location: | AMSTERDAM |
| Publication date: | 2023 |
| Language: | English |
| URL: | https://doi.org/10.1016/j.disopt.2023.100795 |
| DOI: | 10.1016/j.disopt.2023.100795 |
| Notes: | ISI - WOS |