Bimodal Style Transference from Musical Composition to Image Using Deep Generative Models

Apolo, MJ; Mendoza, M.

Keywords: Generative models, Creative AI, Cover art generation

Abstract

Deep generative models have attracted considerable attention for their strong performance in generating original images across many real-world domains. One application of these models is style transfer, where the style of one object is transferred to the content of another. This study proposes transferring the multimodal style of songs to album covers through a three-part pipeline. First, a multimodal latent space is trained with a triplet network that receives a dataset of cover images and songs represented as spectrograms, spanning around 18 genres. Then, the k-nearest neighbors (kNN) algorithm is run in this latent space to retrieve the cover art closest to a query song. Finally, a Spectral Normalized GAN pretrained on ImageNet is fine-tuned, training only the batch parameters to avoid overfitting, after which the original cover art is sampled. The pipeline is executed for songs of 10 different genres, obtaining covers of similar genres among the 100 closest neighbors and generating images with an average Fréchet Inception Distance of 20.89.
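
The first two stages of the pipeline (a shared latent space trained with a triplet objective, followed by nearest-neighbor retrieval) can be illustrated with a minimal sketch. It assumes a standard triplet margin loss and Euclidean distance; the abstract does not state which loss variant, margin, or distance metric the authors used. The function names, the toy 2-D embeddings, and the cover identifiers are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    Assumption: the paper's exact loss and margin are not given in the abstract.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def nearest_covers(song_embedding, cover_embeddings, k=1):
    """Return the ids of the k covers closest to the song in the shared space."""
    ranked = sorted(
        cover_embeddings,
        key=lambda cover_id: euclidean(song_embedding, cover_embeddings[cover_id]),
    )
    return ranked[:k]


# Toy 2-D embeddings (hypothetical values, not taken from the paper).
covers = {
    "rock_cover": [0.9, 0.1],
    "jazz_cover": [0.1, 0.9],
    "pop_cover": [0.6, 0.5],
}
song = [0.8, 0.2]  # embedding of a query song's spectrogram

# The song acts as anchor, a same-genre cover as positive, another genre as negative.
loss = triplet_loss(song, covers["rock_cover"], covers["jazz_cover"])

# Retrieval step: the covers nearest to the query song.
top2 = nearest_covers(song, covers, k=2)
```

In a real system the embeddings would come from trained image and audio encoders, and the retrieved cover would then guide the GAN fine-tuning stage, which this sketch does not cover.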

More information

Title according to WOS: Bimodal Style Transference from Musical Composition to Image Using Deep Generative Models
Title according to SCOPUS: Bimodal Style Transference from Musical Composition to Image Using Deep Generative Models
Journal title: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14035
Publisher: Springer Science and Business Media Deutschland GmbH
Publication date: 2023
Start page: 229
End page: 240
Language: English
DOI: 10.1007/978-3-031-34732-0_17

Notes: ISI, SCOPUS