Agglomerative Clustering and Residual-VLAD Encoding for Human Action Recognition
Abstract
Human action recognition has attracted significant attention in recent years because of its high demand in various application domains. In this work, we propose a novel codebook generation and hybrid encoding scheme for the classification of action videos. The proposed scheme builds a discriminative codebook and a hybrid feature vector by encoding features extracted from convolutional neural networks (CNNs). We explore different CNN architectures for extracting spatio-temporal features. For codebook generation, we employ an agglomerative clustering approach that aims to combine the advantages of global and class-specific codebooks. We propose a Residual Vector of Locally Aggregated Descriptors (R-VLAD) and fuse it with locality-based coding to form a hybrid feature vector, which provides a compact representation along with higher-order statistics. We evaluated our work on two publicly available standard benchmark datasets, HMDB-51 and UCF-101. The proposed method achieves recognition accuracies of 72.6% and 96.2% on HMDB-51 and UCF-101, respectively. We conclude that the proposed scheme improves recognition accuracy for human action recognition.
More information
Title according to WOS: | ID WOS:000549558800001 Not found in local WOS DB |
Journal title: | APPLIED SCIENCES-BASEL |
Volume: | 10 |
Issue: | 12 |
Publisher: | MDPI |
Publication date: | 2020 |
DOI: | 10.3390/app10124412 |
Notes: | ISI |