Temporal-Aware Transformer Approach for Violence Activity Recognition
Keywords: training, surveillance, lighting, feature extraction, transformers, computational modeling, deep learning, real-time systems, computer architecture, streaming media, bidirectional long short-term memory, MobileNetV2, self-attention, violence detection
Abstract
The need for effective violence detection in public spaces has intensified with increasing antisocial behavior and violence. Traditional surveillance systems, which rely on human operators, face delays and resource challenges. Leveraging advances in artificial intelligence (AI) and computer vision, this research presents a scalable deep learning architecture for real-time violence detection using two approaches. In the first approach, Convolutional Neural Networks (CNN) and bidirectional long short-term memory (BiLSTM) networks are combined: MobileNetV2 is used for spatial feature extraction and BiLSTM for temporal pattern recognition, achieving an accuracy of 95.6%. The second approach incorporates a spatial-temporal transformer (TransformerSeq) in place of BiLSTM, improving performance to 97.2% by capturing spatiotemporal relationships in video data more effectively through self-attention for temporal feature learning. The lightweight SOTA MobileNetV2, along with the proposed MobileTransformerSeq, enables effective differentiation between violent and non-violent activities, demonstrating the potential to enhance public safety in diverse settings.
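The abstract's second approach replaces BiLSTM with self-attention for temporal feature learning over per-frame MobileNetV2 embeddings. The sketch below illustrates the core idea with a single head of scaled dot-product self-attention applied across the temporal axis; the paper's actual TransformerSeq configuration (number of heads, layers, hidden dimensions) is not given in this abstract, so all shapes and weight initializations here are illustrative assumptions.

```python
import numpy as np

def temporal_self_attention(frames, d_k=None, seed=0):
    """Single-head scaled dot-product self-attention over the temporal
    axis of per-frame features with shape (T, D).

    Illustrative sketch only: in a trained TransformerSeq the projection
    matrices Wq, Wk, Wv are learned; here they are random stand-ins."""
    T, D = frames.shape
    d_k = d_k or D
    rng = np.random.default_rng(seed)
    # Hypothetical projection weights (learned in a real model).
    Wq = rng.standard_normal((D, d_k)) / np.sqrt(D)
    Wk = rng.standard_normal((D, d_k)) / np.sqrt(D)
    Wv = rng.standard_normal((D, d_k)) / np.sqrt(D)
    Q, K, V = frames @ Wq, frames @ Wk, frames @ Wv
    # (T, T) matrix of frame-to-frame affinities, scaled for stability.
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over the time axis: each frame attends to all frames.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # (T, d_k) temporally mixed features

# Example: 16 frames of 1280-dim embeddings (MobileNetV2's final feature
# size), simulated with random values in place of a real backbone.
frames = np.random.default_rng(1).standard_normal((16, 1280))
out = temporal_self_attention(frames)
```

Unlike a BiLSTM, which propagates temporal context sequentially in two passes, self-attention lets every frame attend to every other frame in one step, which is one way to read the abstract's claim of more effective spatiotemporal modeling.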
More information
| Title according to WOS: | Temporal-Aware Transformer Approach for Violence Activity Recognition |
| Journal title: | IEEE ACCESS |
| Volume: | 13 |
| Publisher: | IEEE |
| Publication date: | 2025 |
| Start page: | 70779 |
| End page: | 70790 |
| Language: | English |
| DOI: | 10.1109/ACCESS.2025.3560828 |
| Notes: | ISI |