EPJ Web of Conferences, Volume 328 (2025)
First International Conference on Engineering and Technology for a Sustainable Future (ICETSF-2025)
Article Number 01020, 11 pages
DOI: https://doi.org/10.1051/epjconf/202532801020
Published online: 18 June 2025
Open Access
  1. J. Carreira and A. Zisserman, "Quo vadis, action recognition? A new model and the Kinetics dataset," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Honolulu, HI, USA (July 2017), pp. 6299–6308.
  2. C. Feichtenhofer, H. Fan, J. Malik, and K. He, "SlowFast networks for video recognition," CoRR, vol. abs/1812.03982 (2018). http://arxiv.org/abs/1812.03982
  3. R. Girdhar, D. Ramanan, A. Gupta, J. Sivic, and B. Russell, "ActionVLAD: Learning spatio-temporal aggregation for action classification," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Honolulu, HI, USA (July 2017), pp. 971–980.
  4. K. Simonyan and A. Zisserman, "Two-stream convolutional networks for action recognition in videos," in Proc. 27th Int. Conf. Neural Inf. Process. Syst. (NIPS), Montreal, Canada (2014), pp. 568–576.
  5. D. Tran, H. Wang, L. Torresani, J. Ray, Y. LeCun, and M. Paluri, "A closer look at spatiotemporal convolutions for action recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Salt Lake City, UT, USA (June 2018), pp. 6450–6459.
  6. S. Xie, C. Sun, J. Huang, Z. Tu, and K. Murphy, "Rethinking spatiotemporal feature learning: Speed-accuracy trade-offs in video classification," in Proc. Eur. Conf. Comput. Vis. (ECCV) (2018), pp. 305–321.
  7. M.A. Khan, K. Javed, S.A. Khan, et al., "Human action recognition using fusion of multiview and deep features: An application to video surveillance," Multimedia Tools and Applications, vol. 79, pp. 1–27 (2020), DOI: 10.1007/s11042-020-09023-3.
  8. A. Gilbert, J. Illingworth, and R. Bowden, "Action recognition using mined hierarchical compound features," IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 5, pp. 883–897 (May 2011).
  9. D. Metaxas and S. Zhang, "A review of motion analysis methods for human nonverbal communication computing," Image Vis. Comput., vol. 31, no. 5, pp. 421–433 (2013).
  10. Y. Hbali, S. Hbali, L. Ballihi, and M. Sadgal, "Skeleton-based human behavior detection for elderly monitoring systems," IET Comput. Vis., vol. 12, no. 1, pp. 16–26 (2018).
  11. I. Theodorakopoulos, D. Kastaniotis, G. Economou, and S. Fotopoulos, "Pose-based human action recognition via sparse representation in dissimilarity space," J. Vis. Commun. Image Represent., vol. 25, no. 1, pp. 12–23 (2014).
  12. S. Gaglio, G. Lo Re, and M. Morana, "Human activity recognition process using 3-D posture data," IEEE Trans. Hum.-Mach. Syst., vol. 45, no. 5, pp. 586–597 (2015).
  13. A.C. Cob-Parro et al., "A new framework for deep learning video based human action recognition on the edge," Expert Syst. Appl., vol. 230, p. 122220 (2023), DOI: 10.1016/j.eswa.2023.122220.
  14. Y. Lu et al., "Video surveillance-based multi-task learning with Swin Transformer for earthwork activity classification," Eng. Appl. Artif. Intell., vol. 131, p. 107814 (2024), DOI: 10.1016/j.engappai.2023.107814.
  15. J. Li, R. Han, W. Feng, et al., "Contactless interaction recognition and interactor detection in multi-person scenes," Front. Comput. Sci., vol. 18, p. 185325 (2024), DOI: 10.1007/s11704-023-2418-0.
  16. Z. Zhao et al., "STDM-transformer: Space-time dual multi-scale transformer network for skeleton-based action recognition," Neurocomputing, vol. 563, p. 126903 (2024), DOI: 10.1016/j.neucom.2023.126903.
  17. A. Raychaudhuri et al., "SMDF: Spatial mass distribution features and deep learning-based technique for human activity recognition," SN Comput. Sci., vol. 5, p. 129 (2024), DOI: 10.1007/s42979-023-02452-2.
  18. K. Chaturvedi et al., "Fight detection with spatial and channel wise attention-based ConvLSTM model," Expert Syst., published online (17 Oct. 2023), DOI: 10.1111/exsy.13474.
  19. V.D. Huszár, V.K. Adhikarla, I. Négyesi, and C. Krasznay, "Toward fast and accurate violence detection for automated video surveillance applications," IEEE Access, vol. 11, pp. 18772–18793 (2023), DOI: 10.1109/ACCESS.2023.3245521.
  20. K. Bhavani et al., "Human fall detection using Gaussian mixture model and fall motion mixture model," in Proc. 2023 5th Int. Conf. Inventive Res. Comput. Appl. (ICIRCA), Coimbatore, India (2023), pp. 1814–1818, DOI: 10.1109/ICIRCA57980.2023.10220913.
  21. M. Kumar, A. Patel, and M. Biswas, "Real-time detection of abnormal human activity using deep learning and temporal attention mechanism in video surveillance," Multimedia Tools and Applications (2023), DOI: 10.1007/s11042-023-17748-x.
  22. W. Ahmed and M. Yousaf, "A deep autoencoder-based approach for suspicious action recognition in surveillance videos," Arab J. Sci. Eng. (2023), DOI: 10.1007/s13369-023-08038-7.
  23. R. Gowada, D. Pawar, and B. Barman, "Unethical human action recognition using deep learning based hybrid model for video forensics," Multimedia Tools and Applications, vol. 82, pp. 28713–28738 (2023), DOI: 10.1007/s11042-023-14508-9.
  24. M. Lin, Q. Chen, and S. Yan, "Network in network," CoRR, vol. abs/1312.4400 (2013). http://arxiv.org/abs/1312.4400
  25. P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun, "OverFeat: Integrated recognition, localization and detection using convolutional networks," CoRR, vol. abs/1312.6229 (2013). http://arxiv.org/abs/1312.6229
  26. M.D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," in Proc. Eur. Conf. Comput. Vis. (ECCV), D.J. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, Eds., Lecture Notes in Computer Science, vol. 8689, Springer (2014), pp. 818–833.
  27. G.E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Improving neural networks by preventing co-adaptation of feature detectors," CoRR, vol. abs/1207.0580 (2012). http://arxiv.org/abs/1207.0580
  28. A. Krizhevsky, I. Sutskever, and G. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems (NeurIPS), vol. 25 (2012), pp. 1106–1114.
  29. X. Yu and L. Xiang, "Classifying cervical spondylosis based on fuzzy calculation," Comput. Math. Methods Med., vol. 2014, Article ID 182956 (2014), DOI: 10.1155/2014/182956.
  30. Y. Tian, G. Pang, Y. Chen, R. Singh, J. Verjans, and G. Carneiro, "Weakly-supervised video anomaly detection with robust temporal feature magnitude learning," arXiv preprint arXiv:2101.10030 (2021). https://arxiv.org/abs/2101.10030
