SKELETON-BASED HUMAN ACTION RECOGNITION USING TRANSFORMER MODEL WITH SOFTMAX WITH MULTIDIMENSIONAL CONNECTED WEIGHTS
Keywords:
SoftMax, machine learning, action classification, skeleton motion, human action recognition, convolution, deep learning.
Abstract
Skeleton-based human action recognition (HAR), particularly from CCTV surveillance footage, has garnered significant interest within the artificial intelligence community. The skeletal modality provides a robust, high-level representation of human motion. Prevailing methods in this domain predominantly rely on a joint-centric approach, modeling the human body as a set of coordinate points. However, this representation often fails to fully capture the rich structural and kinematic relationships essential for accurate motion classification. To address this limitation, we propose a novel method termed SoftMax with Multi-Dimensional Connected Weights. This approach enhances classification by explicitly modeling the informative connections between body joints, represented as skeletal edges. We develop an end-to-end deep learning framework that learns discriminative spatio-temporal representations directly from sequences of skeleton point vectors using convolutional neural networks (CNNs). Experimental results demonstrate that our approach achieves state-of-the-art performance, underscoring the effectiveness of leveraging skeletal edge information and advanced classification techniques for human action recognition.
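Since the abstract describes the pipeline only at a high level, the sketch below illustrates the general idea under stated assumptions: joint coordinates are converted into skeletal-edge ("bone") vectors, a small CNN learns spatio-temporal features from them, and a classifier produces class probabilities. The 17-joint EDGES list, the EdgeCNN layer sizes, and the use of a plain softmax are all illustrative stand-ins; the paper's multi-dimensional connected weights are not specified in this abstract.

import torch
import torch.nn as nn

# Hypothetical 17-joint skeleton (COCO-style layout is an assumption);
# each (parent, child) pair defines one skeletal edge ("bone").
EDGES = [(0, 1), (1, 2), (2, 3), (0, 4), (4, 5), (5, 6),
         (0, 7), (7, 8), (8, 9), (9, 10), (8, 11), (11, 12),
         (12, 13), (8, 14), (14, 15), (15, 16)]
PARENTS = torch.tensor([p for p, _ in EDGES])
CHILDREN = torch.tensor([c for _, c in EDGES])

def joints_to_edges(joints: torch.Tensor) -> torch.Tensor:
    """Map joint coordinates (..., V, C) to edge vectors (..., E, C):
    each edge is the child-joint position minus the parent-joint position."""
    return joints[..., CHILDREN, :] - joints[..., PARENTS, :]

class EdgeCNN(nn.Module):
    """Small spatio-temporal CNN over skeletal-edge features, ending in a
    linear layer whose scores are normalised with a standard softmax."""
    def __init__(self, num_classes: int, coord_dim: int = 3):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(coord_dim, 32, kernel_size=3, padding=1),  # input laid out as (C, T, E)
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global pooling over time and edges
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, joints: torch.Tensor) -> torch.Tensor:
        # joints: (N, T, V, C) batch of skeleton sequences
        edges = joints_to_edges(joints)        # (N, T, E, C)
        x = edges.permute(0, 3, 1, 2)          # (N, C, T, E) for Conv2d
        logits = self.classifier(self.backbone(x).flatten(1))
        return torch.softmax(logits, dim=-1)   # per-class probabilities

# Usage: batch of 2 sequences, 30 frames, 17 joints, 3-D coordinates, 60 classes.
probs = EdgeCNN(num_classes=60)(torch.randn(2, 30, 17, 3))
print(probs.shape)  # torch.Size([2, 60])

Working on edge vectors rather than raw joint positions makes the input translation-invariant per bone, which matches the abstract's emphasis on connections between joints rather than isolated coordinate points.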
License

This work is licensed under a Creative Commons Attribution 4.0 International License.