SKELETON-BASED HUMAN ACTION RECOGNITION USING TRANSFORMER MODEL WITH SOFTMAX WITH MULTIDIMENSIONAL CONNECTED WEIGHTS
Keywords:
SoftMax, machine learning, action classification, skeleton motion, human action recognition, convolution, deep learning.
Abstract
Skeleton-based human action recognition (HAR), particularly from CCTV surveillance footage, has garnered significant interest within the artificial intelligence community. The skeletal modality provides a robust, high-level representation of human motion. Prevailing methods in this domain predominantly rely on a joint-centric approach, modeling the human body as a set of coordinate points. However, this representation often fails to fully capture the rich structural and kinematic relationships essential for accurate motion classification. To address this limitation, we propose a novel method termed SoftMax with Multi-Dimensional Connected Weights. This approach enhances classification by explicitly modeling the informative connections between body joints, represented as skeletal edges. We develop an end-to-end deep learning framework that learns discriminative spatio-temporal representations directly from sequences of skeleton point vectors using convolutional neural networks (CNNs). Experimental results demonstrate that our approach achieves state-of-the-art performance, underscoring the effectiveness of leveraging skeletal edge information and advanced classification techniques for human action recognition.
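Since the abstract describes the pipeline only at a high level, the sketch below illustrates the general idea under stated assumptions: joint coordinates are converted into skeletal-edge ("bone") vectors, a small CNN learns spatio-temporal features from them, and a classifier produces class probabilities. The 17-joint EDGES list, the EdgeCNN layer sizes, and the use of a plain softmax are all illustrative stand-ins; the paper's multi-dimensional connected weights are not specified in this abstract.

import torch
import torch.nn as nn

# Hypothetical 17-joint skeleton (COCO-style layout is an assumption);
# each (parent, child) pair defines one skeletal edge ("bone").
EDGES = [(0, 1), (1, 2), (2, 3), (0, 4), (4, 5), (5, 6),
         (0, 7), (7, 8), (8, 9), (9, 10), (8, 11), (11, 12),
         (12, 13), (8, 14), (14, 15), (15, 16)]
PARENTS = torch.tensor([p for p, _ in EDGES])
CHILDREN = torch.tensor([c for _, c in EDGES])

def joints_to_edges(joints: torch.Tensor) -> torch.Tensor:
    """Map joint coordinates (..., V, C) to edge vectors (..., E, C):
    each edge is the child-joint position minus the parent-joint position."""
    return joints[..., CHILDREN, :] - joints[..., PARENTS, :]

class EdgeCNN(nn.Module):
    """Small spatio-temporal CNN over skeletal-edge features, ending in a
    linear layer whose scores are normalised with a standard softmax."""
    def __init__(self, num_classes: int, coord_dim: int = 3):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(coord_dim, 32, kernel_size=3, padding=1),  # input laid out as (C, T, E)
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global pooling over time and edges
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, joints: torch.Tensor) -> torch.Tensor:
        # joints: (N, T, V, C) batch of skeleton sequences
        edges = joints_to_edges(joints)        # (N, T, E, C)
        x = edges.permute(0, 3, 1, 2)          # (N, C, T, E) for Conv2d
        logits = self.classifier(self.backbone(x).flatten(1))
        return torch.softmax(logits, dim=-1)   # per-class probabilities

# Usage: batch of 2 sequences, 30 frames, 17 joints, 3-D coordinates, 60 classes.
probs = EdgeCNN(num_classes=60)(torch.randn(2, 30, 17, 3))
print(probs.shape)  # torch.Size([2, 60])

Working on edge vectors rather than raw joint positions makes the input translation-invariant per bone, which matches the abstract's emphasis on connections between joints rather than isolated coordinate points.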
License

This work is licensed under a Creative Commons Attribution 4.0 International License.