Predicting human activity and detecting subsequent actions are crucial for improving interaction between humans and robots during collaborative operations. Deep-learning techniques are increasingly applied to human activity recognition, including in industrial applications. However, the scarcity of datasets in the industrial domain and the complexity of some industrial activities, such as screw driving and assembling small parts, hinder the development and testing of activity recognition models. The InHard dataset (Industrial Human Activity Recognition Dataset) was recently published to facilitate industrial human activity recognition for better human-robot collaboration, but it still lacks extended evaluation. We propose an activity recognition method that combines a convolutional neural network (CNN) with long short-term memory (LSTM) to evaluate the InHard dataset and compare it with a new dataset captured in a lab environment. The method improves the success rate of activity recognition by processing both spatial and temporal information. Accordingly, recognition accuracy on the dataset is tested using labeled lists of activities from inertial measurement unit (IMU) and video data. A model is trained and tested on nine low-level activity classes with approximately 400 samples per class. The test results show 88% accuracy for IMU-based skeleton data, 77% for RGB spatial video, and 63% for RGB video-based skeleton data. The results have been verified using a previously published region-based activity recognition method. The proposed approach can be extended to advance the cognitive capabilities of robots in human-centric workplaces.