English Abstract
To communicate with people, robots and vision-based interactive systems often need to understand human activities before they are performed completely. However, predicting activities in advance is a very challenging task, because some activities are simple while others are complex and composed of several smaller atomic sub-activities. In this thesis, we propose a method capable of recognizing and predicting simple and complex human activities early, by formulating the problem as a structured prediction task using probabilistic graphical models (PGMs). We use skeletons captured from low-cost depth sensors as high-level descriptions of the human body. By relying on 3D skeletons, our method is robust to environmental factors such as illumination, complex backgrounds, body shapes, and viewpoint. In addition, there are different types of activities that systems need to interpret for seamless interaction with humans. We recognize activities within the context of graphical models in a sequence-labeling framework and propose a new structured prediction strategy based on PGMs to recognize both activity types (i.e., complex and simple). These activity types often span very diverse subspaces of the space of all possible activities, which requires different model parameterizations. To deal with these parameterization and structural breaks across models, a category-switching scheme is proposed to switch between models based on the activity type. For parameter optimization, we utilize a distributed structured prediction technique to implement our model in a distributed setting. Our proposed model uses a fully observed PGM coupled with a clustering scheme for initialization; using a fully observed model for initialization increases the learning speed without changing the accuracy. Because our method is sensitive to the clustering method used to determine the intermediate states, we also evaluate several clustering methods. We test our method on three popular datasets, CAD-60, UT-Kinect, and Florence 3D, and obtain recognition accuracies of 97.6%, 100%, and 96.11%, respectively. These datasets cover both simple and complex activities. When only half of a clip is observed, we achieve 93.33% and 96.9% prediction accuracy on the CAD-60 and UT-Kinect datasets, respectively.
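The abstract describes the clustering-based initialization of the fully observed model only at a high level. The sketch below is a minimal illustration of that general idea, not the thesis implementation: per-frame skeleton features are clustered into intermediate states, and a fully observed chain model is initialized by counting state transitions. The joint count, feature layout, K-means choice, and smoothing are assumptions made for the example.

```python
# Illustrative sketch only (assumed details, not the thesis code): cluster
# per-frame skeleton features into intermediate states, then initialize a
# fully observed chain model from transition counts over those states.
import numpy as np
from sklearn.cluster import KMeans

def skeleton_features(frames):
    """Flatten 3D joint coordinates per frame (hypothetical 15-joint skeleton)."""
    return frames.reshape(len(frames), -1)  # shape (T, 15 * 3)

def initialize_chain_model(sequences, n_states=8, seed=0):
    """Cluster frames into intermediate states and estimate state transitions."""
    feats = np.vstack([skeleton_features(s) for s in sequences])
    km = KMeans(n_clusters=n_states, n_init=10, random_state=seed).fit(feats)

    # Fully observed initialization: count transitions between cluster labels,
    # with additive smoothing, then normalize each row into a distribution.
    trans = np.ones((n_states, n_states))
    for seq in sequences:
        labels = km.predict(skeleton_features(seq))
        for a, b in zip(labels[:-1], labels[1:]):
            trans[a, b] += 1
    trans /= trans.sum(axis=1, keepdims=True)
    return km, trans

if __name__ == "__main__":
    # Synthetic stand-in for skeleton clips: 5 clips, 40 frames, 15 joints in 3D.
    rng = np.random.default_rng(0)
    clips = [rng.normal(size=(40, 15, 3)) for _ in range(5)]
    kmeans, transitions = initialize_chain_model(clips)
    print(transitions.shape)  # (8, 8) row-stochastic transition matrix
```

In this toy version, the clustering step stands in for the scheme used to determine the intermediate states; swapping in a different clustering algorithm only changes how `labels` are produced, which mirrors the sensitivity to clustering methods noted in the abstract.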