Developing a new automated model to classify combined and basic gestures from complex head motion in real time by using all-vs-all HMM


Dawood, A., Turner, S. J. and Perepa, P. (2017) Developing a new automated model to classify combined and basic gestures from complex head motion in real time by using all-vs-all HMM. Journal of Emerging Technologies and Innovative Research. 4(3), pp. 156-165. ISSN 2349-5162
Human head gestures convey rich information and serve as a communication tool. Nodding and shaking are commonly used as non-verbal signals to communicate intent and emotion. However, the majority of head gesture classification systems have focused on detecting nods and shakes, while ignoring other head gestures that carry more expressive emotional signals, such as rest (up and down), turn, tilt, and tilting. In this paper, we develop a new model to classify all head gestures (rest, turn, tilt, nod, shake, and tilting) from complex head motion. The methodology is based on distinguishing basic head movements (rest, turn, and tilt) from combined movements (nodding, shaking, and tilting). The purpose of the system is to detect and label combined and basic head movements in dynamic video. In addition, this phase of the study looks towards developing an affective machine that uses head movements to extract complex affective states (this work is underway). The system uses 3D head rotation angles to classify relevant head gestures, both in-plane and out-of-plane, during user interaction with a computer. An open source tracker detects and tracks head movements. The three angles obtained from the tracker (pitch, yaw, and roll) are analyzed and packed into sequences of observation symbols, or cues. These observations form the inputs to an all-vs-all discrete Hidden Markov Model (HMM) classifier. Three classifiers were used for each angle. The classifiers are trained on the Boston University dataset and tested on the available Mind Reading data. The system is evaluated on video streams in real time from a webcam. It is fully automatic, incurs no additional technical cost, and does not require any specialised sensing equipment.
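
The pipeline described in the abstract (angle sequence, then observation symbols, then one discrete HMM per gesture class, then the most likely class) can be illustrated with a minimal sketch for a single angle. Everything specific here is an assumption made for illustration and is not taken from the paper: the three-symbol alphabet, the 2-degree threshold, the class names `steady`, `basic` and `combined`, and the hand-set HMM parameters (the paper trains its classifiers on data). Picking the best-scoring model among all class models is only one plausible reading of the all-vs-all scheme.

```python
import math

def quantize(angles, threshold=2.0):
    """Map frame-to-frame angle changes (degrees) to symbols.
    Hypothetical alphabet: 0 = decreasing, 1 = roughly steady, 2 = increasing."""
    symbols = []
    for prev, cur in zip(angles, angles[1:]):
        delta = cur - prev
        if delta > threshold:
            symbols.append(2)
        elif delta < -threshold:
            symbols.append(0)
        else:
            symbols.append(1)
    return symbols

def log_likelihood(symbols, model):
    """Scaled forward algorithm: log P(symbols | model) for a discrete HMM."""
    start, trans, emit = model
    n = len(start)
    alpha = [start[i] * emit[i][symbols[0]] for i in range(n)]
    log_p = 0.0
    for t, sym in enumerate(symbols):
        if t > 0:
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][sym]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_p

# Illustrative, hand-set parameters (assumptions, not trained values).
EMIT = [[0.05, 0.15, 0.80],   # state 0: angle mostly increasing
        [0.80, 0.15, 0.05]]   # state 1: angle mostly decreasing
MODELS = {
    # (start probabilities, transition matrix, emission matrix)
    "steady":   ([1.0], [[1.0]], [[0.05, 0.90, 0.05]]),
    "basic":    ([0.5, 0.5], [[0.95, 0.05], [0.05, 0.95]], EMIT),  # one direction persists
    "combined": ([0.5, 0.5], [[0.40, 0.60], [0.60, 0.40]], EMIT),  # direction alternates
}

def classify(angles):
    """Score the sequence against every class model and return the best label."""
    symbols = quantize(angles)
    if not symbols:
        raise ValueError("need at least two angle samples")
    return max(MODELS, key=lambda name: log_likelihood(symbols, MODELS[name]))

print(classify([0.0, 0.5, 0.2, 0.1, 0.4, 0.3, 0.0]))  # small jitter
print(classify([0, 5, 10, 15, 20, 25, 30]))           # one sustained movement
print(classify([0, 5, 0, 5, 0, 5, 0]))                # repeated up/down movement
```

With these toy parameters the three calls print `steady`, `basic` and `combined` respectively. In a fuller version, one such set of models would be kept for each of pitch, yaw and roll, with parameters estimated from training sequences rather than set by hand.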




All views and opinions are the author's and do not necessarily reflect those of any organisation they are associated with. Twitter: @scottturneruon