Abstract
Gesture recognition technology has been widely used in virtual reality and human-computer interaction. This paper proposes a virtual music control system based on dynamic hand gesture recognition. The system is designed and realized with three modules: the control terminal, the client terminal and the server. By capturing a gesture image sequence with a cellphone camera, the system recognizes characteristic information of gestures such as the number of fingers and the movement of the gesture trace. The control terminal generates different instructions and sends them to the client terminal via the server. Experiments show that the interaction system has good applicability and portability.
1 Introduction
Recently, virtual bands and orchestras have become a hot topic in virtual reality research. The virtual orchestra of the Mendelssohn-Haus museum adopts Leap Motion [1] to recognize gestures. The Dynamic Matrix Sound system (DMS system) [2], developed by the Key Laboratory of Media Audio & Video, Ministry of Education, is a multi-channel input and output sound system based on source synthesis and Huygens' principle. Gesture recognition-based band conducting is one of the most important foundations of a virtual band. Kinect and Leap Motion are commonly used in hand gesture recognition, but Leap Motion's motion capture range is limited, and Kinect cannot accurately identify hand gestures. Besides, both devices depend on vendor SDKs for development, which limits their adaptability. Nowadays, most mobile devices such as iPhones and tablets are equipped with a monocular camera, and such devices are more portable than Kinect and Leap Motion.
In the area of hand gesture recognition and target tracking, Angelov et al. proposed a recursive density estimation technique and used SIFT features to find key points in a rectangular box area to achieve object tracking and gesture recognition [3]. Lee et al. segmented hand gesture information from video sequences with complex backgrounds and achieved a recognition rate of 95% [4]. The gesture library of OpenCV can realize gesture recognition based on Hu moments and stereo vision. Wang and Cohen identified and extracted features with B-spline curves and adopted the Bayes rule for target classification and gesture recognition [5]. Xiuhui Wang et al. collected hand gesture information through multiple cameras and combined it with hand model features to match and train a database [6]. Haibing Ren et al. completed skin color segmentation under complex backgrounds [7]. Jiyu Zhu et al. constructed two-dimensional hand gestures by extracting the palm center and its mass information [8]. Zhiyong Xiao et al. accomplished remote computer control by hand gesture recognition [9].
Inspired by gesture recognition and VR technology, we design a virtual orchestra studio to present music. To increase the portability of the system, we use mobile devices with a monocular camera to capture gestures. Based on the Xcode platform and dynamic hand gesture recognition technology, we developed a virtual music control system. This paper is organized as follows: Sect. 1 introduces the background of our study and related work on gesture recognition. Section 2 presents the virtual music control system in detail. Section 3 introduces the algorithms we adopt. Section 4 describes the experiments. Lastly, Sect. 5 gives conclusions and future prospects.
2 Pipeline of the Virtual Music Control System
There are three modules in the virtual music control system: the control terminal, the client terminal and the server. Firstly, from the gesture images captured by the mobile device, the control terminal recognizes hand gestures and generates control instructions. Secondly, the control instructions are sent to the client terminal via the server; the server is indispensable for ensuring real-time communication. Thirdly, the client terminal receives and responds to the instructions. Figure 1 shows the pipeline of the virtual music control system.
2.1 Control Terminal
The control terminal is responsible for analyzing hand gestures. In this module, we design a gesture recognition APP. After capturing a hand gesture, the control terminal performs image preprocessing, gesture segmentation, dynamic gesture tracking, contour feature extraction, gesture recognition, and gesture instruction definition and transmission. Based on this process, the control terminal is designed as shown in Fig. 2.
Our virtual music control system realizes many music control functions, such as play, stop, rhythm change and song switching. With the number of fingers and the direction of the hand trajectory, we can define different instructions. When gesture recognition is completed, we first detect the number of fingers. If 30 consecutive frames are all determined to contain 5 fingers, the music is paused. If the number of fingers is 2, songs are switched: if the trajectory moves left, the previous song plays; if it moves right, the next song plays. If the number of fingers is 4, the music rhythm is changed: a leftward trajectory speeds the rhythm up, a rightward trajectory slows it down. The instructions for music control are defined as shown in Fig. 3.
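The instruction table above can be sketched in code. The following Python fragment is our own illustration (function names and command labels are assumptions, not taken from the paper); it also applies the 30-frame stability check before emitting a command:

```python
STABLE_FRAMES = 30  # a gesture must persist for 30 consecutive frames

def map_gesture(fingers, direction=None):
    """Map (finger count, trajectory direction) to a player command."""
    if fingers == 5:
        return "pause"
    if fingers == 2 and direction == "left":
        return "previous-song"
    if fingers == 2 and direction == "right":
        return "next-song"
    if fingers == 4 and direction == "left":
        return "speed-up"
    if fingers == 4 and direction == "right":
        return "slow-down"
    return None  # unrecognized gesture: no instruction

def stable_command(frame_fingers, direction=None):
    """Emit a command only when the last STABLE_FRAMES frames agree."""
    recent = frame_fingers[-STABLE_FRAMES:]
    if len(recent) == STABLE_FRAMES and len(set(recent)) == 1:
        return map_gesture(recent[0], direction)
    return None
```

The per-frame finger counts would come from the recognition pipeline of Sect. 3; the stability check suppresses spurious commands from single misclassified frames.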
2.2 Client Terminal
The virtual music control system realizes music control by accepting instructions from the control terminal. To make the client run well on any LAN, we build a web version of the music player, which ensures good compatibility of the system. It is easy for users to operate, because they just open a web page instead of downloading a particular program. On any device in the same local area network (LAN), opening a browser and entering the client address launches our web music player.
2.3 Server
The server module is responsible for information transmission and real-time communication between the control terminal and the client terminal. The WebSocket protocol is used in the construction of the web page, which satisfies the demand for real-time communication. With traditional real-time communication methods such as Ajax polling, the browser must constantly send requests to the server or hold long-lived links, which consumes a lot of bandwidth and server resources. The WebSocket protocol, introduced with HTML5, establishes a persistent TCP link between the browser and the server, lets the server actively push data to the client, and realizes bidirectional real-time communication. Moreover, the communication can be implemented directly in JavaScript in any WebSocket-capable browser.
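To illustrate the push-based design, here is a minimal sketch of an instruction relay in Python over plain TCP streams. The real system uses WebSocket on a Jetty server; the class name, the role handshake and the line-based framing here are our own simplifications:

```python
import asyncio

class InstructionRelay:
    """Toy stand-in for the websocket-server: keeps one long-lived
    connection per client and pushes each instruction line received
    from the control terminal to every registered client, so clients
    never have to poll."""

    def __init__(self):
        self.clients = set()

    async def handle(self, reader, writer):
        # first line announces the peer's role: "client" or "control"
        role = (await reader.readline()).decode().strip()
        if role == "client":
            self.clients.add(writer)
            try:
                await reader.read()        # park until the client disconnects
            finally:
                self.clients.discard(writer)
            return
        # otherwise treat the connection as the control terminal
        while True:
            line = await reader.readline()
            if not line:
                break
            for w in list(self.clients):   # push the instruction to everyone
                w.write(line)
                await w.drain()
```

The long-lived client connection mirrors the persistent TCP link of WebSocket: the server writes to it whenever an instruction arrives, instead of the client repeatedly asking for updates.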
3 Algorithm Description
In this section, we introduce the hand gesture algorithms. Firstly, following the process of gesture segmentation, Part A introduces the main image preprocessing methods and two gesture segmentation methods. Part B introduces the dynamic gesture tracking method and proposes our improved algorithm. Part C introduces contour-based feature extraction and the calibration and detection of fingers. Lastly, Part D introduces the gesture recognition methods and the related gesture commands.
3.1 Hand Gesture Segmentation
Hand gesture segmentation based on skin color uses the skin color of the hand as a significant characteristic, because the characterization of human skin color is not influenced by different capture devices [10]. However, this method alone cannot fully realize exact segmentation and contour extraction. We improve the segmentation procedure as follows.
Firstly, the color images captured by the camera are preprocessed, and a smoothing filter is adopted to eliminate noise interference [11]. Secondly, we segment the skin color region to generate the binary image D1. Thirdly, we separate the dynamic gesture from the static background to generate the dynamic gesture binary image D2. Finally, D1 and D2 are merged to obtain the complete binary image. Figure 4 shows the gesture segmentation process in detail.
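The steps above can be sketched as follows. This NumPy fragment is a simplified illustration only: the paper does not specify its color model or how D1 and D2 are merged, so the rule-based RGB skin test and the logical AND are our assumptions:

```python
import numpy as np

def skin_mask(frame):
    """D1: rule-based RGB skin segmentation (a common simplified rule)."""
    r = frame[..., 0].astype(int)
    g = frame[..., 1].astype(int)
    b = frame[..., 2].astype(int)
    return ((r > 95) & (g > 40) & (b > 20) &
            (r > g) & (r > b) & (r - g > 15) &
            (frame.max(axis=-1).astype(int) - frame.min(axis=-1) > 15))

def motion_mask(prev, cur, thresh=30):
    """D2: frame differencing separates the moving gesture
    from the static background."""
    diff = np.abs(cur.astype(int) - prev.astype(int)).max(axis=-1)
    return diff > thresh

def gesture_mask(prev, cur):
    """Merge D1 and D2; here a logical AND keeps only moving skin pixels."""
    return skin_mask(cur) & motion_mask(prev, cur)
```

The AND suppresses static skin-colored regions (such as a face in the background), which is one plausible reading of "merging" the two binary images.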
3.2 Dynamic Hand Gesture Tracking
The flexibility of the human hand leads to large gesture transformations, and an interactive recognition system has high real-time requirements. As the time interval between adjacent frames is very short, the difference in hand gesture between adjacent frames is very small, so the motion between two adjacent frames can be treated as uniform motion. We adopt a dynamic hand gesture tracking algorithm based on the Cam-shift tracking algorithm and Kalman filter prediction. The detailed algorithm process is as follows.
Firstly, we initialize the tracking area and apply the Cam-shift tracking algorithm. If tracking fails or the hand is occluded in the process, we retain the search window for a second round of tracking and searching. Secondly, we adopt a Kalman filter for real-time prediction and update the search window alongside the Cam-shift tracking. Thirdly, we keep updating and following up until the hand gesture is tracked. The adaptability of the Cam-shift algorithm alone is limited, and our gesture tracking process mitigates this weakness.
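A minimal sketch of the Kalman prediction step, assuming the constant-velocity (uniform motion) model stated above. The noise parameters are illustrative and not taken from the paper; in the full system the predicted centre would re-seed the Cam-shift search window after occlusion:

```python
import numpy as np

class WindowPredictor:
    """Constant-velocity Kalman filter over the search-window
    centre, with state (x, y, vx, vy)."""

    def __init__(self):
        self.x = np.zeros(4)                       # state estimate
        self.P = np.eye(4) * 500.0                 # state covariance
        self.F = np.eye(4)                         # transition model
        self.F[0, 2] = self.F[1, 3] = 1.0          # x += vx, y += vy
        self.H = np.eye(2, 4)                      # we observe position only
        self.Q = np.eye(4) * 1e-3                  # process noise
        self.R = np.eye(2) * 1.0                   # measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]                          # predicted window centre

    def update(self, z):
        y = np.asarray(z, float) - self.H @ self.x # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)   # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```

Each frame, `predict()` gives the expected hand position; the Cam-shift result (when tracking succeeds) is fed back through `update()`.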
3.3 Feature Extraction Based on Contour
In this part, we conduct feature extraction based on the gesture contour, including smoothing, convex point detection and fingertip calibration. We use open-source implementations of Canny edge detection to extract the finger contours and then detect their convex hull and convexity defects. Next, we smooth the finger contour curve to eliminate contour defects. Figure 5 shows the contrast before and after smoothing.
The curves at fingertips and finger seams are more sharply bent than the fingers' outer contour. We set a certain threshold to detect fingertips and finger seams, and then distinguish fingertips from finger seams. We use the image transmission method [12] to detect the number of fingers. Figure 6 shows the image transmission method applied to a hand.
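The threshold test on contour bending can be illustrated with a generic curvature check (this is not the image transmission method of [12]; the angle threshold, the clockwise orientation convention and the test polygon are our assumptions):

```python
import numpy as np

def count_tips_and_seams(points, angle_thresh_deg=60.0):
    """Classify sharp contour vertices into fingertips and finger seams.
    A vertex is 'sharp' when its interior angle is below the threshold;
    the sign of the cross product then separates convex tips from
    concave seams, assuming a clockwise-ordered contour."""
    pts = np.asarray(points, float)
    n = len(pts)
    tips = seams = 0
    for i in range(n):
        u = pts[i - 1] - pts[i]            # towards the previous vertex
        w = pts[(i + 1) % n] - pts[i]      # towards the next vertex
        cos_a = np.dot(u, w) / (np.linalg.norm(u) * np.linalg.norm(w))
        angle = np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))
        if angle < angle_thresh_deg:
            a = pts[i] - pts[i - 1]        # incoming edge direction
            cross = a[0] * w[1] - a[1] * w[0]
            if cross < 0:                  # convex for clockwise ordering
                tips += 1
            else:
                seams += 1
    return tips, seams
```

On a three-spike "comb" polygon standing in for three extended fingers, the spike apexes register as tips and the valleys between them as seams, while the gentle corners of the palm region fall below the sharpness threshold.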
3.4 Gesture Recognition
In our experiment, gesture recognition is based on contour recognition, which is achieved through a template matching method. Summing over all points on the contour gives the contour moments. A contour moment \( m_{\upalpha\upbeta} \) is defined as follows:

\( m_{\upalpha\upbeta} = \sum\nolimits_{i} {x_{i}^{\upalpha} y_{i}^{\upbeta} } \)    (1)
Here \( \upalpha \) is the moment order along the X axis and \( \upbeta \) the order along the Y axis. Equation (1) sums over all points \( \left( {x_{i} ,\;y_{i} } \right) \) on the boundary of the gesture contour. When \( \upalpha\; = \;0 \) and \( \upbeta\; = \;0 \), \( m_{00} \) simply counts the points on the contour boundary.
The Hu moments are derived from contour moments and have the characteristic of geometric invariance. Using the normalized central moments

\( \upeta_{\upalpha\upbeta} = \upmu_{\upalpha\upbeta} /\upmu_{00}^{\left( {\upalpha + \upbeta} \right)/2 + 1} ,\quad \upalpha + \upbeta \ge 2 \)    (2)

the Hu moments define several equations, built from the second- and third-order moments, that are invariant to rotation, translation and scaling. The following two are well suited to 2D gesture contours:

\( h_{1} = \upeta_{20} + \upeta_{02} \)    (3)

\( h_{2} = \left( {\upeta_{20} - \upeta_{02} } \right)^{2} + 4\upeta_{11}^{2} \)    (4)
After acquiring the characteristic data of the gesture image, we invoke the cvMatchShapes function in OpenCV to compute the similarity of two profiles: the smaller the result, the more similar the two contours. Our hand gesture templates are shown in Fig. 7.
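For illustration, the two Hu-moment quantities discussed above can be computed directly from a contour point set. This sketch mirrors the idea behind cvMatchShapes but is not OpenCV's implementation (which uses all seven Hu moments and a log-based distance); the normalization by the point count is our simplification for point sets:

```python
import numpy as np

def first_two_invariants(points):
    """Central moments mu_pq of a 2-D contour point set, normalised
    by the zeroth moment, then h1 = eta20 + eta02 and
    h2 = (eta20 - eta02)^2 + 4*eta11^2."""
    pts = np.asarray(points, float)
    c = pts.mean(axis=0)                   # centroid: translation invariance
    x, y = (pts - c).T

    def mu(p, q):
        return np.sum(x**p * y**q)

    m00 = len(pts)                         # zeroth moment of a point set
    eta20 = mu(2, 0) / m00**2
    eta02 = mu(0, 2) / m00**2
    eta11 = mu(1, 1) / m00**2
    return eta20 + eta02, (eta20 - eta02)**2 + 4 * eta11**2
```

Because the centroid removes translation and both quantities are built from rotation-invariant combinations of second-order moments, a rotated and shifted copy of a contour yields the same pair of values, which is exactly what makes them usable for template matching.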
4 Experiments
4.1 Tracking Motion Algorithm
In order to test the recognition performance of our dynamic hand gesture tracking algorithm, we set up a database composed of 100 hand movement scenes. Table 1 shows the contrast in gesture recognition rate between the traditional Cam-shift algorithm and the improved algorithm. It clearly shows that the accuracy of the improved Cam-shift algorithm is higher than that of the traditional one.
4.2 Control Terminal
In this part of the experiment, the Xcode platform and an iTouch 5 are used. We write a gesture recognition program in Xcode and use the iTouch 5 to capture dynamic hand gestures. Once the Xcode program compiles successfully, the iTouch 5 runs a gesture recognition APP named 'Gesture Recognizer'. After a series of gesture analyses, the Gesture Recognizer APP can distinguish hand gestures. According to the different gestures, the program generates different instructions and sends them to the client terminal to control the music. Figure 8 shows the Gesture Recognizer APP on the iTouch and its startup interface. Figure 9 shows the recognition results for the number of extended fingers.
4.3 Client Terminal
The virtual music control system realizes music control by accepting gesture characteristic information. We use JavaScript, CSS and HTML5 to implement the web music player. JavaScript implements the interaction module of the player, and CSS is used to create its buttons and background image. Lastly, the HTML5 <audio> element provides powerful audio functions such as play, playback, seeking and buffering. Figures 10, 11, and 12 show how the interface of the web music player changes with songs.
4.4 Server
To set up the server, note that the Servlet API is the essence of the servlet mechanism, and Jetty is an open-source Java server that provides the operating environment for JSP and servlets in web development.

In the experiment, we build the server module websocket-server on Jetty to receive and process the gesture image commands.

Through the websocket-server structure, we ultimately realize the functionality of getting instruction data from the control terminal and then sending the data from the server to the client module websocket-client.

In the experiment, we use the Jetty package and open a terminal window to start the server: entering the command java -jar start.jar launches the server. Devices within the same LAN running the recognition APP and the client terminal can then communicate and transmit information.
5 Conclusion and Future Work
In this paper we develop a virtual music control system based on gesture recognition. We introduced in detail how the control terminal completes gesture recognition, including image preprocessing, gesture segmentation, feature extraction, trajectory tracking, gesture recognition and instruction definition. Our virtual music control system has strong practical significance: the control terminal APP is highly portable across cellphones, the client terminal's web music player is compatible with many browsers, and the server module realizes real-time communication.
As the next step, we will research hand rotation to increase the diversity of gestures and to define more instructions. We will also keep optimizing the related algorithms to improve the accuracy of gesture recognition and shorten the recognition time. What's more, we will add more features to the web player display to make it more attractive.
References
DAS EFFEKTORIUM. MUSEUM IM MENDELSSOHN-HAUS, LEIPZIG [DB/OL] (2014). http://235media.de/2014/02/museum-im-mendelssohn-haus-leipzig/?lang=de
Wei, Z., Yin, W., Cong, J., Qin, Z.: Comparative study of DMS and conventional stereo based on subjective evaluation experiment. In: 2013 Fourth International Conference on Intelligent Control and Information Processing (ICICIP), pp. 236–240 (2013)
Angelov, P., Gude, C., Sadeghi-Tehran, P., Ivanov, T.: Autonomous real-time object detection and tracking by a moving camera. In: 2012 6th IEEE International Conference on Intelligent Systems (IS), pp. 446–452 (2012)
Lee, J.S., Lee, Y.J., Lee, E.H., et al.: Hand region extraction and gesture recognition from video stream with complex background through entropy analysis. In: Proceedings of the 26th Annual International Conference of the IEEE EMBS, San Francisco, CA, USA (2004)
Wang, J.-Y., Cohen, F.S.: 3-D object recognition and shape estimation from image contours using B-splines, shape invariant matching, and neural network. IEEE Trans. Pattern Anal. Mach. Intell. 16(1), 13–23 (1994)
Wang, X., Bao, H.: Gesture recognition based on adaptive genetic algorithm. J. Comput.-Aided Des. Comput. Graph. 08, 1056–1062 (2007). Hangzhou
Ren, H., Zhu, Y., Xu, G., Zhang, X., Lin, X.: Hand gesture segmentation and recognition with complex backgrounds. Acta Automatica Sinica (02), 256–261 (2002)
Zhu, J., Wang, X., Wang, W., Dai, G.: Hand gesture recognition based on structure analysis. Chin. J. Comput. (12), 2130–2137 (2006)
Xiao, Z., Qin, H.: Human-computer interaction based on gaze tracking and gesture recognition. Comput. Eng. (15), 198–200 (2009)
Fan, B., Wang, M., Dong, Y.: Hand gesture segmentation based on skin color detection technology. Comput. Technol. Dev. 18(3), 105–108 (2008)
Zhang, M., Yu, Z., Yao, S.: Image pretreatment research in recognition of handwritten numerals. Control Autom. 22(16), 256–258 (2006)
Zhong, F.: Machine-vision-based Gesture Recognition System. Wuhan University of Technology (2013)
Acknowledgments
This work is supported by the Projects of NSFC (61371191, 61201236), and Research Project of China SARFT (2015-53).
© 2017 Springer-Verlag GmbH Germany
Zhang, Y., Wang, J., Ye, L., Xue, X., Zhang, Q. (2017). A Virtual Music Control System Based on Dynamic Hand Gesture Recognition. In: Pan, Z., Cheok, A., Müller, W., Zhang, M. (eds) Transactions on Edutainment XIII. Lecture Notes in Computer Science(), vol 10092. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-54395-5_7
Print ISBN: 978-3-662-54394-8
Online ISBN: 978-3-662-54395-5