Abstract
Perceptual analysis of video (analysis by unaided ear and eye) plays an important role in disciplines such as psychology, psycholinguistics, linguistics, anthropology, and neurology. In the specific domain of psycholinguistic analysis of gesture and speech, researchers micro-analyze videos of subjects using a high-quality video cassette recorder with a digital freeze capability that is accurate to the individual frame. Such analyses are slow and very labor-intensive. We present a multimedia system for perceptual analysis of video data that uses a model of multiple, dynamically linked representations. The system components are linked through a time portal with a current time focus. The system provides mechanisms to analyze overlapping hierarchical interpretations of the discourse, and integrates visual gesture analysis, speech analysis, video gaze analysis, and text transcription into a coordinated whole. The various interaction components facilitate accurate multi-point access to the data. Although the system is currently used to analyze gesture, speech, and gaze in human discourse, it may be applied to any other field where careful analysis of temporal synchronies in video is important.
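The idea of components linked through a time portal with a current time focus can be illustrated with a minimal sketch. This is not the authors' implementation. The class names (`TimePortal`, `VideoView`, `TranscriptView`), the 30 frames-per-second rate, and the sample word timings are all hypothetical, chosen only to show how several representations might stay synchronized to one shared time value.

```python
from typing import Callable, List, Optional, Tuple


class TimePortal:
    """Hypothetical shared clock holding the current time focus (in seconds)."""

    def __init__(self) -> None:
        self._focus = 0.0
        self._listeners: List[Callable[[float], None]] = []

    @property
    def focus(self) -> float:
        return self._focus

    def link(self, listener: Callable[[float], None]) -> None:
        """Register a view callback and bring it to the current focus."""
        self._listeners.append(listener)
        listener(self._focus)

    def seek(self, seconds: float) -> None:
        """Move the time focus (clamped at zero) and notify every linked view."""
        self._focus = max(0.0, seconds)
        for listener in self._listeners:
            listener(self._focus)


class VideoView:
    """Assumed video component: maps the time focus to a frame index."""

    def __init__(self, fps: float = 30.0) -> None:
        self.fps = fps
        self.frame = 0

    def on_time(self, t: float) -> None:
        self.frame = int(round(t * self.fps))


class TranscriptView:
    """Assumed transcript component: highlights the word spoken at the focus."""

    def __init__(self, words: List[Tuple[float, float, str]]) -> None:
        self.words = sorted(words)  # (start, end, text) triples
        self.current: Optional[str] = None

    def on_time(self, t: float) -> None:
        self.current = None
        for start, end, text in self.words:
            if start <= t < end:
                self.current = text
                break


# Illustrative data only: three words with made-up timings.
portal = TimePortal()
video = VideoView(fps=30.0)
transcript = TranscriptView([(0.0, 0.5, "so"), (0.5, 1.2, "you"), (1.2, 2.0, "go")])
portal.link(video.on_time)
portal.link(transcript.on_time)

portal.seek(1.0)  # every linked view now reflects t = 1.0 s
```

In this sketch any component can call `seek`, and all other components update from the same focus, which is one plausible way to obtain the multi-point access the abstract describes.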
Cite this article
Quek, F., Bryll, R., Kirbas, C. et al. A Multimedia System for Temporally Situated Perceptual Psycholinguistic Analysis. Multimedia Tools and Applications 18, 91–114 (2002). https://doi.org/10.1023/A:1016229624425
DOI: https://doi.org/10.1023/A:1016229624425