Abstract
Identifying people and tracking their locations is a key prerequisite for achieving context awareness in smart spaces. Moreover, in realistic context-aware applications, these tasks have to be carried out in a non-obtrusive fashion. In this paper, we present a set of robust person-identification and tracking algorithms based on audio and visual processing. A main characteristic of these algorithms is that they operate on far-field, unconstrained audio–visual streams, which makes them non-intrusive. We also show that combining their outputs yields composite multimodal tracking components suitable for supporting a broad range of context-aware services. To combine the audio–visual processing results, we exploit a context-modeling approach based on a graph of situations. Finally, we discuss the implementation of realistic prototype applications that make use of the full range of audio, visual and multimodal algorithms.

Acknowledgments
This work is sponsored by the European Union under the integrated project CHIL, contract number 506909.
About this article
Cite this article
Pnevmatikakis, A., Soldatos, J., Talantzis, F. et al. Robust multimodal audio–visual processing for advanced context awareness in smart spaces. Pers Ubiquit Comput 13, 3–14 (2009). https://doi.org/10.1007/s00779-007-0169-9
DOI: https://doi.org/10.1007/s00779-007-0169-9