Visual Digest Networks

Cai, Yang; Milcent, Guillaume; Marian, Ludmila

doi:10.1007/978-3-540-89430-8_3


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4650)


Abstract

Attention, understanding and abstraction are three key elements of visual communication that we tend to take for granted. These interconnected elements constitute a Visual Digest Network. In this chapter, we investigate the conceptual design of Visual Digest Networks at three levels of visual abstraction: gaze, object and word. The goal is to minimize the media footprint during visual communication while preserving its essential semantic content. The Attentive Video Network is designed to detect the operator’s gaze and adjust the video resolution at the sensor side across the network. Our results show significant improvements in network bandwidth utilization. The Object Video Network is designed for mobile video network applications, where faces and cars are detected. Multi-resolution profiles are configured for the media according to the network footprint. The video is sent across the network at multiple resolutions together with metadata, controlled by the bandwidth regulator. The results show that the video can be transmitted under low-bandwidth conditions. Finally, the Image-Word Search Network is designed for face reconstruction across the network. In this study, we assume that the hidden layer between the facial features and the referral expressive words contains ‘control points’ that can be articulated mathematically, visually and verbally. This experiment is a crude model of the semantic network. Nevertheless, we see the potential of the two-way mapping.
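
The gaze-driven resolution adjustment mentioned in the abstract can be illustrated with a small sketch. This is not the chapter's implementation: the function names (foveate, payload_size), the circular gaze region, the tile size and the block-averaging scheme are all assumptions chosen for illustration. The idea shown is only that pixels near the gaze point keep full resolution while the periphery is coarsened, which reduces the number of values a sender would need to transmit.

```python
# Illustrative sketch only; all names and parameters here are hypothetical
# and are not taken from the chapter.

def _tile_ranges(height, width, block):
    """Yield (rows, cols) index ranges for each block x block tile."""
    for top in range(0, height, block):
        for left in range(0, width, block):
            yield (range(top, min(top + block, height)),
                   range(left, min(left + block, width)))


def _is_near(r, c, gaze_row, gaze_col, radius):
    """True if pixel (r, c) lies within `radius` of the gaze point."""
    return (r - gaze_row) ** 2 + (c - gaze_col) ** 2 <= radius ** 2


def foveate(frame, gaze_row, gaze_col, radius, block=4):
    """Return a copy of `frame` (a list of rows of grayscale ints) in which
    pixels outside the gaze region are replaced by the integer mean of
    their tile; pixels inside the gaze region are left unchanged."""
    height, width = len(frame), len(frame[0])
    out = [row[:] for row in frame]
    for rows, cols in _tile_ranges(height, width, block):
        tile = [frame[r][c] for r in rows for c in cols]
        mean = sum(tile) // len(tile)
        for r in rows:
            for c in cols:
                if not _is_near(r, c, gaze_row, gaze_col, radius):
                    out[r][c] = mean
    return out


def payload_size(height, width, gaze_row, gaze_col, radius, block=4):
    """Count the values a sender would transmit under this scheme: every
    pixel of a tile that touches the gaze region, one mean otherwise."""
    total = 0
    for rows, cols in _tile_ranges(height, width, block):
        near = any(_is_near(r, c, gaze_row, gaze_col, radius)
                   for r in rows for c in cols)
        total += len(rows) * len(cols) if near else 1
    return total


if __name__ == "__main__":
    # Synthetic 8 x 8 frame with the gaze fixed on the top-left corner.
    frame = [[r * 8 + c for c in range(8)] for r in range(8)]
    coarse = foveate(frame, gaze_row=0, gaze_col=0, radius=2)
    sent = payload_size(8, 8, gaze_row=0, gaze_col=0, radius=2)
    print(f"values sent: {sent} of {8 * 8}")
```

A real system would presumably take the gaze point from an eye tracker and apply the reduction at the sensor before encoding; the sketch treats the gaze point as a given input and ignores video coding entirely.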



Author information

Authors and Affiliations

Carnegie Mellon University, USA
Yang Cai, Guillaume Milcent & Ludmila Marian


Editor information

Editors and Affiliations

Ambient Intelligence Lab, CIC-2218, Carnegie Mellon University, 4720 Forbes Avenue, Pittsburgh, PA 15213, USA
Yang Cai


About this chapter

Cite this chapter

Cai, Y., Milcent, G., Marian, L. (2008). Visual Digest Networks. In: Cai, Y. (ed.) Digital Human Modeling. Lecture Notes in Computer Science, vol 4650. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89430-8_3


DOI: https://doi.org/10.1007/978-3-540-89430-8_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89429-2
Online ISBN: 978-3-540-89430-8
eBook Packages: Computer Science, Computer Science (R0)
