Abstract
Establishing visual correspondences is an essential component of many computer vision problems and is often done with local feature descriptors. Transmission and storage of these descriptors are of critical importance in mobile visual search applications. We propose a framework for computing low bit-rate feature descriptors with a 20× reduction in bit rate compared to state-of-the-art descriptors. The framework has low complexity and significantly speeds up the matching stage. We show how to efficiently compute distances between descriptors in the compressed domain, eliminating the need for decoding. We perform a comprehensive performance comparison with SIFT, SURF, BRIEF, MPEG-7 image signatures, and other low bit-rate descriptors, and show that our proposed CHoG descriptor significantly outperforms existing schemes over a wide range of bit rates. We implement the descriptor in a mobile image retrieval system and, for a database of 1 million CD, DVD, and book covers, achieve 96% retrieval accuracy using only 4 KB of data per query image.
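The abstract states that distances can be computed in the compressed domain without decoding. One way this can work is sketched below, under assumptions that are not taken from the abstract: each spatial cell of a descriptor holds a gradient histogram, each histogram is quantized to the index of its nearest codeword in a small fixed codebook, and the descriptor distance is the sum of per-cell symmetric KL divergences read from a table precomputed over codeword pairs. The codebook values, the bin count, and all function names are hypothetical and chosen only for illustration.

```python
import numpy as np


def symmetric_kl(p, q):
    """Symmetric KL divergence, KL(p||q) + KL(q||p), for strictly positive p, q."""
    return float(np.sum((p - q) * np.log(p / q)))


def build_distance_table(codebook):
    """Precompute the distance between every pair of codewords (done once, offline)."""
    k = len(codebook)
    table = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            table[i, j] = symmetric_kl(codebook[i], codebook[j])
    return table


def encode(cell_histograms, codebook):
    """Replace each cell histogram by the index of its nearest codeword."""
    return np.array([
        min(range(len(codebook)), key=lambda c: symmetric_kl(h, codebook[c]))
        for h in cell_histograms
    ])


def compressed_distance(codes_a, codes_b, table):
    """Distance between two encoded descriptors: table lookups only, no decoding."""
    return float(table[codes_a, codes_b].sum())


# Hypothetical 3-word codebook over 4 gradient bins (assumed values).
codebook = np.array([
    [0.70, 0.10, 0.10, 0.10],
    [0.10, 0.70, 0.10, 0.10],
    [0.25, 0.25, 0.25, 0.25],
])
table = build_distance_table(codebook)

# Two descriptors with two cells each; only the indices need to be stored or sent.
codes_a = encode(np.array([[0.60, 0.20, 0.10, 0.10], [0.25, 0.25, 0.30, 0.20]]), codebook)
codes_b = encode(np.array([[0.15, 0.65, 0.10, 0.10], [0.25, 0.25, 0.25, 0.25]]), codebook)
print(codes_a, codes_b, compressed_distance(codes_a, codes_b, table))
```

In this sketch the matching cost per descriptor pair is one table lookup per cell, and the table size depends only on the codebook size, which is why small codebooks make this kind of scheme attractive on mobile hardware.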
References
Amazon (2007). SnapTell. http://www.snaptell.com.
Banerjee, A., Merugu, S., Dhillon, I., & Ghosh, J. (2004). Clustering with Bregman divergences. Journal of Machine Learning Research, 234–245.
Bay, H., Tuytelaars, T., & Gool, L. V. (2006). SURF: speeded up robust features. In Proc. of European conference on computer vision (ECCV), Graz, Austria.
Bay, H., Ess, A., Tuytelaars, T., & Gool, L. V. (2008). Speeded-up robust features (SURF). Computer Vision and Image Understanding, 110(3), 346–359. http://dx.doi.org/10.1016/j.cviu.2007.09.014.
Brasnett, P., & Bober, M. (2007). Robust visual identifier using the trace transform. In Proc. of IET visual information engineering conference (VIE), London, UK.
Calonder, M., Lepetit, V., & Fua, P. (2010). BRIEF: binary robust independent elementary features. In Proc. of European conference on computer vision (ECCV), Crete, Greece.
Chandrasekhar, V., Takacs, G., Chen, D. M., Tsai, S. S., & Girod, B. (2009a). Transform coding of feature descriptors. In Proc. of visual communications and image processing conference (VCIP), San Jose, California.
Chandrasekhar, V., Takacs, G., Chen, D. M., Tsai, S. S., Grzeszczuk, R., & Girod, B. (2009b). CHoG: compressed histogram of gradients—a low bit rate feature descriptor. In Proc. of IEEE conference on computer vision and pattern recognition (CVPR), Miami, Florida.
Chandrasekhar, V., Chen, D. M., Lin, A., Takacs, G., Tsai, S. S., Cheung, N. M., Reznik, Y., Grzeszczuk, R., & Girod, B. (2010a). Comparison of local feature descriptors for mobile visual search. In Proc. of IEEE international conference on image processing (ICIP), Hong Kong.
Chandrasekhar, V., Makar, M., Takacs, G., Chen, D., Tsai, S. S., Cheung, N. M., Grzeszczuk, R., Reznik, Y., & Girod, B. (2010b). Survey of SIFT compression schemes. In Proc. of international mobile multimedia workshop (IMMW), IEEE international conference on pattern recognition (ICPR), Istanbul, Turkey.
Chandrasekhar, V., Reznik, Y., Takacs, G., Chen, D. M., Tsai, S. S., Grzeszczuk, R., & Girod, B. (2010c). Study of quantization schemes for low bitrate CHoG descriptors. In Proc. of IEEE international workshop on mobile vision (IWMV), San Francisco, California.
Chou, P. A., Lookabaugh, T., & Gray, R. M. (1989). Entropy-constrained vector quantization. IEEE Transactions on Acoustics, Speech and Signal Processing, 37(1).
Conway, J. H., & Sloane, N. J. A. (1982). Fast quantizing and decoding algorithms for lattice quantizers and codes. IEEE Transactions on Information Theory, IT-28(2), 227–232.
Cover, T. M., & Thomas, J. A. (2006). Elements of information theory. Wiley series in telecommunications and signal processing. New York: Wiley-Interscience.
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In Proc. of IEEE conference on computer vision and pattern recognition (CVPR), San Diego, CA.
Erol, B., Antúnez, E., & Hull, J. (2008). Hotpaper: multimedia interaction with paper using mobile phones. In Proc. of the 16th ACM multimedia conference, New York, NY, USA.
Freeman, W. T., & Roth, M. (1994). Orientation histograms for hand gesture recognition. In Proc. of international workshop on automatic face and gesture recognition (pp. 296–301).
Gagie, T. (2006). Compressing probability distributions. Information Processing Letters, 97(4), 133–137. http://dx.doi.org/10.1016/j.ipl.2005.10.006.
Girod, B., Chandrasekhar, V., Chen, D. M., Cheung, N. M., Grzeszczuk, R., Reznik, Y., Takacs, G., Tsai, S. S., & Vedantham, R. (2010). Mobile visual search. IEEE Signal Processing Magazine, Special Issue on Mobile Media Search, under review.
Google (2009). Google Goggles. http://www.google.com/mobile/goggles/.
Graham, J., & Hull, J. J. (2008). iCandy: a tangible user interface for iTunes. In Proc. of CHI ’08: extended abstracts on human factors in computing systems, Florence, Italy.
Hua, G., Brown, M., & Winder, S. (2007). Discriminant embedding for local image descriptors. In Proc. of international conference on computer vision (ICCV), Rio de Janeiro, Brazil.
Hull, J. J., Erol, B., Graham, J., Ke, Q., Kishi, H., Moraleda, J., & Olst, D. G. V. (2007). Paper-based augmented reality. In Proc. of the 17th international conference on artificial reality and telexistence (ICAT), Washington, DC, USA.
Jegou, H., Douze, M., & Schmid, C. (2008). Hamming embedding and weak geometric consistency for large scale image search. In Proc. of European conference on computer vision (ECCV), Berlin, Heidelberg.
Jegou, H., Douze, M., & Schmid, C. (2010). Product quantization for nearest neighbor search. IEEE Transactions on Pattern Analysis and Machine Intelligence, accepted.
Johnson, M. (2010). Generalized descriptor compression for storage and matching. In Proc. of British machine vision conference (BMVC).
Ke, Y., & Sukthankar, R. (2004). PCA-SIFT: a more distinctive representation for local image descriptors. In Proc. of conference on computer vision and pattern recognition (CVPR) (Vol. 02, pp. 506–513). Washington: IEEE Computer Society.
Kooaba (2007). Kooaba. http://www.kooaba.com.
Kullback, S. (1987). The Kullback-Leibler distance. The American Statistician, 41, 340–341.
Lowe, D. (1999). Object recognition from local scale-invariant features. In Proc. of IEEE conference on computer vision and pattern recognition (CVPR), Los Alamitos, CA.
Lowe, D. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.
Makar, M., Chang, C., Chen, D. M., Tsai, S. S., & Girod, B. (2009). Compression of image patches for local feature extraction. In Proc. of IEEE international conference on acoustics, speech and signal processing (ICASSP), Taipei, Taiwan.
Mikolajczyk, K., & Schmid, C. (2005). Performance evaluation of local descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(10), 1615–1630. http://dx.doi.org/10.1109/TPAMI.2005.188.
Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., Kadir, T., & Gool, L. V. (2005). A comparison of affine region detectors. International Journal of Computer Vision, 65(1–2), 43–72. http://dx.doi.org/10.1007/s11263-005-3848-x.
Nistér, D., & Stewénius, H. (2006). Scalable recognition with a vocabulary tree. In Proc. of IEEE conference on computer vision and pattern recognition (CVPR), New York, USA.
Nokia (2006). Nokia point and find. http://www.pointandfind.nokia.com.
Philbin, J., Chum, O., Isard, M., Sivic, J., & Zisserman, A. (2008). Lost in quantization—improving particular object retrieval in large scale image databases. In Proc. of IEEE conference on computer vision and pattern recognition (CVPR), Anchorage, Alaska.
Rebollo-Monedero, D. (2007). Quantization and transforms for distributed source coding. PhD thesis, Department of Electrical Engineering, Stanford University.
Reznik, Y., Chandrasekhar, V., Takacs, G., Chen, D. M., Tsai, S. S., Grzeszczuk, R., & Girod, B. (2010). Fast quantization and matching of histogram-based image features. In Proc. of SPIE workshop on applications of digital image processing (ADIP), San Diego, California.
Rubner, Y., Tomasi, C., & Guibas, L. J. (2000). The Earth mover’s distance as a metric for image retrieval. International Journal of Computer Vision, 40(2), 99–121. http://dx.doi.org/10.1023/A:1026543900054.
Shakhnarovich, G., & Darrell, T. (2005). Learning task-specific similarity. Thesis.
Shao, H., Svoboda, T., & Gool, L. V. (2003). ZuBuD: Zürich buildings database for image based recognition (Tech. Rep. 260). ETH Zürich.
Sommerville, D. M. Y. (1958). An introduction to the geometry of n dimensions. New York: Dover.
Takacs, G., Chandrasekhar, V., Gelfand, N., Xiong, Y., Chen, W., Bismpigiannis, T., Grzeszczuk, R., Pulli, K., & Girod, B. (2008). Outdoors augmented reality on mobile phone using loxel-based visual feature organization. In Proc. of ACM international conference on multimedia information retrieval (ACM MIR), Vancouver, Canada.
Tola, E., Lepetit, V., & Fua, P. (2008). A fast local descriptor for dense matching. In Proc. of IEEE conference on computer vision and pattern recognition (pp. 1–8). doi:10.1109/CVPR.2008.4587673.
Torralba, A., Fergus, R., & Weiss, Y. (2008). Small codes and large image databases for recognition. In Proc. of IEEE conference on computer vision and pattern recognition (CVPR), Anchorage, Alaska.
Tsai, S. S., Chen, D. M., Chandrasekhar, V., Takacs, G., Cheung, N. M., Vedantham, R., Grzeszczuk, R., & Girod, B. (2010). Mobile product recognition. In Proc. of ACM multimedia (ACM MM), Florence, Italy.
Weiss, Y., Torralba, A., & Fergus, R. (2008). Spectral hashing. In Proc. of neural information processing systems (NIPS), Vancouver, BC, Canada.
Winder, S., & Brown, M. (2007). Learning local image descriptors. In Proc. of IEEE conference on computer vision and pattern recognition (CVPR), Minneapolis, Minnesota (pp. 1–8). doi:10.1109/CVPR.2007.382971.
Winder, S., Hua, G., & Brown, M. (2009). Picking the best daisy. In Proc. of computer vision and pattern recognition (CVPR), Miami, Florida.
Yeo, C., Ahammad, P., & Ramchandran, K. (2008). Rate-efficient visual correspondences using random projections. In Proc. of IEEE international conference on image processing (ICIP), San Diego, California.
Additional information
This work was first presented as an oral presentation at Computer Vision and Pattern Recognition (CVPR), 2009. Since then, the authors have studied feature compression in more detail in Chandrasekhar et al. (2009a, 2010a, 2010b, 2010c). A default implementation of CHoG is available at http://www.stanford.edu/vijayc/.
Cite this article
Chandrasekhar, V., Takacs, G., Chen, D.M. et al. Compressed Histogram of Gradients: A Low-Bitrate Descriptor. Int J Comput Vis 96, 384–399 (2012). https://doi.org/10.1007/s11263-011-0453-z
DOI: https://doi.org/10.1007/s11263-011-0453-z