Impact of Storage Technology on the Efficiency of Cluster-Based High-Dimensional Index Creation

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7240)

Abstract

The scale of multimedia data collections is expanding at a very fast rate. In order to cope with this growth, the high-dimensional indexing methods used for content-based multimedia retrieval must adapt gracefully to secondary storage. Recent progress in storage technology, however, means that algorithm designers must now cope with a spectrum of secondary storage solutions, ranging from traditional magnetic hard drives to state-of-the-art solid state disks. We study the impact of storage technology on a simple, prototypical high-dimensional indexing method for large scale query processing. We show that while the algorithm implementation deeply impacts the performance of the indexing method, the choice of underlying storage technology is equally important.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gudmundsson, G.Þ., Amsaleg, L., Jónsson, B.Þ. (2012). Impact of Storage Technology on the Efficiency of Cluster-Based High-Dimensional Index Creation. In: Yu, H., Yu, G., Hsu, W., Moon, YS., Unland, R., Yoo, J. (eds) Database Systems for Advanced Applications. DASFAA 2012. Lecture Notes in Computer Science, vol 7240. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29023-7_6

  • DOI: https://doi.org/10.1007/978-3-642-29023-7_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29022-0

  • Online ISBN: 978-3-642-29023-7

  • eBook Packages: Computer Science, Computer Science (R0)