Abstract
The scale of multimedia data collections is expanding at a very fast rate. In order to cope with this growth, the high-dimensional indexing methods used for content-based multimedia retrieval must adapt gracefully to secondary storage. Recent progress in storage technology, however, means that algorithm designers must now cope with a spectrum of secondary storage solutions, ranging from traditional magnetic hard drives to state-of-the-art solid state disks. We study the impact of storage technology on a simple, prototypical high-dimensional indexing method for large scale query processing. We show that while the algorithm implementation deeply impacts the performance of the indexing method, the choice of underlying storage technology is equally important.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsPreview
Unable to display preview. Download preview PDF.
References
Andoni, A., Indyk, P.: Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM 51, 117–122 (2008)
Athanassoulis, M., Ailamaki, A., Chen, S., Gibbons, P.B., Stoica, R.: Flash in a dbms: Where and how? IEEE Data Eng. Bull. 33(4), 28–34 (2010)
Bonnet, P., Bouganim, L.: Flash device support for database management. In: CIDR, pp. 1–8 (2011), www.crdrdb.org
Bouganim, L., Jónsson, B.T., Bonnet, P.: uFLIP: Understanding flash IO patterns. In: Proc. CIDR (2009)
Casey, M., Veltkamp, R., Goto, M., Leman, M., Rhodes, C., Slaney, M.: Content-based music information retrieval: Current directions and future challenges. Proceedings of the IEEE 96(4), 668–696 (2008)
Chierichetti, F., Panconesi, A., Raghavan, P., Sozio, M., Tiberi, A., Upfal, E.: Finding near neighbors through cluster pruning. In: Proc. PODS (2007)
Datta, R., Joshi, D., Li, J., Wang, J.Z.: Image retrieval: Ideas, influences, and trends of the new age. ACM Comput. Surv. 40, 5:1–5:60 (2008)
Gudmundsson, G., Jónsson, B.T., Amsaleg, L.: A large-scale performance study of cluster-based high-dimensional indexing. In: Proc. ACMMM–Workshop on Very-Large-Scale Multimedia Corpus, Mining and Retrieval (2010)
Jégou, H., Douze, M., Schmid, C.: Product quantization for nearest neighbor search. IEEE TPAMI 33(1), 117–128 (2011)
Lejsek, H., Ásmundsson, F.H., Jónsson, B.T., Amsaleg, L.: NV-Tree: An efficient disk-based index for approximate search in very large high-dimensional collections. IEEE Trans. Pattern Anal. Mach. Intell. 31, 869–883 (2009)
Lew, M.S., Sebe, N., Djeraba, C., Jain, R.: Content-based multimedia information retrieval: State of the art and challenges. ACM Trans. Multimedia Comput. Commun. Appl. 2, 1–19 (2006)
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60(2) (2004)
Paulevé, L., Jégou, H., Amsaleg, L.: Locality sensitive hashing: A comparison of hash function types and querying mechanisms. Pattern Recognition Letters 31(11), 1348–1358 (2010)
Philbin, J., Chum, O., Isard, M., Sivic, J., Zisserman, A.: Lost in quantization: Improving particular object retrieval in large scale image databases. In: Proc. CVPR (2008)
Samet, H.: Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann Publishers Inc., San Francisco (2005)
Shaft, U., Ramakrishnan, R.: Theory of nearest neighbors indexability. ACM TODS 31(3), 814–838 (2006)
Sivic, J., Zisserman, A.: Video Google: A text retrieval approach to object matching in videos. In: Proc. ICCV (2003)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gudmundsson, G.Þ., Amsaleg, L., Jónsson, B.Þ. (2012). Impact of Storage Technology on the Efficiency of Cluster-Based High-Dimensional Index Creation. In: Yu, H., Yu, G., Hsu, W., Moon, YS., Unland, R., Yoo, J. (eds) Database Systems for Advanced Applications. DASFAA 2012. Lecture Notes in Computer Science, vol 7240. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29023-7_6
Download citation
DOI: https://doi.org/10.1007/978-3-642-29023-7_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29022-0
Online ISBN: 978-3-642-29023-7
eBook Packages: Computer ScienceComputer Science (R0)