Abstract
This paper addresses the performance of nearest neighbor queries in voluminous multimedia databases. We propose a data allocation method that achieves \(O(\sqrt{n})\) query processing time in parallel settings. Our proposal is based on a complexity analysis of content-based retrieval when a clustering method is used. We derive a valid range of values for the number of clusters that should be obtained from the database. Then, to process nearest neighbor queries efficiently, we derive the optimal number of nodes to maximize the use of parallel resources. We validated our method through experiments with different high-dimensional databases and implemented a query processing algorithm for full k nearest neighbors in a shared-nothing cluster.
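To give some intuition for where a square-root bound can come from in clustered retrieval, the following minimal sketch uses a simplified cost model that is an assumption made here, not the analysis of the paper: a query is compared with k cluster centroids and then with the n/k objects of a single, equally sized cluster. The function names `search_cost` and `best_cluster_count` are hypothetical.

```python
import math


def search_cost(n: int, k: int) -> float:
    """Distance computations for one query under the assumed model:
    k centroid comparisons plus a scan of one cluster holding n/k objects."""
    return k + n / k


def best_cluster_count(n: int) -> int:
    """Brute-force the k in [1, n] that minimizes search_cost(n, k)."""
    return min(range(1, n + 1), key=lambda k: search_cost(n, k))


n = 10_000
k = best_cluster_count(n)
# Under this model the minimizer coincides with sqrt(n), giving about 2*sqrt(n) work.
print(k, math.isqrt(n), search_cost(n, k))
```

Under these assumptions the cost k + n/k is minimized when k equals the square root of n, so the per-query work grows as \(O(\sqrt{n})\). Real clusterings are rarely balanced and a query may need to visit more than one cluster, which is one reason a range of cluster counts, rather than a single value, is of interest.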
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Manjarrez-Sanchez, J., Martinez, J., Valduriez, P. (2008). Efficient Processing of Nearest Neighbor Queries in Parallel Multimedia Databases. In: Bhowmick, S.S., Küng, J., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2008. Lecture Notes in Computer Science, vol 5181. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85654-2_31
DOI: https://doi.org/10.1007/978-3-540-85654-2_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85653-5
Online ISBN: 978-3-540-85654-2
eBook Packages: Computer Science, Computer Science (R0)