Abstract
Similarity search in metric spaces has been studied extensively for centralized systems in the database research community. Comparatively little work, however, has addressed it in the context of P2P networks. This paper introduces SiMPSON, a P2P system that supports similarity search in metric spaces. Its aim is to answer queries faster and with fewer resources than existing systems. To this end, each peer first clusters its own data using any off-the-shelf clustering algorithm. The resulting clusters are then mapped to one-dimensional values, which are finally indexed in a structured P2P overlay. Our method slightly increases the indexing overhead but greatly reduces the number of peers and messages involved in query processing: we trade a small amount of overhead in the data publishing process for a substantial reduction in the cost of the querying phase. Based on this architecture, we propose algorithms for processing range and kNN queries. Extensive experimental results validate the efficiency and effectiveness of SiMPSON.
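The publishing pipeline described above (cluster locally, map each cluster to one-dimensional values, index those values in an overlay) and the resulting query pruning can be illustrated with a small sketch. It rests on several assumptions that are not taken from the paper: data are Euclidean vectors so that a tiny k-means can stand in for "any off-the-shelf clustering algorithm"; each cluster (center, radius) is mapped to a one-dimensional key interval with an iDistance-style scheme using hypothetical global reference points `REFS` and a hypothetical partition width `C`; and a sorted Python list replaces a real structured overlay. The radius-doubling kNN routine is likewise one simple strategy, not necessarily SiMPSON's algorithm.

```python
import math
from bisect import bisect_right, insort

# Hypothetical global reference points shared by all peers (assumption).
REFS = [(0.0, 0.0), (10.0, 10.0)]
# Hypothetical partition width; assumes every distance plus radius stays below C.
C = 1000.0


def dist(a, b):
    # Euclidean distance; the pruning below relies only on the triangle inequality.
    return math.dist(a, b)


def assign(points, centers):
    # Put each point into the group of its nearest center.
    groups = [[] for _ in centers]
    for p in points:
        groups[min(range(len(centers)), key=lambda j: dist(p, centers[j]))].append(p)
    return groups


def cluster(points, k, iters=10):
    """Tiny k-means standing in for an off-the-shelf clustering algorithm.

    Returns a (center, radius) summary for each non-empty cluster."""
    centers = [tuple(map(float, p)) for p in points[:k]]
    for _ in range(iters):
        groups = assign(points, centers)
        centers = [tuple(sum(xs) / len(g) for xs in zip(*g)) if g else c
                   for g, c in zip(groups, centers)]
    groups = assign(points, centers)
    return [(c, max(dist(c, p) for p in g)) for c, g in zip(centers, groups) if g]


def key_interval(center, radius, i, ref):
    # iDistance-style key: partition i, offset = distance to reference point i.
    d = dist(ref, center)
    return i * C + max(0.0, d - radius), i * C + d + radius


class Overlay:
    """Stand-in for a structured 1-D overlay: (lo, hi, peer) tuples sorted by lo."""

    def __init__(self):
        self.entries = []

    def publish(self, peer, points, k):
        # One overlay insertion per cluster, not per data object.
        for center, radius in cluster(points, k):
            i = min(range(len(REFS)), key=lambda j: dist(REFS[j], center))
            insort(self.entries, (*key_interval(center, radius, i, REFS[i]), peer))

    def range_peers(self, q, r):
        # A cluster can hold an answer only if its key interval overlaps the
        # query interval of the same partition (triangle inequality).
        peers = set()
        for i, ref in enumerate(REFS):
            qlo, qhi = key_interval(q, r, i, ref)
            end = bisect_right(self.entries, (qhi, math.inf))  # entries with lo <= qhi
            peers.update(p for lo, hi, p in self.entries[:end] if hi >= qlo)
        return peers


def range_query(overlay, peer_data, q, r):
    # Only candidate peers scan their local data.
    hits = []
    for peer in sorted(overlay.range_peers(q, r)):
        hits += [p for p in peer_data[peer] if dist(p, q) <= r]
    return sorted(hits)


def knn_query(overlay, peer_data, q, k, r=1.0):
    # Assumed strategy: double the radius until the range query returns k hits.
    total = sum(len(v) for v in peer_data.values())
    while True:
        hits = range_query(overlay, peer_data, q, r)
        if len(hits) >= min(k, total):
            return sorted(hits, key=lambda p: dist(p, q))[:k]
        r *= 2


peer_data = {"A": [(0, 0), (1, 0), (0, 1)], "B": [(10, 10), (11, 10)], "C": [(5, 5), (5, 6)]}
overlay = Overlay()
for peer, pts in peer_data.items():
    overlay.publish(peer, pts, k=1)
print(overlay.range_peers((0.5, 0), 1.0))             # {'A'}
print(range_query(overlay, peer_data, (0.5, 0), 1.0))  # [(0, 0), (1, 0)]
```

In this toy run the range query contacts only peer `A` instead of all three peers, which mirrors the trade-off stated in the abstract: publishing costs one extra overlay insertion per cluster, while queries skip peers whose key intervals cannot contain an answer.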
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Vu, Q.H., Lupu, M., Wu, S. (2009). SiMPSON: Efficient Similarity Search in Metric Spaces over P2P Structured Overlay Networks. In: Sips, H., Epema, D., Lin, HX. (eds) Euro-Par 2009 Parallel Processing. Euro-Par 2009. Lecture Notes in Computer Science, vol 5704. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03869-3_48
DOI: https://doi.org/10.1007/978-3-642-03869-3_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03868-6
Online ISBN: 978-3-642-03869-3
eBook Packages: Computer Science, Computer Science (R0)