DOI: 10.1145/1031171.1031236
Article

SWAM: a family of access methods for similarity-search in peer-to-peer data networks

Published: 13 November 2004

Abstract

Peer-to-peer Data Networks (PDNs) are large-scale, self-organizing, distributed query processing systems. Familiar examples of PDNs are peer-to-peer file-sharing networks, which support exact-match search queries to locate user-requested files. In this paper, we formalize the more general problem of similarity-search in PDNs, and propose a family of distributed access methods, termed Small-World Access Methods (SWAM), for efficient execution of various similarity-search queries, namely exact-match, range, and k-nearest-neighbor queries. Unlike its predecessors, i.e., LH* and DHTs, SWAM does not control the assignment of data objects to PDN nodes; each node autonomously stores its own data. In addition, SWAM supports all similarity-search queries on multiple attributes. SWAM guarantees that the query object will be found (if it exists in the network) in average time logarithmically proportional to the network size. Moreover, once the query object is found, all similar objects lie in its proximate network neighborhood, which enables efficient range and k-nearest-neighbor queries.
As a specific instance of SWAM, we propose SWAM-V, a Voronoi-based SWAM that indexes PDNs with multi-attribute data objects. For a PDN with N nodes, SWAM-V has query time, communication cost, and computation cost of O(log N) for exact-match queries, and O(log N + sN) and O(log N + k) for range queries (with selectivity s) and kNN queries, respectively. Our experiments show that SWAM-V consistently outperforms a similarity-search enabled version of CAN in query time and communication cost by a factor of 2 to 3.
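The two-phase search pattern described in the abstract (first locate the query object, then collect similar objects from its network neighborhood) can be illustrated with a minimal sketch. This is not SWAM-V itself: it assumes a Kleinberg-style square lattice in which every node holds exactly one object (its own coordinates), local links connect the four lattice neighbors (which are the Voronoi neighbors of a square lattice), and each node has one long-range link drawn with probability proportional to 1/d^2. The names `build_overlay`, `greedy_route`, and `knn` are hypothetical and chosen only for illustration.

```python
import heapq
import random


def dist(a, b):
    """Manhattan distance between two lattice keys."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def build_overlay(side, seed=0):
    """Build a toy small-world overlay on a side x side lattice (assumed model)."""
    rng = random.Random(seed)
    nodes = [(x, y) for x in range(side) for y in range(side)]
    links = {}
    for u in nodes:
        x, y = u
        # Local links: the four lattice neighbours that lie inside the grid.
        local = [(x + dx, y + dy)
                 for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                 if 0 <= x + dx < side and 0 <= y + dy < side]
        # One long-range link, chosen with probability proportional to 1/d^2.
        others = [v for v in nodes if v != u]
        weights = [1.0 / dist(u, v) ** 2 for v in others]
        shortcut = rng.choices(others, weights=weights, k=1)[0]
        links[u] = (local, shortcut)
    return links


def greedy_route(links, start, target):
    """Exact-match phase: always forward to the known node closest to the target."""
    path = [start]
    current = start
    while current != target:
        local, shortcut = links[current]
        # A local neighbour is always strictly closer, so the loop terminates.
        current = min(local + [shortcut], key=lambda v: dist(v, target))
        path.append(current)
    return path


def knn(links, found, k):
    """kNN phase: best-first expansion over local links around the found object."""
    seen = {found}
    frontier = [(0, found)]
    result = []
    while frontier and len(result) < k:
        d, u = heapq.heappop(frontier)
        result.append((d, u))
        for v in links[u][0]:
            if v not in seen:
                seen.add(v)
                heapq.heappush(frontier, (dist(v, found), v))
    return result


if __name__ == "__main__":
    overlay = build_overlay(16)
    route = greedy_route(overlay, (0, 0), (15, 15))
    print("hops:", len(route) - 1)
    print("5 nearest to (5, 5):", knn(overlay, (5, 5), 5))
```

In this toy model the kNN phase touches only nodes adjacent to the ones it returns, which mirrors the additive k term in the stated O(log N + k) bound; the logarithmic routing guarantee itself is a property of SWAM and is not demonstrated by this sketch.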

Reviews

Maytham Hassan Safar

Banaei-Kashani and Shahabi describe querical data networks (QDNs), which are large-scale, self-organizing, distributed query processing systems. As an example, they present a peer-to-peer network. The data is distributed among the nodes, and each node knows its own data, plus the data of some of the nodes that it is connected to. To keep the network scalable, each node maintains only a few connections to other nodes. The authors provide a method to perform typical similarity queries on peer-to-peer networks. In addition, they provide a mechanism to support different query types, such as range and k-NN queries.
The authors' solution relies on existing techniques for partitioning data in the value space, where each data item is considered to exist only once. The space of all the data items in the network is decomposed once hierarchically, and once using Voronoi diagram partitioning techniques. Then, a grid of connections is created that consists of some random connections between nodes, and connections to the nodes that the decomposition method considers neighbors of a node. The two components are overlaid, and the resulting grid forms a connection network of the "virtual" nodes of data items. Finally, a reverse mapping, from each data item in this virtual space to the actual storage nodes, is performed, and the resulting network is presented. Queries are shown to be exactly answerable using the resulting network.
The authors fail to describe the applications that would use their technique with examples of real queries. Instead, only abstract configurations are mentioned in their experimental results. The curse of dimensionality, which implies that high-dimensional spaces offer no useful locality to exploit, is not discussed. This locality is crucial to the proposed method. The authors should more carefully investigate what happens to the communication cost as the dimensionality of their space increases.
The authors assume exactly one item per node, and provide an argument showing how the general case can be transformed to this restricted case. By doing this transformation, and making the data size proportional to the number of nodes in the system, one loses track of data dependence. The authors imply that the complexity of their method is linear in the data set size, regardless of the number of dimensions of the data, which is a strange-sounding result. Overall, this is a good work, except for the experimental setup.
Online Computing Reviews Service
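The reviewer's concern about the curse of dimensionality can be made concrete with a small experiment. The sketch below is not taken from the paper or the review; it assumes points drawn uniformly from the unit hypercube and measures the relative contrast between the farthest and nearest point from a random query. The function name `relative_contrast` and its parameters are hypothetical.

```python
import math
import random


def relative_contrast(dim, n_points=500, seed=0):
    """Return (farthest - nearest) / nearest distance from a random query.

    Points and query are drawn uniformly from the unit hypercube (an
    assumption made for illustration). A value near zero means all points
    are almost equally far away, so locality carries little information.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    nearest, farthest = min(dists), max(dists)
    return (farthest - nearest) / nearest


if __name__ == "__main__":
    for dim in (2, 10, 100):
        print(dim, round(relative_contrast(dim), 3))
```

Under this assumption the contrast shrinks as the dimension grows, which is the loss of locality the review refers to. Whether and how this affects the communication cost of the proposed method is exactly the open question the reviewer raises; the sketch does not answer it.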

Published In

CIKM '04: Proceedings of the thirteenth ACM international conference on Information and knowledge management
November 2004
678 pages
ISBN:1581138741
DOI:10.1145/1031171
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. distributed hash table (DHT)
  2. peer-to-peer networks
  3. similarity search
  4. small-world

Qualifiers

  • Article

Conference

CIKM04
CIKM04: Conference on Information and Knowledge Management
November 8 - 13, 2004
Washington, D.C., USA

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Article Metrics

  • Downloads (Last 12 months): 1
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 20 Jan 2025

Cited By

  • (2019) Distributed Similarity Queries in Metric Spaces. Data Science and Engineering. DOI: 10.1007/s41019-019-0095-7. Online publication date: 28-Jun-2019
  • (2018) Distributed k-Nearest Neighbor Queries in Metric Spaces. Web and Big Data, pp. 236-252. DOI: 10.1007/978-3-319-96890-2_20. Online publication date: 19-Jul-2018
  • (2014) VITAL: Structured and clustered super-peer network for similarity search. Peer-to-Peer Networking and Applications 8:6, pp. 965-991. DOI: 10.1007/s12083-014-0304-0. Online publication date: 5-Aug-2014
  • (2012) Metric-Based similarity search in unstructured peer-to-peer systems. Transactions on Large-Scale Data- and Knowledge-Centered Systems V, pp. 28-48. DOI: 10.5555/2184170.2184172. Online publication date: 1-Jan-2012
  • (2012) HyperDex. ACM SIGCOMM Computer Communication Review 42:4, pp. 25-36. DOI: 10.1145/2377677.2377681. Online publication date: 13-Aug-2012
  • (2012) HyperDex. Proceedings of the ACM SIGCOMM 2012 conference on Applications, technologies, architectures, and protocols for computer communication, pp. 25-36. DOI: 10.1145/2342356.2342360. Online publication date: 13-Aug-2012
  • (2012) Metric-Based Similarity Search in Unstructured Peer-to-Peer Systems. Transactions on Large-Scale Data- and Knowledge-Centered Systems V, pp. 28-48. DOI: 10.1007/978-3-642-28148-8_2. Online publication date: 2012
  • (2011) Chord-Based Indexing Model to Support Complex Query and Load Balancing. Key Engineering Materials 474-476, pp. 1781-1786. DOI: 10.4028/www.scientific.net/KEM.474-476.1781. Online publication date: Apr-2011
  • (2011) Peer-to-Peer Data Management. Synthesis Lectures on Data Management 3:2, pp. 1-150. DOI: 10.2200/S00338ED1V01Y201104DTM015. Online publication date: 31-May-2011
  • (2011) P2P network for storage and query of a spatio-temporal flow of events. 2011 IEEE International Conference on Pervasive Computing and Communications Workshops (PERCOM Workshops), pp. 483-489. DOI: 10.1109/PERCOMW.2011.5766938. Online publication date: Mar-2011