research-article

Real-life performance of metric searching

Published: 01 July 2010

Abstract

Similarity is a central notion throughout human life, and it will soon become the prevalent strategy for dealing with digital content in computer systems as well. However, the exponential growth of data makes scalability and performance serious concerns. Contemporary decentralized mass-communication media that support cooperative and collaborative practices enable users to contribute autonomously to the production of global media, whose elements are in fact related by numerous multi-faceted links of similarity. For example, consider sites such as Flickr, YouTube, or Facebook, which host user-contributed heterogeneous content for a variety of events. Accordingly, the core capability of future data processing systems will be the similarity management of large and ever-growing volumes of data. Put simply, real-life performance is constrained from two points of view: (1) the query response time, and (2) the query execution throughput, i.e., the number of queries processed per unit of time. Typically, the query response time should be on-line, say less than one second, while the required query execution throughput can reach hundreds or even thousands of queries in the case of large-scale web applications.
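To make the two constraints concrete, here is a minimal Python sketch that times a workload of similarity queries and reports both the worst per-query response time and the overall throughput (queries per second). It is an illustration under stated assumptions, not a method from this article: the linear-scan `range_query`, the Euclidean `euclidean` distance, the synthetic uniform vectors, and the `measure` helper are all hypothetical choices made for the example.

```python
import math
import random
import time
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """L2 distance, a metric (non-negative, symmetric, triangle inequality).
    Hypothetical choice of distance for this sketch."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def range_query(data: List[Vector], q: Vector, radius: float,
                dist: Callable[[Sequence[float], Sequence[float]], float] = euclidean
                ) -> List[int]:
    """Hypothetical linear-scan range query R(q, r):
    indices of all objects within distance `radius` of `q`."""
    return [i for i, o in enumerate(data) if dist(q, o) <= radius]


def measure(data: List[Vector], queries: List[Vector],
            radius: float) -> Tuple[float, float]:
    """Run queries one after another and return
    (worst response time in seconds, throughput in queries per second)."""
    response_times = []
    start = time.perf_counter()
    for q in queries:
        t0 = time.perf_counter()
        range_query(data, q, radius)
        response_times.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    throughput = len(queries) / elapsed if elapsed > 0 else float("inf")
    return max(response_times), throughput


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed: synthetic, hypothetical data
    dim = 8
    data = [tuple(rng.random() for _ in range(dim)) for _ in range(10_000)]
    queries = [tuple(rng.random() for _ in range(dim)) for _ in range(20)]
    worst, qps = measure(data, queries, radius=0.5)
    print(f"worst response time: {worst:.3f} s (on-line target: < 1 s)")
    print(f"throughput: {qps:.1f} queries/s")
```

Because the queries here run strictly one after another, throughput is roughly the inverse of the mean response time. The two constraints become separate levers only when queries are processed concurrently, for example on several machines, where throughput can grow without each individual query getting faster.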



Published in

SIGSPATIAL Special, Volume 2, Issue 2
July 2010
38 pages
EISSN: 1946-7729
DOI: 10.1145/1862413

Copyright © 2010 Authors

Publisher

Association for Computing Machinery

New York, NY, United States