DOI: 10.1145/956863.956870
Article

Approximate searches: k-neighbors + precision

Published: 03 November 2003

ABSTRACT

It is known that all multidimensional index structures fail to accelerate content-based similarity searches when the feature vectors describing images are high-dimensional. This problem can be circumvented by approximate search schemes that trade result quality for reduced query execution time. Most approximate schemes, however, provide either no control or only complex control over the precision of the search, especially when retrieving the k nearest neighbors (NNs) of query points.

In contrast, this paper describes an approximate search scheme for high-dimensional databases in which the precision of the search can be probabilistically controlled when retrieving the k NNs of query points. It allows fine and intuitive control over this precision: at run time, one sets the maximum probability that a vector belonging to the exact answer set is missed in the approximate answer set eventually returned. This paper also presents a performance study of the implementation on real datasets, showing its reliability and efficiency. It shows, for example, that our method is 6.72 times faster than the sequential scan when it handles more than 5 × 10⁶ 24-dimensional vectors, even when the probability of missing one of the true nearest neighbors is below 0.01.
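The abstract does not describe the indexing scheme itself, so the sketch below only illustrates the two quantities it relies on: the exact k-NN answer from a sequential scan (the baseline), and the fraction of true neighbors missed by an approximate search. The approximate search here is a hypothetical stand-in, not the paper's method: vectors are assigned to random pivots, only the `n_probe` partitions nearest the query are scanned, and `n_probe` is calibrated empirically on sample queries to meet a target miss rate. All function names, the pivot partitioning, and `n_probe` are assumptions made for illustration, and the calibration gives no probabilistic guarantee.

```python
import heapq
import math
import random


def exact_knn(data, query, k):
    """Exact k-NN by sequential scan (Euclidean distance).
    Returns indices of the k closest vectors, nearest first; ties broken by index."""
    return [i for _, i in heapq.nsmallest(
        k, ((math.dist(v, query), i) for i, v in enumerate(data)))]


def build_partitions(data, n_parts, seed=0):
    """Hypothetical partitioning: choose n_parts random vectors as pivots and
    assign every vector to its nearest pivot (single pass, no refinement)."""
    rng = random.Random(seed)
    pivots = [data[i] for i in rng.sample(range(len(data)), n_parts)]
    parts = [[] for _ in pivots]
    for i, v in enumerate(data):
        nearest = min(range(n_parts), key=lambda p: math.dist(v, pivots[p]))
        parts[nearest].append(i)
    return pivots, parts


def approx_knn(data, pivots, parts, query, k, n_probe):
    """Illustrative approximate k-NN: scan only the n_probe partitions whose
    pivots are closest to the query. Fewer probes: less work, more misses."""
    order = sorted(range(len(pivots)), key=lambda p: math.dist(query, pivots[p]))
    candidates = [i for p in order[:n_probe] for i in parts[p]]
    return [i for _, i in heapq.nsmallest(
        k, ((math.dist(data[i], query), i) for i in candidates))]


def miss_rate(exact_ids, approx_ids):
    """Fraction of the true k-NNs that are absent from the approximate answer."""
    return len(set(exact_ids) - set(approx_ids)) / len(exact_ids)


def calibrate_n_probe(data, pivots, parts, queries, k, max_miss):
    """Smallest n_probe whose average miss rate over the sample queries is
    <= max_miss. An empirical knob only, not a per-query probability bound."""
    exact = [exact_knn(data, q, k) for q in queries]
    for n_probe in range(1, len(pivots) + 1):
        avg = sum(miss_rate(e, approx_knn(data, pivots, parts, q, k, n_probe))
                  for q, e in zip(queries, exact)) / len(queries)
        if avg <= max_miss:
            return n_probe
    return len(pivots)


if __name__ == "__main__":
    rng = random.Random(42)
    dim, n, k = 24, 2000, 10  # small synthetic data, 24-D like the abstract's vectors
    data = [[rng.random() for _ in range(dim)] for _ in range(n)]
    pivots, parts = build_partitions(data, n_parts=20)
    sample = [[rng.random() for _ in range(dim)] for _ in range(10)]
    n_probe = calibrate_n_probe(data, pivots, parts, sample, k, max_miss=0.01)
    print("partitions scanned:", n_probe, "of", len(pivots))
```

Scanning every partition reproduces the sequential-scan answer exactly (miss rate 0). Lowering `n_probe` trades precision for speed, which is the trade-off the abstract describes; the paper's contribution is to control that trade-off probabilistically rather than by the empirical calibration used here.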


    Published in

      CIKM '03: Proceedings of the twelfth international conference on Information and knowledge management
      November 2003
      592 pages
      ISBN: 1581137230
      DOI: 10.1145/956863

      Copyright © 2003 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      Overall acceptance rate: 1,861 of 8,427 submissions (22%)
