ABSTRACT
It is known that all multi-dimensional index structures fail to accelerate content-based similarity searches when the feature vectors describing images are high-dimensional. This problem can be circumvented by relying on approximate search schemes that trade off result quality for reduced query execution time. Most approximate schemes, however, provide either no control or only complex control over the precision of the searches, especially when retrieving the k nearest neighbors (NNs) of query points. In contrast, this paper describes an approximate search scheme for high-dimensional databases in which the precision of the search can be probabilistically controlled when retrieving the k NNs of query points. It allows fine and intuitive control over this precision by setting, at run time, the maximum probability that a vector belonging to the exact answer set is missed in the approximate answer set eventually returned. This paper also presents a performance study of the implementation on real datasets, showing its reliability and efficiency. It shows, for example, that our method is 6.72 times faster than a sequential scan when it handles more than 5 × 10^6 24-dimensional vectors, even when the probability of missing one of the true nearest neighbors is below 0.01.
Index Terms
- Approximate searches: k-neighbors + precision