ABSTRACT
The Earth Mover's Distance (EMD) is one of the most widely used distance functions for measuring the similarity between two multimedia objects. Although it yields good search results, the EMD is too time-consuming to compute for use in large multimedia databases. To address this problem, we propose an approximate k-nearest neighbor (k-NN) search method based on the EMD. First, the proposed method builds an index using the M-tree, a distance-based multi-dimensional index structure, to reduce disk access overhead. When building the index, we reduce the number of features of the multimedia objects through dimensionality reduction. When performing a k-NN search on the M-tree, we use the index to retrieve a small set of candidates from disk and then perform post-processing on them. Second, the proposed method uses an approximate EMD for both index retrieval and post-processing to reduce the computational overhead of the EMD. To compensate for the errors introduced by the approximation, the method provides a way to improve the accuracy of the approximate EMD. We performed extensive experiments to demonstrate the efficiency of the proposed method.
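The two-phase search described above (a cheap filter over dimensionality-reduced features, followed by exact post-processing of a small candidate set) can be illustrated with a minimal Python sketch. This is not the paper's method. It assumes 1-D histograms of equal total mass with unit ground distance between adjacent bins, a case in which the exact EMD equals the L1 distance between the cumulative histograms. It substitutes a simple bin-merging lower bound for the paper's approximate EMD and a linear scan for the M-tree. The names `emd_1d`, `reduce_bins`, and `knn_filter_refine` are hypothetical.

```python
import heapq
from itertools import accumulate


def emd_1d(a, b):
    """Exact EMD between two 1-D histograms of equal total mass, assuming
    unit ground distance between adjacent bins: the L1 distance between
    their cumulative histograms."""
    return sum(abs(x - y) for x, y in zip(accumulate(a), accumulate(b)))


def reduce_bins(h, g):
    """Dimensionality reduction (illustrative): merge every g adjacent bins."""
    return [sum(h[i:i + g]) for i in range(0, len(h), g)]


def knn_filter_refine(query, database, k, g):
    """Filter-and-refine k-NN search (hypothetical helper).

    Treating merged bins as unit distance apart keeps only a subset of the
    cumulative-difference terms, so the reduced EMD never exceeds the exact
    EMD and can safely serve as a lower-bounding filter."""
    q_red = reduce_bins(query, g)
    # Filter step: rank every object by the cheap lower bound.
    candidates = sorted(
        (emd_1d(q_red, reduce_bins(h, g)), idx) for idx, h in enumerate(database)
    )
    result = []  # max-heap of the best k so far, stored as (-exact_dist, idx)
    for lb, idx in candidates:
        # Stop once no remaining lower bound can beat the current k-th distance.
        if len(result) == k and lb > -result[0][0]:
            break
        d = emd_1d(query, database[idx])  # refinement step: exact EMD
        if len(result) < k:
            heapq.heappush(result, (-d, idx))
        elif d < -result[0][0]:
            heapq.heapreplace(result, (-d, idx))
    return sorted((-neg, idx) for neg, idx in result)


query = [4, 0, 0, 0, 0, 0, 0, 0]
database = [
    [0, 0, 0, 0, 0, 0, 0, 4],
    [4, 0, 0, 0, 0, 0, 0, 0],
    [0, 4, 0, 0, 0, 0, 0, 0],
    [2, 2, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 4, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
]
print(knn_filter_refine(query, database, k=3, g=2))  # [(0, 1), (2, 3), (4, 2)]
```

In this toy run, the last candidate's lower bound (12) exceeds the third-nearest exact distance (4), so its exact EMD is never computed. Because the filter is a true lower bound, the sketch returns the exact k nearest neighbors; the paper's approximate EMD trades that guarantee for speed.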
Index Terms
- Approximate k-Nearest Neighbor Search Based on the Earth Mover's Distance for Efficient Content-based Information Retrieval