ABSTRACT
The rapid growth of scientific data calls for efficient similarity search techniques, for which suitable object representation models are vital. Feature signatures, a highly flexible object feature representation, have gained increasing attention, and corresponding techniques for improving efficiency are being developed. In this paper, we focus on efficient query processing with the well-known Earth Mover's Distance (EMD) on databases of feature signatures. We propose efficient approximation techniques that apply to high-dimensional feature signatures via dimensionality reduction and that guarantee completeness, that is, no false dismissals, within a filter-and-refine architecture. Rigorous experiments on real-world data indicate a considerable reduction in the number of EMD computations and a significant reduction in query processing time.
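The filter-and-refine idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the approximation technique proposed in the paper: it assumes one-dimensional signatures whose weights sum to 1, and it swaps in the classical centroid lower bound on the EMD as the filter. The names `emd_1d`, `centroid_lower_bound`, and `range_query` are hypothetical and chosen only for this illustration.

```python
from typing import Callable, List, Tuple

# Assumption: a signature is a list of (position, weight) pairs on the real
# line, with weights summing to 1. Real feature signatures are
# multi-dimensional; 1-D keeps the exact EMD easy to compute.
Signature = List[Tuple[float, float]]


def emd_1d(a: Signature, b: Signature) -> float:
    """Exact EMD for equal-mass 1-D signatures with ground distance |x - y|.

    In one dimension the EMD equals the area between the two cumulative
    weight functions.
    """
    points = sorted({p for p, _ in a} | {p for p, _ in b})
    total, cdf_gap = 0.0, 0.0
    for left, right in zip(points, points[1:]):
        cdf_gap += sum(w for p, w in a if p == left)
        cdf_gap -= sum(w for p, w in b if p == left)
        total += abs(cdf_gap) * (right - left)
    return total


def centroid_lower_bound(a: Signature, b: Signature) -> float:
    """Distance between weighted means; never exceeds the EMD for equal mass."""
    mean_a = sum(p * w for p, w in a)
    mean_b = sum(p * w for p, w in b)
    return abs(mean_a - mean_b)


def range_query(
    db: List[Signature],
    query: Signature,
    eps: float,
    lower_bound: Callable[[Signature, Signature], float] = centroid_lower_bound,
    exact: Callable[[Signature, Signature], float] = emd_1d,
) -> Tuple[List[int], int]:
    """Return indices of objects within eps of the query and the number of
    exact distance computations that were needed."""
    results, exact_calls = [], 0
    for idx, obj in enumerate(db):
        # Filter step: if the lower bound already exceeds eps, the exact
        # distance must exceed it too, so pruning causes no false dismissal.
        if lower_bound(query, obj) > eps:
            continue
        # Refine step: only surviving candidates pay for the exact EMD.
        exact_calls += 1
        if exact(query, obj) <= eps:
            results.append(idx)
    return results, exact_calls


db = [
    [(0.0, 0.5), (1.0, 0.5)],
    [(5.0, 1.0)],
    [(0.0, 0.5), (2.0, 0.5)],
    [(-1.0, 0.5), (2.0, 0.5)],
]
query = [(0.5, 1.0)]
print(range_query(db, query, eps=0.6))  # → ([0], 3)
```

In this toy run the filter prunes one of four objects without computing its EMD, and the result matches a full scan. How much is saved in practice depends on how tight the lower bound is.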
Index Terms
- Efficient similarity search in scientific databases with feature signatures