DOI: 10.1145/2791347.2791384
research-article

Efficient similarity search in scientific databases with feature signatures

Published: 29 June 2015

ABSTRACT

The rapid growth of scientific data calls for efficient similarity search techniques, for which suitable object representation models are vital. Feature signatures, a highly flexible way of representing object features, have attracted increasing attention, and techniques for processing them efficiently are being developed. In this paper, we focus on efficient query processing with the well-known Earth Mover's Distance (EMD) on databases of feature signatures. We propose efficient approximation techniques that apply to high-dimensional feature signatures via dimensionality reduction and that guarantee completeness, i.e., no false dismissals, within a filter-and-refine architecture. Extensive experiments on real-world data show a considerable reduction in the number of EMD computations and a significant reduction in query processing time.
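
The filter-and-refine idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the technique proposed in the paper: it assumes one-dimensional feature signatures, given as lists of (position, weight) pairs with equal total weight, so that the EMD with ground distance |x - y| can be computed by a simple sweep. As a stand-in filter it uses the distance between weighted centroids, which never exceeds the EMD under these assumptions. The names emd_1d, centroid_lower_bound, and knn_filter_refine are hypothetical and chosen for illustration only.

```python
import heapq


def emd_1d(p, q):
    """Exact EMD between two 1-D signatures of equal total weight.

    Assumption: a signature is a list of (position, weight) pairs and the
    ground distance is |x - y|. The EMD then equals the area between the
    two cumulative weight functions, computed here by a left-to-right sweep.
    """
    events = sorted([(x, w) for x, w in p] + [(x, -w) for x, w in q])
    total, carried, prev = 0.0, 0.0, events[0][0]
    for x, w in events:
        total += abs(carried) * (x - prev)  # mass carried across [prev, x]
        carried += w
        prev = x
    return total


def centroid_lower_bound(p, q):
    """Cheap filter distance: |centroid(p) - centroid(q)| <= EMD(p, q)."""
    return abs(sum(x * w for x, w in p) - sum(x * w for x, w in q))


def knn_filter_refine(query, database, k, lower_bound, exact):
    """k-NN search that refines candidates in order of their lower bound.

    Returns the k nearest (distance, index) pairs and the number of exact
    distance computations. Because lower_bound never exceeds exact, stopping
    early cannot discard a true neighbor (no false dismissals).
    """
    # Filter step: rank every object by the cheap lower bound.
    ranked = sorted((lower_bound(query, obj), i) for i, obj in enumerate(database))
    best = []  # max-heap of the k best so far, stored as (-distance, index)
    n_exact = 0
    for lb, i in ranked:
        if len(best) == k and lb > -best[0][0]:
            break  # all remaining lower bounds exceed the k-th best distance
        d = exact(query, database[i])  # refine step: expensive distance
        n_exact += 1
        if len(best) < k:
            heapq.heappush(best, (-d, i))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, i))
    return sorted((-neg_d, i) for neg_d, i in best), n_exact


if __name__ == "__main__":
    query = [(0.0, 0.5), (1.0, 0.5)]
    database = [
        [(0.0, 0.5), (1.0, 0.5)],
        [(0.5, 1.0)],
        [(2.0, 1.0)],
        [(5.0, 0.5), (6.0, 0.5)],
        [(0.1, 0.5), (1.1, 0.5)],
    ]
    neighbors, n_exact = knn_filter_refine(
        query, database, 2, centroid_lower_bound, emd_1d
    )
    print(neighbors, "exact computations:", n_exact, "of", len(database))
```

The quality of the filter determines how many exact computations are saved: the tighter the lower bound, the earlier the loop can stop. For high-dimensional signatures the exact EMD would typically require a transportation-problem solver rather than the one-dimensional sweep used here.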


Published in

SSDBM '15: Proceedings of the 27th International Conference on Scientific and Statistical Database Management
June 2015, 390 pages
ISBN: 9781450337090
DOI: 10.1145/2791347

Copyright © 2015 ACM


Publisher: Association for Computing Machinery, New York, NY, United States

Overall acceptance rate: 56 of 146 submissions, 38%
