ISIS: A New Approach for Efficient Similarity Search in Sparse Databases

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2010)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5982)

Abstract

High-dimensional sparse data is prevalent in many real-life applications. In this paper, we propose ISIS (Indexing Sparse databases using Inverted fileS), a novel index structure for accelerating similarity search in high-dimensional sparse databases. ISIS clusters a dataset and converts the original high-dimensional space into a new space in which each dimension represents a cluster; the key values in the new space are then used by inverted-file indexes. We also propose an extension, ISIS+, which partitions the data space into lower-dimensional subspaces and clusters the data within each subspace. An extensive experimental study demonstrates the superiority of our approaches in high-dimensional sparse databases.
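
The general idea of mapping points to clusters and keeping one inverted list per cluster can be illustrated with a small sketch. This is not the paper's algorithm: the abstract does not say how the key values are computed, so the sketch assumes the key is the distance from a point to its cluster centroid, assumes the centroids are already given by some clustering step, and uses hypothetical function names (`sparse_dist`, `build_index`, `range_search`). Sparse vectors are stored as `{dimension: value}` dictionaries.

```python
import math
from collections import defaultdict

def sparse_dist(a, b):
    """Euclidean distance between two sparse vectors stored as {dim: value}."""
    dims = set(a) | set(b)
    return math.sqrt(sum((a.get(d, 0.0) - b.get(d, 0.0)) ** 2 for d in dims))

def build_index(points, centroids):
    """Build one posting list per cluster.

    Each posting is (key, point_id). The key is the distance to the
    cluster centroid, which is an ASSUMPTION made for this sketch.
    """
    index = defaultdict(list)
    for pid, p in enumerate(points):
        dists = [sparse_dist(p, c) for c in centroids]
        cid = min(range(len(centroids)), key=dists.__getitem__)
        index[cid].append((dists[cid], pid))
    for postings in index.values():
        postings.sort()  # sorted by key so a scan can stop early
    return index

def range_search(query, radius, points, centroids, index):
    """Return ids of points within `radius` of `query`.

    By the triangle inequality, a point p in cluster c can only match
    if |d(q, c) - d(p, c)| <= radius, so other postings are skipped.
    """
    result = []
    for cid, postings in index.items():
        dq = sparse_dist(query, centroids[cid])
        for key, pid in postings:
            if key < dq - radius:
                continue
            if key > dq + radius:
                break
            if sparse_dist(query, points[pid]) <= radius:
                result.append(pid)
    return sorted(result)

# Toy data: two well-separated groups in a high-dimensional sparse space.
points = [{0: 1.0}, {0: 1.1, 5: 0.2}, {100: 3.0}, {100: 3.1, 7: 0.1}]
centroids = [{0: 1.0}, {100: 3.0}]  # assumed output of a clustering step
index = build_index(points, centroids)
print(range_search({0: 1.05}, 0.5, points, centroids, index))  # prints [0, 1]
```

In this toy run the whole second cluster is skipped by the key test alone, without computing any distance to its points.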

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cui, B., Zhao, J., Cong, G. (2010). ISIS: A New Approach for Efficient Similarity Search in Sparse Databases. In: Kitagawa, H., Ishikawa, Y., Li, Q., Watanabe, C. (eds) Database Systems for Advanced Applications. DASFAA 2010. Lecture Notes in Computer Science, vol 5982. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12098-5_18

  • DOI: https://doi.org/10.1007/978-3-642-12098-5_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12097-8

  • Online ISBN: 978-3-642-12098-5

  • eBook Packages: Computer Science, Computer Science (R0)
