A locality-aware similar information searching scheme

Published in: International Journal on Digital Libraries

Abstract

In a database, a similar information search finds the data records that contain the majority of the search keywords. With the rapid accumulation of information, databases have grown dramatically in size, so an efficient searching scheme is needed to speed up searches while still retrieving all relevant records. This paper proposes a Hilbert curve-based similarity searching scheme (HCS). HCS treats a database as a multidimensional space and each data record as a point in that space. Using a Hilbert space-filling curve, each point is projected from the high-dimensional space to a low-dimensional space such that points close to each other in the high-dimensional space are gathered together in the low-dimensional space. Because the database is thereby divided into many clusters of close points, a query is mapped to a certain cluster and only that cluster is searched, rather than the entire database. Experimental results show that HCS dramatically reduces search latency and is highly effective in retrieving similar information.
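The idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a toy two-dimensional grid instead of a high-dimensional keyword space, uses the standard bitwise conversion from grid coordinates to a Hilbert index, and cuts the sorted records into fixed-size clusters. The names `hilbert_index`, `build_clusters`, `find_cluster`, and the `cluster_size` parameter are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right


def hilbert_index(n, x, y):
    """Map cell (x, y) of an n-by-n grid (n a power of two) to its
    position along a 2D Hilbert curve."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the right orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def build_clusters(points, n, cluster_size):
    """Sort points by Hilbert index and cut the ordering into clusters.
    Fixed-size clusters are an assumption made for this sketch."""
    ordered = sorted(points, key=lambda p: hilbert_index(n, p[0], p[1]))
    return [ordered[i:i + cluster_size]
            for i in range(0, len(ordered), cluster_size)]


def find_cluster(clusters, n, query):
    """Return the single cluster whose Hilbert range contains the query,
    so that only this cluster needs to be scanned."""
    firsts = [hilbert_index(n, c[0][0], c[0][1]) for c in clusters]
    q = hilbert_index(n, query[0], query[1])
    pos = max(bisect_right(firsts, q) - 1, 0)
    return clusters[pos]


# Toy usage: every cell of a 4x4 grid is a "record".
grid = [(x, y) for x in range(4) for y in range(4)]
clusters = build_clusters(grid, 4, cluster_size=4)
nearby = find_cluster(clusters, 4, (3, 3))
```

In this toy setup each cluster covers one quadrant of the grid, so a query is compared against 4 records instead of 16. In higher dimensions a Hilbert ordering preserves locality only approximately, so a real scheme would need to handle neighbours that fall into adjacent clusters.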

Acknowledgments

This research was supported in part by US NSF Grants IIS-1354123, CNS-1254006, CNS-1249603, CNS-1049947, CNS-0917056 and CNS-1025652, and Microsoft Research Faculty Fellowship 8300751.

Author information

Corresponding author

Correspondence to Haiying Shen.

About this article

Cite this article

Li, T., Lin, Y. & Shen, H. A locality-aware similar information searching scheme. Int J Digit Libr 17, 79–93 (2016). https://doi.org/10.1007/s00799-014-0128-9

  • DOI: https://doi.org/10.1007/s00799-014-0128-9
