Abstract
In a database, a similarity search finds the data records that contain the majority of the search keywords. With the rapid accumulation of information, databases have grown dramatically in size, so an efficient searching scheme is needed to speed up searches while still retrieving all relevant records. This paper proposes a Hilbert curve-based similarity searching scheme (HCS). HCS treats a database as a multidimensional space and each data record as a point in that space. Using a Hilbert space-filling curve, each point is projected from the high-dimensional space to a low-dimensional space, so that points close to each other in the high-dimensional space are gathered together in the low-dimensional space. Because the database is thereby divided into many clusters of close points, a query is mapped to a particular cluster rather than being run against the entire database. Experimental results show that HCS dramatically reduces search latency and is highly effective in retrieving similar information.
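The projection-and-cluster idea can be illustrated with a minimal sketch. It is not the HCS algorithm itself, whose details the abstract does not give. It assumes records are already reduced to integer coordinates on a two-dimensional grid, uses the standard iterative conversion from grid coordinates to a Hilbert curve position, and forms clusters by cutting the curve into equal-length segments. The names `hilbert_index`, `build_clusters`, `search` and the parameter `cluster_size` are hypothetical.

```python
from collections import defaultdict


def hilbert_index(order, x, y):
    """Position of cell (x, y) along a 2-D Hilbert curve.

    The grid has 2**order cells per side; x and y must lie in [0, 2**order).
    """
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        # Each quadrant at this level contributes s*s curve positions.
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip so the sub-quadrant is traversed in canonical orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def build_clusters(points, order, cluster_size):
    """Group points by the curve segment their Hilbert index falls into.

    Assumption: a cluster is a run of `cluster_size` consecutive curve positions.
    """
    clusters = defaultdict(list)
    for p in points:
        clusters[hilbert_index(order, *p) // cluster_size].append(p)
    return clusters


def search(clusters, query, order, cluster_size):
    """Return only the points in the query's cluster, not the whole data set."""
    return clusters.get(hilbert_index(order, *query) // cluster_size, [])


if __name__ == "__main__":
    pts = [(1, 1), (2, 1), (6, 6)]
    clusters = build_clusters(pts, order=3, cluster_size=16)
    print(search(clusters, (0, 2), order=3, cluster_size=16))
```

In this sketch a query at `(0, 2)` is compared only against the two points that share its curve segment, while `(6, 6)` is never examined. Note that the locality guarantee runs one way: cells adjacent on the curve are adjacent on the grid, but two nearby grid cells can still fall into different segments, so a scheme of this kind may need to probe neighbouring clusters as well.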
Acknowledgments
This research was supported in part by US NSF Grants IIS-1354123, CNS-1254006, CNS-1249603, CNS-1049947, CNS-0917056 and CNS-1025652, and Microsoft Research Faculty Fellowship 8300751.
Cite this article
Li, T., Lin, Y. & Shen, H. A locality-aware similar information searching scheme. Int J Digit Libr 17, 79–93 (2016). https://doi.org/10.1007/s00799-014-0128-9
DOI: https://doi.org/10.1007/s00799-014-0128-9