A locality-aware similar information searching scheme

Li, Ting; Lin, Yuhua; Shen, Haiying

doi:10.1007/s00799-014-0128-9

Published: 12 October 2014

Volume 17, pages 79–93 (2016)
International Journal on Digital Libraries

Ting Li¹,
Yuhua Lin² &
Haiying Shen²

Abstract

In a database, a similar information search means finding the data records that contain the majority of the search keywords. Because information accumulates rapidly, database sizes have increased dramatically, so an efficient searching scheme is needed to speed up searches while still retrieving all relevant records. This paper proposes a Hilbert curve-based similarity searching scheme (HCS). HCS treats a database as a multidimensional space and each data record as a point in that space. Using a Hilbert space-filling curve, each point is projected from the high-dimensional space to a low-dimensional space, so that points that are close to each other in the high-dimensional space remain close together in the low-dimensional space. Because the database is thereby divided into clusters of nearby points, a query is mapped to a specific cluster and only that cluster is searched, rather than the entire database. Experimental results show that HCS dramatically reduces search latency and is highly effective at retrieving similar information.
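The abstract describes HCS only at a high level. As a rough illustration of the idea, the Python sketch below assumes that each record is already a point with non-negative integer coordinates below `2**bits`, maps each point to a single Hilbert index (a one-dimensional target, which is one possible choice of "low-dimensional space") using John Skilling's transpose algorithm, a standard way to compute Hilbert indices that may differ from the authors' method, then cuts the sorted order into fixed-size clusters and answers a query by ranking only the records in the cluster its index falls into. The class `HilbertClusterIndex`, the `cluster_size` parameter, and the squared-Euclidean ranking are hypothetical choices for illustration, not details taken from the paper.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence, Tuple


def hilbert_index(point: Sequence[int], bits: int) -> int:
    """Position of an n-D integer point (coordinates in [0, 2**bits)) along the
    Hilbert curve, computed with Skilling's transpose algorithm."""
    x = list(point)  # work on a copy; the algorithm updates coordinates in place
    n = len(x)
    m = 1 << (bits - 1)
    # Undo the excess rotations/reflections level by level (high bit to low).
    q = m
    while q > 1:
        p = q - 1
        for i in range(n):
            if x[i] & q:
                x[0] ^= p  # invert the low bits of x[0]
            else:
                t = (x[0] ^ x[i]) & p  # exchange the low bits of x[0] and x[i]
                x[0] ^= t
                x[i] ^= t
        q >>= 1
    # Gray-encode the transposed coordinates.
    for i in range(1, n):
        x[i] ^= x[i - 1]
    t = 0
    q = m
    while q > 1:
        if x[n - 1] & q:
            t ^= q - 1
        q >>= 1
    for i in range(n):
        x[i] ^= t
    # Interleave the bits (most significant level first) into one integer.
    h = 0
    for b in range(bits - 1, -1, -1):
        for i in range(n):
            h = (h << 1) | ((x[i] >> b) & 1)
    return h


class HilbertClusterIndex:
    """Hypothetical HCS-style index: records sorted by Hilbert index and cut into
    clusters of at most `cluster_size` consecutive records."""

    def __init__(self, records: Dict[str, Tuple[int, ...]], bits: int, cluster_size: int):
        self.bits = bits
        keyed = sorted((hilbert_index(v, bits), rid, v) for rid, v in records.items())
        self.clusters = [keyed[i:i + cluster_size] for i in range(0, len(keyed), cluster_size)]
        # Smallest Hilbert index in each cluster, used to route queries.
        self.starts = [cluster[0][0] for cluster in self.clusters]

    def search(self, query: Sequence[int], k: int) -> List[str]:
        if not self.clusters:
            return []
        h = hilbert_index(query, self.bits)
        # Pick the cluster whose index range contains h (or the first cluster).
        c = max(bisect_right(self.starts, h) - 1, 0)

        def sq_dist(v: Sequence[int]) -> int:
            return sum((a - b) ** 2 for a, b in zip(query, v))

        # Rank only the records in that cluster; ties broken by record id.
        ranked = sorted(self.clusters[c], key=lambda e: (sq_dist(e[2]), e[1]))
        return [rid for _, rid, _ in ranked[:k]]


records = {"r1": (0, 0), "r2": (1, 0), "r3": (3, 3), "r4": (3, 2), "r5": (0, 1), "r6": (2, 3)}
idx = HilbertClusterIndex(records, bits=2, cluster_size=3)
print(idx.search((1, 1), k=2))  # ['r2', 'r5']
```

Here the query (1, 1) has Hilbert index 2, so only the first cluster {r1, r2, r5} is scanned. Fixed-size clusters keep the sketch short, but a query near a cluster boundary can miss close records that landed in the neighbouring cluster, because the Hilbert order preserves locality only approximately.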


Similar content being viewed by others

SELSH: A Hashing Scheme for Approximate Similarity Search with Early Stop Condition

Shortening the Candidate List for Similarity Searching Using Inverted Index

Local Sensitive Hashing for Proximity Searching


Acknowledgments

This research was supported in part by US NSF Grants IIS-1354123, CNS-1254006, CNS-1249603, CNS-1049947, CNS-0917056 and CNS-1025652, and Microsoft Research Faculty Fellowship 8300751.

Author information

Authors and Affiliations

Wal-mart Stores Inc, 702 SW 8th St, Bentonville, AR, 72716, USA
Ting Li
Department of Electrical and Computer Engineering, Clemson University, Clemson, SC, 29634, USA
Yuhua Lin & Haiying Shen


Corresponding author

Correspondence to Haiying Shen.

About this article

Cite this article

Li, T., Lin, Y. & Shen, H. A locality-aware similar information searching scheme. Int J Digit Libr 17, 79–93 (2016). https://doi.org/10.1007/s00799-014-0128-9

Received: 05 June 2013
Revised: 03 September 2014
Accepted: 11 September 2014
Published: 12 October 2014
Issue Date: June 2016
DOI: https://doi.org/10.1007/s00799-014-0128-9
