Abstract
Top-k (preference) queries are used in several domains to retrieve the set of \(k\) tuples that most closely match a given query. For high-dimensional spaces, evaluation of top-k queries is expensive, as data- and space-partitioning indices perform worse than sequential scan. An alternative approach is the use of sorted lists to speed up query evaluation. This approach retains its performance advantage over sequential scan only up to about ten dimensions. However, the data sets for which preference queries are considered are often high-dimensional. In this paper, we explore the use of bit-sliced indices (BSI) to encode the attributes or score lists and perform top-k queries over high-dimensional data using bit-wise operations. Our approach does not require sorting or random access to the index. Additionally, bit-sliced indices require less space than other types of indices. The size of the bit-sliced index (without using compression) for a normalized data set with 3 decimals is 60 times smaller than the size of sorted lists. Furthermore, our experimental evaluation shows that the use of BSI for top-k query processing is more efficient than Sequential Scan for high-dimensional data. When compared to the Sequential Top-k Algorithm (STA), BSI is one order of magnitude faster.
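To make the idea of answering a top-k query with bit-wise operations concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single score column, scores normalized to [0, 1] with 3 decimals and scaled to integers in 0..1000 (so 10 bit slices suffice), and Python integers standing in for bit vectors. The selection loop follows the classic slice-by-slice top-k procedure for bit-sliced indexes, scanning from the most significant slice down. The function names `build_bsi`, `rows_of`, and `top_k` are hypothetical.

```python
from typing import List


def build_bsi(values: List[int]) -> List[int]:
    """Build bit slices: bit i of slices[j] is set when bit j of values[i] is set."""
    n_slices = max(max(values).bit_length() if values else 0, 1)
    slices = [0] * n_slices
    for row, v in enumerate(values):
        for j in range(n_slices):
            if (v >> j) & 1:
                slices[j] |= 1 << row
    return slices


def rows_of(bitmap: int) -> List[int]:
    """Return the positions of the set bits (row ids) in ascending order."""
    rows, row = [], 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows


def top_k(slices: List[int], n_rows: int, k: int) -> List[int]:
    """Row ids of the k largest values, using only AND, OR, NOT, and bit counts.

    Ties at the k-th value are broken by lowest row id (an arbitrary choice).
    """
    k = min(k, n_rows)
    if k <= 0:
        return []
    all_rows = (1 << n_rows) - 1
    g = 0         # rows already known to belong to the answer
    e = all_rows  # rows still tied on all slices examined so far
    for j in reversed(range(len(slices))):
        x = g | (e & slices[j])
        count = bin(x).count("1")
        if count > k:
            e &= slices[j]              # too many candidates: keep rows with this bit set
        elif count < k:
            g = x                       # all of them qualify; continue among the rest
            e &= ~slices[j] & all_rows
        else:
            e &= slices[j]              # exactly k rows found
            break
    g_rows = rows_of(g)
    tie_rows = rows_of(e)[: k - len(g_rows)]
    return sorted(g_rows + tie_rows)


if __name__ == "__main__":
    scores = [0.512, 1.000, 0.037, 0.999, 0.000, 0.750]
    ints = [round(s * 1000) for s in scores]  # assumed 3-decimal scaling
    bsi = build_bsi(ints)
    print(len(bsi))                    # 10 slices for values up to 1000
    print(top_k(bsi, len(ints), 3))    # [1, 3, 5]
```

The loop never sorts the column and never fetches an individual row; it reads each slice once, which is the property the abstract attributes to the BSI approach. Combining several attributes into one score would additionally require bit-sliced arithmetic, which this sketch leaves out.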
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Guzun, G., Tosado, J., Canahuate, G. (2014). Slicing the Dimensionality: Top-k Query Processing for High-Dimensional Spaces. In: Hameurlain, A., Küng, J., Wagner, R. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XIV. Lecture Notes in Computer Science, vol 8800. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45714-6_2
DOI: https://doi.org/10.1007/978-3-662-45714-6_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-45713-9
Online ISBN: 978-3-662-45714-6
eBook Packages: Computer Science (R0)