Abstract
Efficient retrieval of the most relevant (e.g. top-k, k-NN) tuples is an important requirement in information systems that access large amounts of data. Top-k (or k-nearest-neighbor) queries retrieve the k objects that score best for a specified objective function. However, retrieving the closest objects does not tell the user how close or similar these objects are to the ideal object described by the input query. To support the query issuer more appropriately, we introduce top-q query answering (TQQA), which does not return a fixed number of result tuples but all tuples that are similar to the searched optimum with at least some minimum degree q. We show how to combine top-q queries with top-k queries, enabling the user to pose a large number of interesting queries. To the best of our knowledge, neither such a top-q query answering approach nor a combination with top-k has been proposed before. We implemented our approach and evaluated it against the best position algorithm BPA-2, which proved to be among the fastest threshold-based top-k query answering approaches. Our experiments showed an improvement of one to two orders of magnitude in time and memory requirements.
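To make the difference between the two query types concrete, the following is a minimal sketch of their semantics using a naive full scan. It is not the TQQA algorithm from the paper. The similarity function, the attribute normalization to [0, 1], the function names, and the particular way of combining top-q with top-k (at most k tuples, each with similarity at least q) are all assumptions made for illustration.

```python
import heapq
from typing import List, Sequence, Tuple

Row = Tuple[float, ...]


def similarity(row: Sequence[float], ideal: Sequence[float]) -> float:
    """Hypothetical similarity in [0, 1]: one minus the mean absolute
    difference, assuming every attribute is normalized to [0, 1]."""
    return 1.0 - sum(abs(a - b) for a, b in zip(row, ideal)) / len(ideal)


def top_k(rows: List[Row], ideal: Row, k: int) -> List[Tuple[float, Row]]:
    """Return the k best-scoring rows, however similar they actually are."""
    return heapq.nlargest(k, ((similarity(r, ideal), r) for r in rows))


def top_q(rows: List[Row], ideal: Row, q: float) -> List[Tuple[float, Row]]:
    """Return every row whose similarity is at least q, best first.
    The result size is not fixed and may be empty."""
    scored = ((similarity(r, ideal), r) for r in rows)
    return sorted((sr for sr in scored if sr[0] >= q), reverse=True)


def top_q_and_k(rows: List[Row], ideal: Row, q: float, k: int) -> List[Tuple[float, Row]]:
    """One assumed combination: at most k rows, each with similarity >= q."""
    return top_q(rows, ideal, q)[:k]


rows = [(0.5, 0.5), (0.4, 0.6), (0.0, 1.0), (1.0, 1.0)]
ideal = (0.5, 0.5)

print(top_k(rows, ideal, 3))             # always three rows, even poor matches
print(top_q(rows, ideal, 0.8))           # only rows that are similar enough
print(top_q_and_k(rows, ideal, 0.4, 1))  # best row among those above q
```

In this sketch the top-k call returns three rows even though the third is only weakly similar to the ideal object, while the top-q call returns only the rows that meet the threshold.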
The work reported here was supported by the Austrian Ministry of Science and Research within the project GATIB II and BBMRI.AT.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Dabringer, C., Eder, J. (2017). Fast Top-Q and Top-K Query Answering. In: Dang, T., Wagner, R., Küng, J., Thoai, N., Takizawa, M., Neuhold, E. (eds) Future Data and Security Engineering. FDSE 2017. Lecture Notes in Computer Science, vol 10646. Springer, Cham. https://doi.org/10.1007/978-3-319-70004-5_3
DOI: https://doi.org/10.1007/978-3-319-70004-5_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-70003-8
Online ISBN: 978-3-319-70004-5
eBook Packages: Computer Science (R0)