Abstract
Many large programs operate on collection types. Many programming languages offer extensive libraries, such as the C++ Standard Template Library, that make programming with collections convenient. Extending programming languages with collection queries as first-class constructs would let programmers write queries explicitly in their programs. It would also let compilers draw on the wealth of experience from the database domain to optimize such queries. This paper describes an approach that reduces the run time of programs involving explicit collection queries by performing run-time query optimization that is effective for a single run of a program. The approach also uses a cache to store previously computed results. It relies on histograms built from the data at run time to estimate the selectivity of joins and predicates, and uses these estimates to construct query plans. Information from earlier executions of the same query during the run is reused when constructing query plans, even when the data has changed between these executions. We also determine an effective policy for caching the results of join (sub)queries. The cache is maintained incrementally when the underlying collections change, and a cache replacement policy optimizes the use of cache space. Our approach has been implemented within the Java Query Language (JQL) framework using AspectJ. We demonstrate that combining run-time query optimization with caching of subquery results significantly improves the run time of programs with explicit queries over equivalent programs that perform collection operations by iterating over those collections. We evaluate our approach on synthetic as well as real-world Robocode programs, using JQL as the benchmark.
Experimental results show that our approach achieves shorter program run times than JQL.
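The abstract states that histograms built at run time are used to estimate join selectivity when constructing query plans, but it does not give the estimator itself. The Java sketch that follows shows one textbook way to do this and is not the authors' JQL/AspectJ implementation. It assumes integer join keys, equi-width buckets shared by both sides of a join, a uniform spread of keys within each bucket, and the containment assumption. The names `HistogramJoinEstimate`, `Histogram`, `estimateJoinSize` and `exactJoinSize` are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not the paper's implementation): equi-width histograms
 * over integer join keys, used to estimate equi-join sizes so that the
 * join with the smaller estimated result can be evaluated first.
 */
public class HistogramJoinEstimate {

    /** Equi-width histogram; both sides of a join must use the same min, max and bucket count. */
    static final class Histogram {
        private final int min;
        private final int width;
        final long[] counts;   // number of elements per bucket
        final long[] distinct; // number of distinct key values per bucket

        /** Precondition: every key lies in [min, max] and buckets >= 1. */
        Histogram(List<Integer> keys, int min, int max, int buckets) {
            this.min = min;
            // ceil((max - min + 1) / buckets), so the last bucket still contains max
            this.width = (max - min + buckets) / buckets;
            this.counts = new long[buckets];
            this.distinct = new long[buckets];
            for (int k : keys) {
                counts[bucketOf(k)]++;
            }
            for (int k : new HashSet<>(keys)) {
                distinct[bucketOf(k)]++;
            }
        }

        private int bucketOf(int key) {
            return (key - min) / width;
        }
    }

    /**
     * Estimates |A join B| on key equality, assuming keys are spread uniformly
     * within each bucket and that the values of the side with fewer distinct
     * values in a bucket are contained in the other side (containment assumption).
     */
    static double estimateJoinSize(Histogram a, Histogram b) {
        double total = 0.0;
        for (int i = 0; i < a.counts.length; i++) {
            long d = Math.max(a.distinct[i], b.distinct[i]);
            if (d > 0) {
                total += (double) a.counts[i] * b.counts[i] / d;
            }
        }
        return total;
    }

    /** Exact equi-join size via a hash count, for comparison with the estimate. */
    static long exactJoinSize(List<Integer> left, List<Integer> right) {
        Map<Integer, Long> freq = new HashMap<>();
        for (int k : right) {
            freq.merge(k, 1L, Long::sum);
        }
        long total = 0;
        for (int k : left) {
            total += freq.getOrDefault(k, 0L);
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical join keys for three collections A, B and C, all in [1, 4].
        List<Integer> a = List.of(1, 2, 3, 4);
        List<Integer> b = List.of(1, 1, 2, 2, 3, 3, 4, 4);
        List<Integer> c = List.of(4);

        Histogram ha = new Histogram(a, 1, 4, 2);
        Histogram hb = new Histogram(b, 1, 4, 2);
        Histogram hc = new Histogram(c, 1, 4, 2);

        double ab = estimateJoinSize(ha, hb);
        double bc = estimateJoinSize(hb, hc);
        System.out.println("estimated |A join B| = " + ab + ", exact = " + exactJoinSize(a, b));
        System.out.println("estimated |B join C| = " + bc + ", exact = " + exactJoinSize(b, c));
        // A greedy planner evaluates the join with the smaller estimated result first.
        System.out.println(bc < ab ? "join B with C first" : "join A with B first");
    }
}
```

For this small input the estimates (8.0 and 2.0) match the exact join sizes, so a greedy planner would join B with C first. Dividing an estimate by the product of the two collection sizes gives the corresponding join selectivity. Equi-width buckets are chosen here only for brevity; other histogram types would plug into the same estimation loop.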

Cite this article
Nerella, V.K.S., Surapaneni, S., Madria, S.K. et al. Exploring optimization and caching for efficient collection operations. Autom Softw Eng 21, 3–40 (2014). https://doi.org/10.1007/s10515-013-0119-x