Exploring optimization and caching for efficient collection operations

Automated Software Engineering

Abstract

Many large programs operate on collection types. Extensive collection libraries, such as the C++ Standard Template Library, are available in many programming languages and make programming with collections convenient. Extending programming languages to provide collection queries as first-class constructs would not only allow programmers to write queries explicitly in their programs but would also allow compilers to draw on the wealth of experience from the database domain to optimize such queries. This paper describes an approach that reduces the run time of programs involving explicit collection queries by performing run-time query optimization that is effective within a single run of a program. The approach also uses a cache to store previously computed results. It relies on histograms built from the data at run time to estimate the selectivity of joins and predicates, and uses these estimates to construct query plans. Information from earlier executions of the same query during the run is reused when constructing query plans, even when the data has changed between these executions. An effective cache policy is also determined for caching the results of join (sub)queries. The cache is maintained incrementally when the underlying collections change, and use of the cache space is optimized by a cache replacement policy. Our approach has been implemented within the Java Query Language (JQL) framework using AspectJ. The results show that combining run-time query optimization with caching of subquery results significantly improves the run time of programs with explicit queries over equivalent programs that perform collection operations by iterating over those collections. This paper evaluates our approach on synthetic as well as real-world Robocode programs, using JQL as the benchmark.
Experimental results show that our approach outperforms JQL with respect to program run time.
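
The abstract states that query plans are built from run-time histograms that estimate join selectivity. The Java sketch below illustrates one common way to do this for an equi-join; it is not the paper's algorithm. It assumes integer join keys, equi-width histograms that share bucket boundaries, and keys spread uniformly within each bucket, and it applies a simple greedy rule (join the pair with the smallest estimated result first). The class and method names (`HistogramJoinEstimate`, `estimateJoinSize`, `selectivity`, `cheapestFirstJoin`) are hypothetical and are not part of JQL.

```java
import java.util.List;

/**
 * Hypothetical sketch: histogram-based equi-join size estimation used to pick
 * which pair of collections to join first. Not the JQL API or the paper's algorithm.
 */
public class HistogramJoinEstimate {

    /** Equi-width histogram over integer join keys in [min, max). */
    static final class Histogram {
        final int min;
        final int bucketWidth;
        final long[] counts;
        final long size;

        Histogram(int min, int max, int buckets, List<Integer> keys) {
            if (max <= min || buckets <= 0) throw new IllegalArgumentException("bad range");
            this.min = min;
            // Ceiling division so that every key in [min, max) maps to a valid bucket.
            this.bucketWidth = (max - min + buckets - 1) / buckets;
            this.counts = new long[buckets];
            for (int k : keys) {
                if (k < min || k >= max) throw new IllegalArgumentException("key out of range: " + k);
                counts[(k - min) / bucketWidth]++;
            }
            this.size = keys.size();
        }
    }

    /**
     * Estimated equi-join size. Assumes both histograms share bucket boundaries and that
     * keys are spread uniformly over the bucketWidth values of each bucket, so bucket i
     * contributes counts_a[i] * counts_b[i] / bucketWidth matching pairs.
     */
    static double estimateJoinSize(Histogram a, Histogram b) {
        if (a.min != b.min || a.bucketWidth != b.bucketWidth || a.counts.length != b.counts.length)
            throw new IllegalArgumentException("histograms must share bucket boundaries");
        double total = 0;
        for (int i = 0; i < a.counts.length; i++) {
            total += (double) a.counts[i] * b.counts[i] / a.bucketWidth;
        }
        return total;
    }

    /** Join selectivity: estimated result size divided by the Cartesian-product size. */
    static double selectivity(Histogram a, Histogram b) {
        if (a.size == 0 || b.size == 0) return 0.0;
        return estimateJoinSize(a, b) / ((double) a.size * b.size);
    }

    /** Greedy first step of join ordering; returns null if fewer than two inputs. */
    static int[] cheapestFirstJoin(List<Histogram> hs) {
        int[] best = null;
        double bestSize = Double.POSITIVE_INFINITY;
        for (int i = 0; i < hs.size(); i++) {
            for (int j = i + 1; j < hs.size(); j++) {
                double est = estimateJoinSize(hs.get(i), hs.get(j));
                if (est < bestSize) {
                    bestSize = est;
                    best = new int[] {i, j};
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Keys in [0, 10) split into 2 buckets of width 5.
        Histogram a = new Histogram(0, 10, 2, List.of(0, 1, 2, 3, 4));
        Histogram b = new Histogram(0, 10, 2, List.of(0, 0, 5, 6));
        Histogram c = new Histogram(0, 10, 2, List.of(7, 8, 9));
        System.out.println("|A join B| ~ " + estimateJoinSize(a, b)); // 2.0
        System.out.println("sel(A,B)   ~ " + selectivity(a, b));     // 0.1
        int[] first = cheapestFirstJoin(List.of(a, b, c));
        System.out.println("join first: " + first[0] + "," + first[1]); // 0,2 (A and C)
    }
}
```

With these toy inputs the estimate for A and B is 2.0, which happens to match the true join size (the two copies of key 0 in B each match the single 0 in A), and the greedy rule picks A and C because their histograms do not overlap. A fuller optimizer would also fold in predicate selectivities and repeat the choice for the remaining joins; how the paper itself combines these is described in the full text.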

[Fig. 1–26 and Algorithms 1–2 not reproduced here]

Author information

Correspondence to Sanjay K. Madria.

About this article

Cite this article

Nerella, V.K.S., Surapaneni, S., Madria, S.K. et al. Exploring optimization and caching for efficient collection operations. Autom Softw Eng 21, 3–40 (2014). https://doi.org/10.1007/s10515-013-0119-x

  • DOI: https://doi.org/10.1007/s10515-013-0119-x
