ABSTRACT
Big Data applications suffer from unpredictable and unacceptably high pause times caused by poor memory management (Garbage Collection, GC) decisions. This affects all applications, but it matters most for applications with low pause time requirements, such as credit-card fraud detection or targeted website advertisement systems, which can easily fail to comply with Service Level Agreements because of long GC cycles (during which the application is stopped). The problem has been identified previously and is related to Big Data applications keeping massive amounts of data objects in memory for a long period of time (from the GC's perspective).
Memory management approaches have been proposed to reduce GC pause time by allocating objects with similar lifetimes close to each other. However, they do not provide a general solution for all types of Big Data applications (solving the problem only for a specific set of applications), require programmer effort and knowledge to change or annotate the application code, or both.
This paper proposes POLM2, a profiler that automatically: i) estimates application allocation profiles based on execution records, and ii) instruments application bytecode to help the GC take advantage of the profiling information. Thus, no programmer effort is required to change the source code to allocate objects according to their lifetimes. POLM2 is implemented for the OpenJDK HotSpot Java Virtual Machine 8 and uses NG2C, a recently proposed GC that supports multi-generational pretenuring. Results show that POLM2 is able to: i) achieve pauses as low as NG2C (which requires manual source code modification), and ii) reduce application pauses by up to 80% when compared to G1 (the default collector in OpenJDK). POLM2 does not negatively impact either application throughput or memory utilization.
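To make the idea of an allocation profile more concrete, the following is a minimal sketch of how per-allocation-site lifetimes might be summarized from execution records. It is not POLM2's actual algorithm and does not use the NG2C API. The class `AllocationProfileSketch`, the `AllocationRecord` type, the site labels, and the averaging rule are all hypothetical, chosen only to illustrate mapping allocation sites to target generations.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AllocationProfileSketch {

    /** Hypothetical execution record: one object allocated at `site` survived `survivedCycles` GC cycles. */
    static final class AllocationRecord {
        final String site;
        final int survivedCycles;

        AllocationRecord(String site, int survivedCycles) {
            this.site = site;
            this.survivedCycles = survivedCycles;
        }
    }

    /**
     * Assumed heuristic (not taken from the paper): the target generation of a site is the
     * average number of GC cycles its objects survived, rounded down and capped at maxGeneration.
     * Generation 0 stands for ordinary young-generation allocation.
     */
    static Map<String, Integer> estimateGenerations(List<AllocationRecord> records, int maxGeneration) {
        // site -> {sum of survived cycles, number of objects seen}
        Map<String, long[]> totals = new HashMap<>();
        for (AllocationRecord r : records) {
            long[] t = totals.computeIfAbsent(r.site, k -> new long[2]);
            t[0] += r.survivedCycles;
            t[1] += 1;
        }
        Map<String, Integer> generations = new HashMap<>();
        for (Map.Entry<String, long[]> e : totals.entrySet()) {
            long average = e.getValue()[0] / e.getValue()[1];
            generations.put(e.getKey(), (int) Math.min(average, maxGeneration));
        }
        return generations;
    }

    public static void main(String[] args) {
        // Made-up records for illustration only.
        List<AllocationRecord> records = Arrays.asList(
            new AllocationRecord("Cache.put:42", 9),
            new AllocationRecord("Cache.put:42", 11),
            new AllocationRecord("Request.parse:17", 0),
            new AllocationRecord("Request.parse:17", 1));
        System.out.println(estimateGenerations(records, 4));
    }
}
```

In a profiler of the kind the abstract describes, a per-site mapping like this would be the input to the bytecode instrumentation step, which is omitted here.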
Index Terms
- POLM2: automatic profiling for object lifetime-aware memory management for hotspot big data applications