Abstract
Mobile systems-on-chip (SoCs) that incorporate heterogeneous coherence domains promise high energy efficiency for a wide range of mobile applications, yet they are difficult to program. To exploit this architecture, a desirable yet missing capability is to replicate operating system (OS) services across multiple coherence domains with minimal inter-domain communication. In designing such an OS, we set three goals: to ease application development, to simplify OS engineering, and to preserve current OS performance. To this end, we identify a shared-most OS model for multiple coherence domains: it creates per-domain instances of core OS services with no shared state, while allowing the remaining, extended OS services to share state across domains. To test the model, we build K2, a prototype OS on the TI OMAP4 SoC, reusing most of the Linux 3.4 source. K2 presents a single system image to applications, with its two kernels running on the two coherence domains of OMAP4. The two kernels have independent instances of core OS services, such as page allocation and interrupt management, which K2 coordinates; they share most extended OS services, such as device drivers, whose state K2 keeps coherent transparently. Despite platform constraints and unoptimized code, K2 improves energy efficiency for light OS workloads by 8x-10x, while incurring less than 9% performance overhead for two device drivers shared between kernels. Our experience with K2 shows that the shared-most model is promising.
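The split between core and extended services described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not K2's implementation: the class names, the domain names "strong" and "weak", and the ownership-token scheme standing in for coherence maintenance are all hypothetical. It shows each kernel owning a private page allocator over a disjoint page range (a core service with no shared state), while both kernels reach a single driver state (an extended service) whose ownership moves between domains on access.

```python
from dataclasses import dataclass, field


@dataclass
class PageAllocator:
    """Core service (hypothetical): one private instance per coherence domain."""
    first_page: int
    num_pages: int
    free: list = field(init=False)

    def __post_init__(self):
        # Each instance manages its own disjoint page range; nothing is shared.
        self.free = list(range(self.first_page, self.first_page + self.num_pages))

    def alloc(self):
        return self.free.pop(0)


class SharedDriverState:
    """Extended service (hypothetical): one state shared by all domains.

    An ownership token stands in for whatever mechanism keeps the state
    coherent; counting transfers models inter-domain communication.
    """

    def __init__(self):
        self.owner = None
        self.transfers = 0
        self.data = {}

    def access(self, domain):
        if self.owner != domain:
            # Assumption: a real system would synchronize memory here.
            self.transfers += 1
            self.owner = domain
        return self.data


class Kernel:
    """One kernel per coherence domain, sharing only the driver state."""

    def __init__(self, name, first_page, num_pages, driver):
        self.name = name
        self.pages = PageAllocator(first_page, num_pages)  # private, per domain
        self.driver = driver                               # shared across domains

    def write_reg(self, key, value):
        self.driver.access(self.name)[key] = value

    def read_reg(self, key):
        return self.driver.access(self.name).get(key)


if __name__ == "__main__":
    driver = SharedDriverState()
    strong = Kernel("strong", 0, 4, driver)
    weak = Kernel("weak", 4, 4, driver)
    print(strong.pages.alloc(), weak.pages.alloc())
    strong.write_reg("mode", 1)
    print(weak.read_reg("mode"), driver.transfers)
```

In this toy model, page allocation never triggers an ownership transfer, whereas driver accesses do so only when the accessing domain changes, which is the kind of inter-domain communication the model aims to keep small.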
Recommendations
K2: a mobile operating system for heterogeneous coherence domains
ASPLOS '14: Proceedings of the 19th international conference on Architectural support for programming languages and operating systems