DOI: 10.1145/2661020.2661028
research-article

On the Locality of Java 8 Streams in Real-Time Big Data Applications

Published: 13 October 2014

ABSTRACT

Typical Big Data frameworks do not consider the architecture of the servers that make up the cluster. However, these computers are increasingly heterogeneous and are based on a ccNUMA architecture. In such architectures, main memory access times differ depending on the core on which access is requested. Hence, as well as locality of data access throughout a cluster of servers, locality of memory access within individual servers can have an impact on performance.

Java is a commonly used language for Big Data applications (through the popularity of Hadoop), and the newly released Java 8 introduces streams to simplify data-parallel programming. However, this paper argues that no built-in parallel stream source can operate efficiently on very large datasets while taking data locality into account. This paper details recent work from the JUNIPER project, an EU Framework 7 project, which is investigating how the Java 8 platform (augmented by the Real-Time Specification for Java) can be used for real-time Big Data applications. JUNIPER introduces architecture-aware stream sources which are suitable for Big Data systems and which preserve locality of data. Our results show that when reading data from disk, thread affinity can seriously degrade the performance of standard Java streams, but JUNIPER's architecture-aware streams maintain their performance.
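
For readers unfamiliar with the term, a "stream source" in Java 8 is the object that feeds elements into a stream and decides how the data is partitioned for parallel execution; in the standard library this role is played by a `Spliterator`. The sketch below is only an illustration of that extension point, not JUNIPER's implementation: the class names `RangeSourceDemo` and `ArrayRangeSpliterator` are hypothetical, the data is a small in-memory array rather than a large on-disk dataset, and no locality or thread-affinity logic is included.

```java
import java.util.Spliterator;
import java.util.function.LongConsumer;
import java.util.stream.LongStream;
import java.util.stream.StreamSupport;

public class RangeSourceDemo {

    /** Hypothetical stream source over a contiguous range of a long[] array. */
    static final class ArrayRangeSpliterator implements Spliterator.OfLong {
        private final long[] data;
        private int pos;        // next index to hand out
        private final int end;  // exclusive upper bound of this range

        ArrayRangeSpliterator(long[] data, int start, int end) {
            this.data = data;
            this.pos = start;
            this.end = end;
        }

        @Override
        public boolean tryAdvance(LongConsumer action) {
            if (pos >= end) {
                return false;
            }
            action.accept(data[pos++]);
            return true;
        }

        @Override
        public Spliterator.OfLong trySplit() {
            int remaining = end - pos;
            if (remaining < 2) {
                return null; // too small to split further
            }
            // Hand the first half to a new spliterator and keep the second half.
            // A locality-aware source would choose its partitions here.
            int mid = pos + remaining / 2;
            ArrayRangeSpliterator prefix = new ArrayRangeSpliterator(data, pos, mid);
            pos = mid;
            return prefix;
        }

        @Override
        public long estimateSize() {
            return end - pos;
        }

        @Override
        public int characteristics() {
            return ORDERED | SIZED | SUBSIZED | NONNULL | IMMUTABLE;
        }
    }

    /** Sums the array through a parallel stream backed by the custom source. */
    static long parallelSum(long[] data) {
        Spliterator.OfLong source = new ArrayRangeSpliterator(data, 0, data.length);
        return StreamSupport.longStream(source, true).sum();
    }

    public static void main(String[] args) {
        long[] data = LongStream.rangeClosed(1, 1000).toArray();
        System.out.println(parallelSum(data));
    }
}
```

The point of the sketch is that partitioning is decided entirely inside `trySplit`, so a source that knows where its data resides could, in principle, split along those boundaries; how JUNIPER actually does this is described in the paper itself, not here.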

Published in

JTRES '14: Proceedings of the 12th International Workshop on Java Technologies for Real-time and Embedded Systems
October 2014
116 pages
ISBN: 9781450328135
DOI: 10.1145/2661020

Copyright © 2014 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher: Association for Computing Machinery, New York, NY, United States

Qualifiers: research-article, Research, Refereed limited

Acceptance Rates

JTRES '14 Paper Acceptance Rate: 12 of 18 submissions, 67%
Overall Acceptance Rate: 50 of 70 submissions, 71%
