DOI: 10.1145/2465351.2465353
research-article

TimeStream: reliable stream computation in the cloud

Published: 15 April 2013

ABSTRACT

TimeStream is a distributed system designed specifically for low-latency continuous processing of big streaming data on a large cluster of commodity machines. The unique characteristics of this emerging application domain have led to a significantly different design from the popular MapReduce-style batch data processing. In particular, we advocate a powerful new abstraction called resilient substitution that caters to the specific needs in this new computation model to handle failure recovery and dynamic reconfiguration in response to load changes. Several real-world applications running on our prototype have been shown to scale robustly with low latency while at the same time maintaining the simple and concise declarative programming model. TimeStream handles an on-line advertising aggregation pipeline at a rate of 700,000 URLs per second with a 2-second delay, while performing sentiment analysis of Twitter data at a peak rate close to 10,000 tweets per second, with approximately 2-second delay.


Published in

EuroSys '13: Proceedings of the 8th ACM European Conference on Computer Systems
April 2013
401 pages
ISBN: 9781450319942
DOI: 10.1145/2465351

Copyright © 2013 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

EuroSys '13 paper acceptance rate: 28 of 143 submissions (20%).
Overall acceptance rate: 241 of 1,308 submissions (18%).
