ABSTRACT
Stream-based graph systems continuously ingest graph-changing events via an established input stream and perform the required computations on the corresponding graph. While various benchmarking and evaluation approaches exist for traditional, batch-oriented graph processing systems, there are no common procedures for evaluating stream-based graph systems. We therefore present GraphTides, a generic framework that includes the definition of an appropriate system model, an exploration of the parameter space, suitable workloads, and the computations required for evaluating such systems. Furthermore, we propose a methodology and provide an architecture for running experimental evaluations. With our framework, we hope to systematically support the development, performance measurement, engineering, and comparison of stream-based graph systems.
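To make the system model above concrete, the following is a minimal sketch that assumes a simple undirected graph and four hypothetical event types (`add_vertex`, `remove_vertex`, `add_edge`, `remove_edge`). The names `GraphEvent`, `StreamingGraph`, and `replay` are illustrative only and are not GraphTides APIs. The toy system applies each incoming event to its graph and keeps a simple computation (vertex degree) current, while the harness records per-event processing latency as one example of a measurement an evaluation might collect.

```python
import time
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class GraphEvent:
    # Hypothetical event ops: "add_vertex", "remove_vertex", "add_edge", "remove_edge".
    op: str
    src: str
    dst: str = ""


class StreamingGraph:
    """Toy stream-based graph system over an undirected graph (assumption)."""

    def __init__(self) -> None:
        self.adj: Dict[str, Set[str]] = {}

    def apply(self, ev: GraphEvent) -> None:
        """Apply one graph-changing event to the in-memory graph."""
        if ev.op == "add_vertex":
            self.adj.setdefault(ev.src, set())
        elif ev.op == "remove_vertex":
            # Drop the vertex, then remove it from every former neighbor's set.
            for n in self.adj.pop(ev.src, set()):
                self.adj.get(n, set()).discard(ev.src)
        elif ev.op == "add_edge":
            self.adj.setdefault(ev.src, set()).add(ev.dst)
            self.adj.setdefault(ev.dst, set()).add(ev.src)
        elif ev.op == "remove_edge":
            self.adj.get(ev.src, set()).discard(ev.dst)
            self.adj.get(ev.dst, set()).discard(ev.src)
        else:
            raise ValueError(f"unknown op {ev.op!r}")

    def degree(self, v: str) -> int:
        """The maintained computation: current degree of vertex v."""
        return len(self.adj.get(v, ()))


def replay(system: StreamingGraph, events: Iterable[GraphEvent]) -> List[float]:
    """Feed events in order and record each event's processing time in seconds."""
    latencies: List[float] = []
    for ev in events:
        start = time.perf_counter()
        system.apply(ev)
        latencies.append(time.perf_counter() - start)
    return latencies


events = [
    GraphEvent("add_edge", "a", "b"),
    GraphEvent("add_edge", "b", "c"),
    GraphEvent("remove_edge", "a", "b"),
]
g = StreamingGraph()
lat = replay(g, events)
print(g.degree("b"))  # 1: only the edge b-c remains
```

A real evaluation setup would presumably replay a generated or recorded workload at a controlled rate against a separate system under test rather than calling it in-process; the sketch only illustrates the loop from incoming event to updated computation.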
Index Terms
- GraphTides: a framework for evaluating stream-based graph processing platforms