Invited talk · DOI: 10.1145/3578245.3585339

Graph-Inceptor: Towards Extreme Data Ingestion, Massive Graph Creation and Storage

Published: 15 April 2023

ABSTRACT

Graph processing is increasingly popular given the wide range of phenomena that can be represented as graphs (e.g., social media networks, pharmaceutical drug compounds, and fraud networks). The growing volume of available data requires new approaches to ingest and process such data efficiently. In this work, we describe a solution at a conceptual level within the context of the Graph-Massivizer architecture. Graph-Inceptor aims to fill a gap in the ETL tool landscape by enabling the data transformations required for graph creation and enrichment and by supporting connectors to multiple graph storage systems at massive scale. Furthermore, it aims to enhance ETL operations by learning from data content and load and by making decisions based on machine-learning-based predictive analytics.
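The abstract names three ETL responsibilities for graph data: transforming raw records into a graph, enriching that graph, and loading it through connectors to more than one graph store. The Python sketch below illustrates that flow under stated assumptions; it is not Graph-Inceptor's implementation or API. The record schema (`payer`/`payee` fields), the `GraphStoreConnector` protocol, the out-degree enrichment, and the `InMemoryStore` toy backend are all hypothetical, and the machine-learning-based decision making is not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Protocol, Tuple

# Hypothetical record layout: one payment per row, as in a fraud network.
Row = Dict[str, str]


@dataclass
class GraphBatch:
    """Nodes keyed by id (with attributes) plus directed, labelled edges."""
    nodes: Dict[str, Dict[str, object]] = field(default_factory=dict)
    edges: List[Tuple[str, str, str]] = field(default_factory=list)


class GraphStoreConnector(Protocol):
    """Hypothetical connector interface: one implementation per graph store."""
    def write(self, batch: GraphBatch) -> None: ...


def transform(rows: Iterable[Row]) -> GraphBatch:
    """Transform step: map each tabular row to two nodes and one edge."""
    batch = GraphBatch()
    for row in rows:
        src, dst = row["payer"], row["payee"]
        batch.nodes.setdefault(src, {"kind": "account"})
        batch.nodes.setdefault(dst, {"kind": "account"})
        batch.edges.append((src, dst, "PAID"))
    return batch


def enrich(batch: GraphBatch) -> GraphBatch:
    """Enrichment step: attach each node's out-degree within this batch."""
    for node in batch.nodes.values():
        node["out_degree"] = 0
    for src, _dst, _label in batch.edges:
        batch.nodes[src]["out_degree"] += 1
    return batch


class InMemoryStore:
    """Toy stand-in for a real graph database connector (adjacency lists)."""
    def __init__(self) -> None:
        self.nodes: Dict[str, Dict[str, object]] = {}
        self.adjacency: Dict[str, List[Tuple[str, str]]] = {}

    def write(self, batch: GraphBatch) -> None:
        self.nodes.update(batch.nodes)
        for src, dst, label in batch.edges:
            self.adjacency.setdefault(src, []).append((dst, label))


def run_pipeline(rows: Iterable[Row],
                 sinks: List[GraphStoreConnector]) -> GraphBatch:
    """Transform, enrich, then load the same batch into every configured store."""
    batch = enrich(transform(rows))
    for sink in sinks:
        sink.write(batch)
    return batch


# Usage: three payments among three accounts, loaded into one toy store.
rows = [
    {"payer": "a", "payee": "b"},
    {"payer": "a", "payee": "c"},
    {"payer": "b", "payee": "c"},
]
store = InMemoryStore()
batch = run_pipeline(rows, [store])
```

Keeping the connector as a small structural interface means supporting another graph store only requires a `write` method. A learned component of the kind the abstract envisions could, hypothetically, sit inside `run_pipeline` and choose batch sizes or target stores from observed data content and load.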


    Published in

      ICPE '23 Companion: Companion of the 2023 ACM/SPEC International Conference on Performance Engineering
      April 2023
      421 pages
      ISBN: 9798400700729
      DOI: 10.1145/3578245

      Copyright © 2023 Owner/Author

      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      Overall acceptance rate: 252 of 851 submissions, 30%
