invited-talk

Graph-Inceptor: Towards Extreme Data Ingestion, Massive Graph Creation and Storage

Authors:
Joze Rozanec, Jozef Stefan Institute, Ljubljana, Slovenia (ORCID: 0000-0002-3665-639X)
Brian Elvesæter, SINTEF, Oslo, Norway (ORCID: 0000-0001-7304-4950)
Dumitru Roman, SINTEF, Oslo, Norway (ORCID: 0000-0001-6397-3705)
Marko Grobelnik, Jozef Stefan Institute, Ljubljana, Slovenia (ORCID: 0000-0001-7373-5591)
Peter Haase, Metaphacts, Walldorf, Germany (ORCID: 0000-0002-7561-7000)

ICPE '23 Companion: Companion of the 2023 ACM/SPEC International Conference on Performance Engineering, April 2023, Pages 253–254, https://doi.org/10.1145/3578245.3585339

Published: 15 April 2023


ABSTRACT

Graph processing is increasingly popular given the wide range of phenomena that can be represented as graphs (e.g., social media networks, pharmaceutical drug compounds, or fraud networks). The growing volume of available data requires new approaches to ingest and process such data efficiently. In this work, we describe a solution at a conceptual level in the context of the Graph-Massivizer architecture. Graph-Inceptor aims to fill a gap among existing ETL tools by enabling the data transformations required for graph creation and enrichment and by supporting connectors to multiple graph storage systems at massive scale. Furthermore, it aims to enhance ETL operations by learning from data content and load, and by making decisions based on machine-learning-based predictive analytics.
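The abstract stays at a conceptual level. As a purely illustrative sketch, and not the Graph-Inceptor design, the Python below shows the three ideas it names: a transformation step that turns flat records into graph edges, a pluggable connector interface so that several graph storage backends can be targeted, and a load-aware loading decision. All names (`records_to_edges`, `GraphStoreConnector`, `InMemoryAdjacencyStore`, `LoadForecaster`, `ingest`) are hypothetical, and simple exponential smoothing stands in for the machine-learning-based predictive analytics the abstract mentions.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Protocol, Set, Tuple

# Hypothetical edge representation: (source node, relation label, target node).
Edge = Tuple[str, str, str]


def records_to_edges(records: Iterable[Dict[str, str]], source_key: str,
                     target_key: str, relation: str) -> List[Edge]:
    """Transform step: map each flat record to a labelled edge, skipping incomplete rows."""
    edges: List[Edge] = []
    for rec in records:
        src, dst = rec.get(source_key), rec.get(target_key)
        if src and dst:  # drop records that lack either endpoint
            edges.append((src, relation, dst))
    return edges


class GraphStoreConnector(Protocol):
    """Hypothetical connector interface: one implementation per graph storage backend."""

    def write_edges(self, edges: List[Edge]) -> None: ...


@dataclass
class InMemoryAdjacencyStore:
    """Toy backend that keeps the outgoing (relation, target) pairs of every node."""
    adjacency: Dict[str, Set[Tuple[str, str]]] = field(default_factory=dict)

    def write_edges(self, edges: List[Edge]) -> None:
        for src, rel, dst in edges:
            self.adjacency.setdefault(src, set()).add((rel, dst))
            self.adjacency.setdefault(dst, set())  # register the target as a node too


@dataclass
class LoadForecaster:
    """Exponential smoothing of observed load; a stand-in for an ML predictor."""
    alpha: float = 0.5      # weight of the newest observation
    estimate: float = 0.0   # current load forecast

    def observe(self, load: float) -> float:
        self.estimate = self.alpha * load + (1 - self.alpha) * self.estimate
        return self.estimate

    def batch_size(self, capacity: float, min_batch: int = 1,
                   max_batch: int = 1000) -> int:
        # Assumed heuristic: grow batches as the forecast load nears capacity
        # so that per-write overhead is amortised over more edges.
        ratio = min(self.estimate / capacity, 1.0) if capacity > 0 else 1.0
        return round(min_batch + ratio * (max_batch - min_batch))


def ingest(records: List[Dict[str, str]], store: GraphStoreConnector,
           forecaster: LoadForecaster, capacity: float, source_key: str,
           target_key: str, relation: str) -> int:
    """Transform, forecast load, then write edges in forecast-sized batches.

    Returns the number of batches written to the store.
    """
    edges = records_to_edges(records, source_key, target_key, relation)
    forecaster.observe(float(len(records)))
    size = forecaster.batch_size(capacity)
    batches = 0
    for start in range(0, len(edges), size):
        store.write_edges(edges[start:start + size])
        batches += 1
    return batches
```

For a fraud-network style input, `ingest(records, InMemoryAdjacencyStore(), LoadForecaster(), 100.0, "customer", "account", "OWNS")` would turn each customer/account row into an `OWNS` edge. Targeting another storage system would only require another class with a `write_edges` method; the batching rule is an assumption chosen to show how a forecast could drive an ETL decision, not a recommended policy.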


Index Terms

1. Computer systems organization
  1. Architectures


Published in
ICPE '23 Companion: Companion of the 2023 ACM/SPEC International Conference on Performance Engineering, April 2023, 421 pages
ISBN: 9798400700729
DOI: 10.1145/3578245
General Chairs: Marco Vieira (University of Coimbra, Portugal), Valeria Cardellini (University of Rome Tor Vergata, Italy)
Program Chairs: Antinisca Di Marco (University of L'Aquila, Italy), Petr Tuma (Charles University, Czechia)

Copyright © 2023 Owner/Author
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
ETL
graph creation
ingestion