DOI: 10.1145/3555041.3589395

Towards a Framework for Data Pipeline Discovery

Published: 05 June 2023

Abstract

With recent developments in the Internet of Things (IoT) and cloud-based technologies, massive amounts of data are generated by heterogeneous sources and stored through dedicated cloud solutions. Organizations often generate far more data than they are able to interpret, and current Cloud Computing technologies cannot fully meet the requirements of Big Data processing applications and their data transfer overheads [3]. Much of this data is stored for compliance purposes only and never turned into value, thus becoming Dark Data, which not only represents unused value but also poses a risk for organizations [7, 18].

Supplemental Material

MP4 File
Presentation video for the submission to the SIGMOD 2023 Student Research Competition titled "Towards a Framework for Data Pipeline Discovery".

References

[1]
Simone Agostinelli, Dario Benvenuti, Francesca De Luzi, and Andrea Marrella. 2021. Big Data Pipeline Discovery through Process Mining: Challenges and Research Directions. In 1st Italian Forum on Business Process Management co-located with the 19th Int. Conf. of Business Process Management (BPM 2021). 50--55.
[2]
Adriano Augusto, Raffaele Conforti, Marlon Dumas, Marcello La Rosa, Fabrizio Maria Maggi, Andrea Marrella, Massimo Mecella, and Allar Soo. 2018. Automated discovery of process models from event logs: review and benchmark. IEEE transactions on knowledge and data engineering 31, 4 (2018), 686--705.
[3]
Mutaz Barika, Saurabh Garg, Albert Y. Zomaya, Lizhe Wang, Aad Van Moorsel, and Rajiv Ranjan. 2019. Orchestrating Big Data Analysis Workflows in the Cloud: Research Challenges, Survey, and Future Directions. ACM Comput. Surv. 52, 5, Article 95 (Sept. 2019), 41 pages.
[4]
Dario Benvenuti, Leonardo Falleroni, Andrea Marrella, and Fernando Perales. 2022. An Interactive Approach to Support Event Log Generation for Data Pipeline Discovery. In 46th IEEE Annual Computers, Software, and Appl. Conf., COMPSAC 2022.
[5]
Sangeeta Chakrabarty and Ramprasad S Joshi. 2020. Dark Data: People to People Recovery. In ICT Analysis and Applications. Springer, 247--254.
[6]
Angelo Corallo, Anna Maria Crespino, Vito Del Vecchio, Mariangela Lazoi, and Manuela Marra. 2021. Understanding and Defining Dark Data for the Manufacturing Industry. IEEE Transactions on Engineering Management (2021).
[7]
Gregory Gimpel. 2020. Bringing dark data into the light: Illuminating existing IoT data lost within your organization. Business Horizons 63, 4 (2020), 519--530.
[8]
Thorsten Gressling. 2020. Data Science in Chemistry: Artificial Intelligence, Big Data, Chemometrics and Quantum Computing with Jupyter. De Gruyter.
[9]
XES Working Group et al. 2016. IEEE standard for eXtensible event stream (XES) for achieving interoperability in event logs and event streams. IEEE Std 1849 (2016), 1--50.
[10]
Aiswarya Raj Munappy, Jan Bosch, and Helena Holmström Olsson. 2020. Data Pipeline Management in Practice: Challenges and Opportunities. In International Conference on Product-Focused Software Process Improvement. Springer, 168--184.
[11]
Nikolay Nikolov, Yared Dejene Dessalk, Akif Quddus Khan, Ahmet Soylu, Mihhail Matskin, Amir H. Payberah, and Dumitru Roman. 2021. Conceptualization and scalable execution of big data workflows using domain-specific languages and software containers. Internet of Things 16 (2021).
[12]
Omogbai Oleghe and Konstantinos Salonitis. 2020. A framework for designing data pipelines for manufacturing systems. Procedia CIRP 93 (2020), 724--729.
[13]
Beth Plale and Inna Kouper. 2017. The centrality of data: data lifecycle and data pipelines. In Data analytics for intelligent transportation systems. Elsevier, 91--111.
[14]
Tilmann Rabl and Hans-Arno Jacobsen. 2012. Big data generation. In Specifying Big Data Benchmarks. Springer, 20--27.
[15]
Dumitru Roman, Nikolay Nikolov, Ahmet Soylu, Brian Elvesæter, Hui Song, Radu Prodan, Dragi Kimovski, Andrea Marrella, Francesco Leotta, Mihhail Matskin, Giannis Ledakis, Konstantinos Theodosiou, Anthony Simonet-Boulogne, Fernando Perales, Evgeny Kharlamov, Alexandre Ulisses, Arnor Solberg, and Raffaele Ceccarelli. 2021. Big Data Pipelines on the Computing Continuum: Ecosystem and Use Cases Overview. In IEEE Symposium on Computers and Communications, ISCC 2021. IEEE, 1--4. https://doi.org/10.1109/ISCC53001.2021.9631410
[16]
Vinicius Stein Dani, Henrik Leopold, Jan Martijn EM van der Werf, Xixi Lu, Iris Beerepoot, Jelmer J Koorn, and Hajo A Reijers. 2021. Towards Understanding the Role of the Human in Event Log Extraction. In BPM'21 Workshops. Springer.
[17]
Sebastian Steinau, Andrea Marrella, Kevin Andrews, Francesco Leotta, Massimo Mecella, and Manfred Reichert. 2019. DALEC: a framework for the systematic evaluation of data-centric approaches to process management software. Software & Systems Modeling 18, 4 (2019).
[18]
Haydar Teymourlouei and Lethia Jackson. 2021. Dark Data: Managing Cybersecurity Challenges and Generating Benefits. In Advances in Parallel & Distributed Processing, and Applications. Springer, 91--104.
[19]
Wil van der Aalst. 2016. Data Science in Action. Springer, 3--23.
[20]
Wil van der Aalst. 2016. Process mining: data science in action. Vol. 2. Springer.

    Published In

    SIGMOD '23: Companion of the 2023 International Conference on Management of Data
    June 2023
    330 pages
    ISBN:9781450395076
    DOI:10.1145/3555041
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. dark data
    2. datasets
    3. event log extraction
    4. process discovery
    5. process mining

    Qualifiers

    • Abstract

    Data Availability

    Presentation video for the submission to the SIGMOD 2023 Student Research Competition titled "Towards a Framework for Data Pipeline Discovery". https://dl.acm.org/doi/10.1145/3555041.3589395#SIGMOD23-fp10.mp4

    Conference

    SIGMOD/PODS '23

    Acceptance Rates

    Overall Acceptance Rate 785 of 4,003 submissions, 20%
