DOI: 10.1145/3626246.3654756
short-paper
Open access

Demonstration of Udon: Line-by-line Debugging of User-Defined Functions in Data Workflows

Published: 09 June 2024

Abstract

Many big data systems are written in languages such as C, C++, Java, and Scala for high efficiency, whereas data analysts often use Python to conduct data wrangling, statistical analysis, and machine learning. User-defined functions (UDFs) are commonly used in these systems to bridge the gap between the two ecosystems. Debugging complex UDFs in data-processing systems is challenging due to the required coordination between language debuggers and the data-processing engine, as well as the debugging overhead on large volumes of data. In this paper, we showcase Udon, a novel debugger to support line-by-line debugging of UDFs in data-processing systems. Udon encapsulates modern line-by-line debugging primitives, such as those to set breakpoints, perform code inspections, and make code modifications while executing a UDF on a single tuple. In this demonstration, we use real-world scenarios to showcase the experience of using Udon for line-by-line debugging of a UDF.
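
To make the idea of line-by-line control over a UDF on a single tuple concrete, the following is a minimal sketch and not Udon's implementation. It uses Python's `sys.settrace`, the standard hook on which `pdb` is built, to record the local variables visible before each line of a UDF runs. The UDF `clean_price`, the helper `trace_udf`, and the sample tuple are hypothetical names introduced only for illustration.

```python
import sys


def clean_price(tup):
    """Hypothetical UDF: parse a price string into a float field."""
    raw = tup["price"]
    stripped = raw.strip().lstrip("$")
    value = float(stripped)
    return {**tup, "price": value}


def trace_udf(udf, tup):
    """Run `udf` on one tuple and record a snapshot of its locals
    before each of its lines executes (an illustrative helper)."""
    steps = []
    code = udf.__code__

    def tracer(frame, event, arg):
        # Only follow the UDF's own frame; ignore every other function.
        if frame.f_code is not code:
            return None
        if event == "line":
            # Line offset relative to the `def` line, plus a copy of locals.
            rel_line = frame.f_lineno - code.co_firstlineno
            steps.append((rel_line, dict(frame.f_locals)))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = udf(tup)
    finally:
        # Always restore whatever trace function was active before.
        sys.settrace(previous)
    return result, steps


result, steps = trace_udf(clean_price, {"id": 7, "price": " $19.99 "})
for rel_line, local_vars in steps:
    print(rel_line, sorted(local_vars))
```

Each recorded step corresponds to a point where an interactive debugger could pause, show the tuple's intermediate state, and wait for a command. A real debugger inside a data-processing engine must also coordinate such pauses with the engine's execution, which this sketch does not attempt.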


Published In

SIGMOD/PODS '24: Companion of the 2024 International Conference on Management of Data
June 2024
694 pages
ISBN: 9798400704222
DOI: 10.1145/3626246
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data workflows
  2. debugging
  3. python udf
  4. user-defined functions

Qualifiers

  • Short-paper

Conference

SIGMOD/PODS '24

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%
