ABSTRACT
We demonstrate Rapsai, a visual programming platform for rapid, iterative development of end-to-end machine learning (ML)-based multimedia applications. Rapsai features a node-graph editor for interactively characterizing and visualizing ML model performance, which helps users understand how a model behaves in different scenarios. The platform also supports end-to-end prototyping through interactive data augmentation and model comparison in a no-code environment. Our demonstration showcases Rapsai's versatility through several use cases, including virtual backgrounds, visual effects with depth estimation, and audio denoising. Rapsai is intended to help ML practitioners streamline their workflows, make data-driven decisions, and comprehensively evaluate model behavior with real-world input.
Footnotes
1 Portrait Depth API: https://tfhub.dev/tensorflow/tfjs-model/ar_portrait_depth/1
2 MediaPipe API: https://tfhub.dev/mediapipe/tfjs-model/selfie_segmentation/general
Index Terms
- Experiencing Rapid Prototyping of Machine Learning Based Multimedia Applications in Rapsai