Model Monitoring in Practice: Lessons Learned and Open Challenges

Published: 14 August 2022

ABSTRACT

Artificial Intelligence (AI) plays an increasingly integral role in shaping our day-to-day experiences. Its applications are no longer limited to search and recommendation systems, such as web search and movie and product recommendations; AI is also being used in decisions and processes that are critical for individuals, businesses, and society. With AI-based solutions in high-stakes domains such as hiring, lending, criminal justice, healthcare, and education, the personal and professional implications of AI are far-reaching. Consequently, it is critical to ensure that these models make accurate predictions, are robust to shifts in the data, do not rely on spurious features, and do not unduly discriminate against minority groups. To this end, several approaches spanning explainability, fairness, and robustness have been proposed in recent literature, and many papers and tutorials on these topics have been presented at recent computer science conferences. However, comparatively little attention has been paid to monitoring machine learning (ML) models once they are deployed, and to the associated research challenges.

In this tutorial, we first motivate the need for ML model monitoring [14], as part of a broader AI model governance [9] and responsible AI framework, from societal, legal, customer/end-user, and model developer perspectives, and provide a roadmap for thinking about model monitoring in practice. We then present findings and insights on model monitoring desiderata based on interviews with ML practitioners spanning domains such as financial services, healthcare, hiring, online retail, computational advertising, and conversational assistants [15]. Next, we describe the technical considerations and challenges associated with realizing these desiderata in practice, and give an overview of techniques and tools for model monitoring (e.g., see [1, 2, 5, 6, 8, 10-13, 18-21]). Finally, we focus on the real-world application of model monitoring methods and tools [3, 4, 7, 11, 13, 16, 17], present practical challenges and guidelines for using such techniques effectively, and share lessons learned from deploying model monitoring tools for several web-scale AI/ML applications, with case studies across different companies spanning application domains such as financial services, healthcare, hiring, conversational assistants, online retail, computational advertising, search and recommendation systems, and fraud detection. We hope that our tutorial will inform both researchers and practitioners, stimulate further research on model monitoring, and pave the way for building more reliable ML models and monitoring tools in the future.
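The monitoring tools surveyed above differ widely, but many share a common core: comparing the live distribution of model inputs against a training-time baseline and raising an alert when the two diverge. As a minimal illustrative sketch (not the method of any specific system cited above), the Python snippet below flags per-feature covariate drift with a two-sample Kolmogorov-Smirnov test; the significance threshold and window sizes are placeholder assumptions that a real deployment would tune.

    # Minimal per-feature covariate-drift check: compare a window of live
    # traffic against a training-time baseline with a two-sample
    # Kolmogorov-Smirnov test. Illustrative sketch only; the 0.01 threshold
    # and the window sizes below are assumptions, not recommendations.
    import numpy as np
    from scipy.stats import ks_2samp

    def drift_report(baseline, live, feature_names, p_threshold=0.01):
        """Return (feature, KS statistic, p-value) for each drifted feature.

        baseline, live: arrays of shape (n_samples, n_features).
        """
        flagged = []
        for j, name in enumerate(feature_names):
            result = ks_2samp(baseline[:, j], live[:, j])
            if result.pvalue < p_threshold:
                flagged.append((name, result.statistic, result.pvalue))
        return flagged

    # Toy usage: inject a mean shift into one feature of the "live" window.
    rng = np.random.default_rng(seed=0)
    baseline = rng.normal(size=(5000, 3))
    live = rng.normal(size=(1000, 3))
    live[:, 1] += 0.5  # simulated upstream change affecting feature f1

    for name, stat, p in drift_report(baseline, live, ["f0", "f1", "f2"]):
        print(f"drift detected in {name}: KS={stat:.3f}, p={p:.1e}")

Production systems typically replace the per-feature hypothesis test with streaming sketches and drift scores that behave better at scale (see, e.g., the quantile-sketch work in [2, 8] and the systems in [11, 13]), but the baseline-versus-live comparison pattern is the same.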

References

  1. Eric Breck, Neoklis Polyzotis, Sudip Roy, Steven Whang, and Martin Zinkevich. 2019. Data Validation for Machine Learning. In MLSys.
  2. Graham Cormode, Zohar Karnin, Edo Liberty, Justin Thaler, and Pavel Vesely. 2021. Relative Error Streaming Quantiles. In PODS.
  3. Jordan Edwards et al. 2021. MLOps: Model management, deployment, lineage, and monitoring with Azure Machine Learning. https://tinyurl.com/57y8rrec
  4. Fiddler. 2022. Explainable Monitoring. https://www.fiddler.ai/ml-monitoring
  5. João Gama, Albert Bifet, Mykola Pechenizkiy, and Abdelhamid Bouchachia. 2014. A survey on concept drift adaptation. ACM Computing Surveys (CSUR) 46, 4 (2014), 1--37.
  6. Saurabh Garg, Yifan Wu, Sivaraman Balakrishnan, and Zachary Lipton. 2020. A Unified View of Label Shift Estimation. In NeurIPS.
  7. Michaela Hardt, Xiaoguang Chen, Xiaoyi Cheng, Michele Donini, Jason Gelman, Satish Gollaprolu, John He, Pedro Larroy, Xinyu Liu, Nick McCarthy, Ashish Rathi, Scott Rees, Ankit Siva, ErhYuan Tsai, Keerthan Vasist, Pinar Yilmaz, Muhammad Bilal Zafar, Sanjiv Das, Kevin Haas, Tyler Hill, and Krishnaram Kenthapadi. 2021. Amazon SageMaker Clarify: Machine Learning Bias Detection and Explainability in the Cloud. In KDD.
  8. Zohar Karnin, Kevin Lang, and Edo Liberty. 2016. Optimal quantile approximation in streams. In FOCS.
  9. Eren Kurshan, Hongda Shen, and Jiahao Chen. 2020. Towards self-regulating AI: Challenges and opportunities of AI model governance in financial services. In Proceedings of the First ACM International Conference on AI in Finance.
  10. Zachary Lipton, Yu-Xiang Wang, and Alexander Smola. 2018. Detecting and correcting for label shift with black box predictors. In ICML.
  11. David Nigenda, Zohar Karnin, Muhammad Bilal Zafar, Raghu Ramesha, Alan Tan, Michele Donini, and Krishnaram Kenthapadi. 2022. Amazon SageMaker Model Monitor: A System for Real-Time Insights into Deployed Machine Learning Models. In KDD.
  12. Sashank Reddi, Barnabas Poczos, and Alex Smola. 2015. Doubly robust covariate shift correction. In AAAI.
  13. Sebastian Schelter, Dustin Lange, Philipp Schmidt, Meltem Celikel, Felix Biessmann, and Andreas Grafberger. 2018. Automating large-scale data quality verification. In VLDB.
  14. David Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-Francois Crespo, and Dan Dennison. 2015. Hidden technical debt in machine learning systems. In NeurIPS.
  15. Murtuza N Shergadwala, Himabindu Lakkaraju, and Krishnaram Kenthapadi. 2022. A Human-Centric Take on Model Monitoring. In ICML Workshop on Human-Machine Collaboration and Teaming (HMCaT).
  16. Ankur Taly, Kaz Sato, and Claudiu Gruia. 2021. Monitoring feature attributions: How Google saved one of the largest ML services in trouble. Google Cloud Blog. https://tinyurl.com/awt3f5ex
  17. TruEra. 2022. TruEra Monitoring. https://truera.com/monitoring/
  18. Alexey Tsymbal. 2004. The problem of concept drift: definitions and related work. Computer Science Department, Trinity College Dublin 106, 2 (2004), 58.
  19. Geoffrey I Webb, Roy Hyde, Hong Cao, Hai Long Nguyen, and Francois Petitjean. 2016. Characterizing concept drift. Data Mining and Knowledge Discovery 30, 4 (2016), 964--994.
  20. Yifan Wu, Ezra Winston, Divyansh Kaushik, and Zachary Lipton. 2019. Domain adaptation with asymmetrically-relaxed distribution alignment. In ICML.
  21. Indrė Žliobaitė, Mykola Pechenizkiy, and João Gama. 2016. An overview of concept drift applications. In Big Data Analysis: New Algorithms for a New Society (2016), 91--114.

Published in: KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, New York, NY, United States, August 2022, 5033 pages. ISBN 9781450393850. DOI: 10.1145/3534678. Copyright © 2022 Owner/Author.
