DOI: 10.1145/3240323.3241725

Measuring operational quality of recommendations: industry talk abstract

Published: 27 September 2018

Abstract

With the rise of machine learning in production, we need to talk about operational data science. This talk introduces a pragmatic method for measuring the response quality of a recommendation service. To that end, a definition of a successful response is introduced, and guidelines for capturing the rate of successful responses are presented.
There are several changes that can happen during the serving phase of a model that negatively affect the quality of the algorithmic response. A few examples are:
• The model is updated and the new version is inferior to the previous one.
• The latest deployment of the stack that processes the request and serves the model contains a bug.
• Changes in the infrastructure lead to performance loss. An example in an e-commerce setting is switching to a different microservice to obtain article metadata used for filtering the recommendations.
• The input data changes. Typical causes are a client application that releases a bug (e.g., lowercasing a case-sensitive identifier) or changes a feature in a way that shifts the data distribution, such as allowing all users to use the product cart where previously only logged-in users could. If the change goes undetected, training data and serving data diverge.
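The input-data divergence in the last example can be caught with a simple distribution check. A minimal sketch (all names hypothetical), comparing the share of a categorical value between a training sample and a serving sample:

```python
from collections import Counter

def share(values, key):
    """Fraction of records whose value equals `key`."""
    if not values:
        return 0.0
    return Counter(values)[key] / len(values)

def diverged(train_values, serve_values, key, tolerance=0.1):
    """Flag when the serving-time share of `key` drifts more than
    `tolerance` away from the share observed in training data."""
    return abs(share(train_values, key) - share(serve_values, key)) > tolerance

# Example: the logged-in share drops after the cart is opened to all users.
train = ["logged_in"] * 90 + ["anonymous"] * 10
serve = ["logged_in"] * 55 + ["anonymous"] * 45
```

In practice the comparison would run over streaming windows and more expressive distance measures (e.g., a chi-squared test), but the principle is the same: compare serving-time feature distributions against the training baseline.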
Current monitoring solutions mostly focus on whether a request completes without errors and on the request latency. That means the examples above would be hard to detect, even though the response quality is significantly degraded, sometimes permanently.
Beyond failing to detect such changes, it can be argued that current monitoring practices are not sufficient to capture the performance of a recommender system, or any other data-driven service, in a meaningful way. We might, for instance, have returned popular articles as a fallback when personalized recommendations were requested; that response should be recorded as unsuccessful.
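The fallback example can be made concrete with a small success predicate. This is a sketch under stated assumptions (the `Response` type and its fields are hypothetical, not from the talk): a response counts as successful only if it is non-empty and actually matches what was requested.

```python
from dataclasses import dataclass, field

@dataclass
class Response:
    items: list = field(default_factory=list)
    personalized: bool = False  # False when a popular-items fallback was served

def is_successful(resp: Response, requested_personalized: bool = True) -> bool:
    """Classify a response for quality monitoring. An HTTP 200 with a
    fallback payload is deliberately recorded as unsuccessful."""
    if not resp.items:
        return False
    if requested_personalized and not resp.personalized:
        return False
    return True
```

The key design choice is that success is defined against the business intent of the request, not against the absence of technical errors.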
A new paradigm for measuring response quality should fulfil the following criteria:
• comparable across models
• simple and understandable metrics
• measurements are collected in real time
• allows for actionable alerting on problems
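A metric meeting these criteria can be as simple as a sliding-window success rate with an alert threshold. A minimal in-process sketch (class and parameter names are illustrative; a production setup would export the rate to a monitoring system such as Prometheus instead):

```python
from collections import deque

class SuccessRateMonitor:
    """Sliding-window success rate over the last `window` responses,
    with a simple threshold for actionable alerting."""
    def __init__(self, window=1000, alert_below=0.8):
        self.window = deque(maxlen=window)
        self.alert_below = alert_below

    def record(self, successful: bool) -> None:
        self.window.append(1 if successful else 0)

    @property
    def rate(self) -> float:
        return sum(self.window) / len(self.window) if self.window else 1.0

    def should_alert(self) -> bool:
        return self.rate < self.alert_below
```

Because the metric is a single rate per model, it is comparable across models, easy to understand, collected in real time, and directly alertable.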
The response quality is defined as an approximation of how well the response fits the defined business and modelling case. The goal is to bridge the gap between the metrics used during model learning and technical monitoring metrics. Ideally we would like to obtain Service Level Objectives (SLOs) [1] that contain this quality aspect and can be discussed with the different client applications based on their business cases, e.g., "85% of the order confirmation emails contain personalized recommendations based on the purchase."
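The quoted SLO reduces to a simple compliance check over the recorded successful-response counts. A sketch (the function and its zero-traffic behavior are assumptions, not part of the talk):

```python
def slo_met(successful: int, total: int, objective: float = 0.85) -> bool:
    """Check an SLO of the form: at least `objective` of responses were
    successful, e.g. 85% of order confirmation emails contained
    personalized recommendations."""
    if total == 0:
        return True  # no traffic means nothing violated (a policy choice)
    return successful / total >= objective
```

The objective itself is exactly the number to negotiate with each client application, since different business cases tolerate different fallback rates.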
A case study will illustrate how algorithmic monitoring was introduced in the recommendation team at Zalando. Zalando is one of Europe's largest fashion retailers and multiple recommendation algorithms serve many online and offline use cases. You will see several examples of how the monitoring helped to identify bugs or diagnose quality problems.

Supplementary Material

MP4 File (p485-weichbrodt.mp4)

Reference

[1] Betsy Beyer, Chris Jones, Jennifer Petoff, and Niall Richard Murphy. 2017. Site Reliability Engineering: How Google Runs Production Systems. Retrieved August 2, 2018 from https://landing.google.com/sre/book/chapters/service-level-objectives.html

Cited By

  • (2023) Artificial Intelligence in Business-to-Customer Fashion Retail: A Literature Review. Mathematics 11(13), 2943. DOI: 10.3390/math11132943. Online publication date: 30-Jun-2023.
  • (2021) A Systematic Literature Review of Artificial Intelligence in Fashion Retail B2C. 2021 6th International Conference on Smart and Sustainable Technologies (SpliTech), 01-06. DOI: 10.23919/SpliTech52315.2021.9566467. Online publication date: 8-Sep-2021.

Index Terms

    Recommendations
    Published In

    RecSys '18: Proceedings of the 12th ACM Conference on Recommender Systems
    September 2018
    600 pages
    ISBN:9781450359016
    DOI:10.1145/3240323
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 27 September 2018

    Author Tags

    1. algorithmic monitoring
    2. operational data science
    3. zalando

    Qualifiers

    • Invited-talk

    Conference

    RecSys '18: Twelfth ACM Conference on Recommender Systems
    October 2, 2018
    Vancouver, British Columbia, Canada

    Acceptance Rates

    RecSys '18 Paper Acceptance Rate: 32 of 181 submissions, 18%
    Overall Acceptance Rate: 254 of 1,295 submissions, 20%

