DOI: 10.1145/3298689.3346963

Driving content recommendations by building a knowledge base using weak supervision and transfer learning

Published: 10 September 2019

Abstract

With 2.2 million subscribers and two hundred million content views, Chegg is a centralized hub where students come to get help with writing, science, math, and other educational needs. To have an impact on students' learning, we present personalized content to them. Each student's needs are unique, shaped by learning style, studying environment, and many other factors, and most students engage with only a subset of the products and content available at Chegg. To recommend personalized content, we have developed a generalized machine learning pipeline that handles training data generation and model building for a wide range of problems. We generate a knowledge base with a hierarchy of concepts and associate student-generated content (chat-room data, equations, chemical formulae, reviews, etc.) with concepts in the knowledge base. Collecting training data to generate the different parts of the knowledge base is a key bottleneck in developing NLP models, and employing subject matter experts to provide annotations is prohibitively expensive. Instead, we use weak supervision and active learning techniques, with tools such as Snorkel [2], an open source project from Stanford, to make training data generation dramatically easier. With these methods, training data is generated using broad-stroke filters and high-precision rules, and the rules are modeled probabilistically to incorporate their dependencies. Features for the classification tasks are generated using transfer learning [1] from language models. We explored several language models; the best performance came from sentence embeddings based on skip-thought vectors, which are trained to predict the previous and the next sentence. The resulting structured information is then used to improve product features and enhance the recommendations made to students. In this presentation I will talk about efficient methods of tagging content with categories that come from a knowledge base. Using this information, we provide relevant content recommendations to students coming to Chegg for online tutoring, studying flashcards, and practicing problems.
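
The weak supervision step described in the abstract (a broad-stroke filter followed by high-precision rules whose votes are combined into a training label) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the concept names, the keyword and regex rules, and the helper names (`broad_filter`, `lf_formula`, `weak_label`) are hypothetical and do not come from the Chegg pipeline. The votes are combined with a plain vote fraction, which is a deliberately simpler stand-in for the probabilistic label model that Snorkel [2] fits and which ignores dependencies between rules.

```python
import re
from collections import Counter

ABSTAIN = None  # a rule returns this when it has no opinion


def broad_filter(text):
    """Broad-stroke filter (toy assumption): drop very short snippets."""
    return len(text.split()) >= 3


def lf_formula(text):
    # Toy rule: an element symbol followed by a digit (e.g. H2) suggests chemistry.
    return "chemistry" if re.search(r"[A-Z][a-z]?\d", text) else ABSTAIN


def lf_chemistry_words(text):
    # Toy rule: chemistry vocabulary.
    lowered = text.lower()
    return "chemistry" if any(w in lowered for w in ("reaction", "molar")) else ABSTAIN


def lf_calculus_words(text):
    # Toy rule: calculus vocabulary.
    lowered = text.lower()
    return "calculus" if any(w in lowered for w in ("derivative", "integral")) else ABSTAIN


LABELING_FUNCTIONS = [lf_formula, lf_chemistry_words, lf_calculus_words]


def weak_label(text, lfs=LABELING_FUNCTIONS):
    """Return {concept: vote fraction} for text, or None if it is filtered
    out or every rule abstains. This is simple vote counting, not Snorkel's
    learned label model."""
    if not broad_filter(text):
        return None
    votes = [v for v in (lf(text) for lf in lfs) if v is not ABSTAIN]
    if not votes:
        return None
    counts = Counter(votes)
    return {label: n / len(votes) for label, n in counts.items()}


if __name__ == "__main__":
    for snippet in (
        "How do I balance the reaction H2 + O2?",
        "Find the derivative of x^2",
        "hi there",
    ):
        print(snippet, "->", weak_label(snippet))
```

The soft labels returned by `weak_label` could then serve as noisy training targets for a downstream classifier over sentence-embedding features, which is the role the abstract assigns to transfer learning.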

References

[1] Alexis Conneau, Douwe Kiela, Holger Schwenk, Loic Barrault, and Antoine Bordes. 2017. Supervised learning of universal sentence representations from natural language inference data. arXiv preprint arXiv:1705.02364 (2017).
[2] Alexander Ratner, Stephen H Bach, Henry Ehrenberg, Jason Fries, Sen Wu, and Christopher Ré. 2017. Snorkel: Rapid training data creation with weak supervision. Proceedings of the VLDB Endowment 11, 3 (2017), 269-282.

Published In

RecSys '19: Proceedings of the 13th ACM Conference on Recommender Systems
September 2019
635 pages
ISBN: 9781450362436
DOI: 10.1145/3298689

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. categorization and tagging
2. content recommendation
3. knowledge graph
4. transfer learning
5. weak supervision

Qualifiers

• Abstract

Conference

RecSys '19: Thirteenth ACM Conference on Recommender Systems
September 16 - 20, 2019
Copenhagen, Denmark

Acceptance Rates

RecSys '19 Paper Acceptance Rate: 36 of 189 submissions, 19%
Overall Acceptance Rate: 254 of 1,295 submissions, 20%
