“Fixing the curse of the bad product descriptions” – Search-boosted tag recommendation for E-commerce products

https://doi.org/10.1016/j.ipm.2020.102289

Highlights

  • We perform a study on the tagging behavior of sellers in an e-commerce platform.

  • We design new tag quality attributes that exploit the collective behavior of users.

  • Our attributes exploit the synergy between search and quality of textual content.

  • Queries and clicks can offer useful data for recommending quality tags for products.

  • Our best method, a deep L2R framework, greatly outperforms state-of-the-art methods.

Abstract

Various e-commerce platforms allow sellers to register, describe and organize their own products, using tags and other textual metadata. The quality of these textual descriptors is essential for the effectiveness of e-commerce information services such as search and product recommendation, and thus, for the ability of consumers to find desired products. In this paper, we focus on one particular, widely used textual descriptor of products: tags. We argue that sellers may not be the “best” providers of tag information for products either because of their inability to do so (they were not “trained” for that) or due to an explicit intent to fool the system in order to promote their products with inadequate or imprecise tags (tag spam). To deal with these issues, we may rely on automatic tag recommendation techniques to improve the quality of the tags suggested to describe a given product. In this context, the main novel contribution of our work is a set of new tag recommendation techniques that take advantage of product search result data (in particular the search queries and product clicks from these queries) to improve the quality of the recommended tags. Our main hypothesis is that the set of queries collectively issued by the consumers of the e-marketplace, along with the corresponding clicks, reflects a more trustworthy view of the products; thus those queries and clicks can be exploited as a source of high-quality (e.g., more diverse) tags to describe the products. We propose new solutions, including some based on deep learning, that translate this main hypothesis into new features and methods for recommending tags for products. Our manual and automatic evaluations, using real data from one of the largest e-commerce sites in Brazil, show that tags created by sellers indeed contain a lot of noise. On the other hand, our proposed search-boosted tag recommenders are highly effective in suggesting relevant tags, with gains of more than 16% in recommendation effectiveness against the state-of-the-art. Moreover, our experiments show that the suggested tags provide a potentially better data source for e-commerce search than the original tags assigned by product sellers.

Introduction

E-marketplace sites, such as Etsy and eBay, are characterized by an active participation of users in the creation, description, categorization and rating of product-related pages. On these platforms, a user can be either a seller or a buyer (or both), performing different actions according to their interest. These platforms face the complex challenge of providing functionality to allow buyers to find sellers and their products through search engines, recommendation systems and other tools. One such e-marketplace is Elo7, the largest Brazilian site for buying and selling creative personalized products. On Elo7, customers buy directly from hundreds of thousands of artisans, artists and designers spread all over the country who turn creative ideas into unique products. The platform, which has been around for over a decade, has already exceeded the mark of 8 million users, with over 13 million registered products and more than 30 million unique sessions per month.

Product pages in an e-marketplace platform contain various data fields related to the product, here referred to as its features. In particular, textual features (e.g., title, description, categories and tags) play a substantial role in these systems since they not only provide effective data sources for information retrieval (IR) services (Figueiredo, Belém, Pinto, Almeida, & Gonçalves, 2012), but also greatly influence the behavior of consumers and, consequently, the success in sales and revenue (Pryzant, Chung, & Jurafsky, 2017). Thus, the quality of these textual features is essential for the effectiveness of e-commerce search and product recommendation engines, allowing consumers to easily find what they are looking for or browse the catalog for new ideas and products.

However, there is no guarantee that these features, especially tags, are of good quality for the purpose of supporting effective retrieval. As they are usually created by the sellers, these textual features reflect their individual view, background and interests in relation to their products. One might argue that the sellers are those who should best know how to describe their products. However, a biased view may leave out some important keywords that others may find useful in referring to a particular product. Furthermore, in a self-promotion attempt, sellers may generate tag spam (Koutrika, Effendi, Gyöngyi, Heymann, & Garcia-Molina, 2008) (e.g., popular keywords which are not necessarily related to the associated content) to increase the chances that their products will appear in search result pages. These issues may not only negatively impact the effectiveness of IR services but also frustrate and chase away current and potential customers.

In this context, we tackle the problem of poor (tag-based) product descriptions by building new services for automatically recommending high-quality tags to describe products in e-marketplaces. We are driven by the hypothesis that, unlike seller-generated tags, the set of queries issued by the consumers of the e-marketplace, along with the corresponding clicks, reflects a collective and more trustworthy view of the products. Thus, those queries and clicks can be exploited as a source of high-quality (e.g., more diverse) keywords to describe the products. The rationale is that if many users click on a product A after issuing, for instance, the query “wedding invitation”, it is very likely that “wedding invitation” is a relevant keyword to describe A.
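The click-based rationale can be illustrated with a minimal sketch. It assumes a click log given as (query, product id) pairs and a hypothetical `min_clicks` threshold; neither the log format nor the threshold comes from the paper, and the function name is only illustrative.

```python
from collections import Counter, defaultdict


def candidate_tags_from_clicks(click_log, min_clicks=2):
    """Collect, per product, the queries that led to at least `min_clicks` clicks.

    click_log: iterable of (query, product_id) pairs, one pair per click.
    The log format and the threshold are assumptions made for illustration.
    """
    counts = defaultdict(Counter)
    for query, product_id in click_log:
        # Normalize case and surrounding whitespace so equal queries are merged.
        counts[product_id][query.strip().lower()] += 1
    return {
        product_id: [(q, n) for q, n in c.most_common() if n >= min_clicks]
        for product_id, c in counts.items()
    }


click_log = [
    ("wedding invitation", "A"),
    ("Wedding Invitation", "A"),
    ("rustic invitation", "A"),
    ("wedding invitation", "B"),
]
candidates = candidate_tags_from_clicks(click_log)
```

Here product A keeps “wedding invitation” as a candidate because two clicks followed that query, while the single-click queries are filtered out as weak evidence.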

Take the example in Fig. 1, which shows part of a real Elo7 product page (translated to English). Clearly there are various unrelated (popular) tags such as “plant x zombie”, “Mario Bros” and “Minecraft”, which may have been chosen by the seller to promote her product. Yet, after analyzing query logs, we learned that many users clicked on the webpage of that product after searching for “mini-garden cactus succulent” and “birthday souvenir”, which, though clearly related to the product, are not present in its list of tags. Thus, both keywords could be good tag recommendations.

Our goals in investigating the aforementioned hypothesis are twofold. On one hand, we aim at helping sellers to improve the quality of the tags associated with their products by “learning from others”. As a consequence, this may help improve the effectiveness of product search by promoting diversity, demoting noise, and helping retrieve more relevant products in top ranked positions that will actually be seen by buyers. On the other hand, the user interactions with a better product search engine may produce higher quality input data (e.g., queries and product clicks) that can be exploited to recommend more relevant tags for products. Thus, these two tasks (search and tag recommendation) can reinforce each other, improving the experience of both groups of users (sellers and consumers) and, ultimately, the revenue obtained by the sellers and the e-commerce company. To our knowledge, despite the rich literature on tag recommendation (Belém, Almeida, & Gonçalves, 2016; Kowald, 2018; Qiao, Zhang, Wei, & Chen, 2017; Xu, 2018) and search improvement (Pryzant, Chung, & Jurafsky, 2017; Santu, Sondhi, & Zhai, 2017), we are the first to propose a solution that explicitly connects the two tasks in a virtuous cycle of interdependence, specifically targeted at the e-commerce realm.

Specifically, we aim at answering four research questions (RQs):

(RQ1) How much noise do seller-created tags contain in an e-marketplace?

(RQ2) Are search queries a better source of keywords to describe products?

(RQ3) How can we improve tag recommendation while exploiting product search data?

(RQ4) Can enhanced tag recommendation improve the quality of e-commerce search services?

To answer RQ1 and RQ2, we invited a group of volunteers to manually inspect the quality of keywords (tags, search queries and tag recommendations) associated with a sample of Elo7 products. Our results show that seller tags indeed contain a great amount of noise (irrelevant terms): for 70% of the products, more than 68% of the previously assigned tags were considered irrelevant by at least one volunteer. This largely exceeds the amount of noise in queries associated with products: for 70% of the products, fewer than 30% of the associated queries were considered irrelevant by at least one volunteer. Thus, search data can be exploited to generate candidate tags and to estimate their relevance to products in e-marketplaces.

Motivated by the previous results, we tackle RQ3 by proposing new tag quality attributes that exploit search result data (i.e., the search queries and product clicks from these queries) to generate and rank candidate tags. These attributes exploit different definitions of neighborhood of the target product and use information such as product similarity, co-clicks (products clicked by users after issuing the same query) and the search keywords themselves to extract potential tag candidates. From an algorithmic point of view, we also introduce innovations as we are, to the best of our knowledge, the first to explore deep learning methods for building tag recommenders based on the combination of tag quality/informativeness with search-oriented attributes. More specifically, we propose to use a Deep Multilayer Perceptron (DMLP) architecture (Cho, 2013; Lecun, Bengio, & Hinton, 2015; Srivastava, Hinton, Krizhevsky, Sutskever, & Salakhutdinov, 2014) to combine the newly proposed search-oriented attributes with state-of-the-art ones based on tag quality (Belém, Almeida, & Gonçalves, 2016; Belém, Batista, Santos, Almeida, & Gonçalves, 2016; Belém, Heringer, Almeida, & Gonçalves, 2019) for the tag recommendation task. The motivation for choosing such a deep learning architecture is that the derived models are non-linear, just like the current state-of-the-art tag quality based solution, which exploits Random Forests (RFs) as the recommendation algorithm (Belém et al., 2019). We also compare this deep ranking architecture, which learns from our set of proposed tag quality attributes, with some advanced state-of-the-art architectures that focus on automatic attribute extraction, namely, bi-directional transformers (Devlin, Chang, Lee, & Toutanova, 2018) and stacked denoising autoencoders (Wang, Shi, & Yeung, 2015).
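To make the notion of co-clicks concrete, the sketch below builds a co-click neighborhood from a click log. It is an illustration under assumptions, not the attribute definition used in the paper: the log is assumed to be (query, product id) pairs, and the strength of a link is taken to be the number of distinct queries two products share.

```python
from collections import defaultdict
from itertools import combinations


def co_click_neighbors(click_log):
    """Map each product to the products clicked after the same queries.

    click_log: iterable of (query, product_id) pairs (assumed format).
    The value for a neighbor is the number of distinct queries shared,
    a weighting chosen here only for illustration.
    """
    products_by_query = defaultdict(set)
    for query, product_id in click_log:
        products_by_query[query].add(product_id)

    neighbors = defaultdict(lambda: defaultdict(int))
    for products in products_by_query.values():
        # Every pair of products clicked for the same query is a co-click.
        for a, b in combinations(sorted(products), 2):
            neighbors[a][b] += 1
            neighbors[b][a] += 1
    return {p: dict(n) for p, n in neighbors.items()}


click_log = [
    ("cactus", "P1"), ("cactus", "P2"),
    ("souvenir", "P1"), ("souvenir", "P2"), ("souvenir", "P3"),
]
neighborhood = co_click_neighbors(click_log)
```

In this toy log, P1 and P2 share two queries and are therefore stronger neighbors than P1 and P3; tags or queries attached to close neighbors could then be pooled as candidates for the target product.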

Our experimental evaluation, using real data from the Elo7 website, shows the benefits of exploiting click-based, neighborhood-based and tag quality scores with DMLP. In the experiments, our solution produced the overall best results with gains of more than 16% in recommendation effectiveness (precision, recall and NDCG) against the state-of-the-art tag recommender that does not exploit our new contributions.
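For reference, the three effectiveness metrics named above can be computed for a single ranked list as in the following sketch. It uses the standard textbook definitions with binary relevance and a cutoff `k`; the exact cutoffs and relevance judgments used in the experiments are not given in this excerpt, so the values below are assumptions.

```python
import math


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommended tags that are relevant."""
    hits = sum(1 for tag in ranked[:k] if tag in relevant)
    return hits / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant tags that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for tag in ranked[:k] if tag in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(i + 2)
        for i, tag in enumerate(ranked[:k])
        if tag in relevant
    )
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(k, len(relevant))))
    return dcg / ideal if ideal else 0.0


ranked = ["a", "x", "b", "y"]   # hypothetical recommended tags, best first
relevant = {"a", "b", "c"}      # hypothetical ground-truth tags
p = precision_at_k(ranked, relevant, 3)
r = recall_at_k(ranked, relevant, 3)
n = ndcg_at_k(ranked, relevant, 3)
```

Unlike precision and recall, NDCG rewards placing relevant tags earlier in the list, which matters when only the first few recommendations are shown to a seller.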

Finally, to address RQ4, we analyze the impact of the tags recommended by our best strategy on product search. We found that (1) the recommended tags, in isolation, produce significantly better search quality than the seller tags in a direct comparison, considering all evaluation metrics; (2) the title is the most effective textual feature in isolation, which is reinforced by the fact that the system uses clickthrough as a proxy for relevance and the title is the only textual feature that the user sees in search result pages of the Elo7 system; and (3) the title is outperformed in terms of Recall when we exploit (i) all existing textual features, (ii) the concatenation of title and seller tags, or (iii) the concatenation of title and recommended tags. Thus, we argue that the recommended tags provide a higher potential for improvements when compared to the original tags.

In sum, our main (novel) contributions include:

  1. A study on the tagging behavior of sellers in a significant e-commerce platform along with the proposal of a solution for the found problems based on tag recommendation services;

  2. The design of new tag recommendation attributes that exploit the collective behavior of customers and the synergy between quality of (descriptive) content and search services;

  3. The exploitation of DMLP as a learning-to-rank framework for tag recommendation, producing results that outperform state-of-the-art methods based on Random Forests and advanced neural network architectures.
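As a rough illustration of how a multilayer perceptron can act as a pointwise learning-to-rank scorer, the sketch below maps a vector of tag attributes to a relevance score and sorts candidates by it. Everything specific here is an assumption: the three attribute values per tag, the layer sizes, and the random (untrained) weights are placeholders, and the actual DMLP configuration and training procedure are not described in this excerpt.

```python
import random


def relu(values):
    return [max(0.0, v) for v in values]


def dense(weights, bias, inputs):
    # weights holds one row per output unit.
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    # Random placeholder weights; a real model would learn these from data.
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def mlp_score(layers, features):
    """Forward pass: ReLU hidden layers followed by a linear output unit."""
    hidden = features
    for weights, bias in layers[:-1]:
        hidden = relu(dense(weights, bias, hidden))
    weights, bias = layers[-1]
    return dense(weights, bias, hidden)[0]


def rank_candidates(layers, candidates):
    """Sort candidate tags by predicted relevance, highest first."""
    return sorted(candidates,
                  key=lambda tag: mlp_score(layers, candidates[tag]),
                  reverse=True)


rng = random.Random(0)
layers = [init_layer(3, 4, rng), init_layer(4, 4, rng), init_layer(4, 1, rng)]
# Hypothetical attribute vectors, e.g. (click score, neighborhood score, tag quality).
candidates = {
    "birthday souvenir": [0.8, 0.6, 0.3],
    "mini-garden cactus": [0.7, 0.5, 0.4],
    "minecraft": [0.0, 0.1, 0.9],
}
ranking = rank_candidates(layers, candidates)
```

With untrained weights the resulting order is arbitrary; the point is only the shape of the pipeline, in which search-oriented and tag quality attributes are combined non-linearly into a single ranking score.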

The rest of this article is organized as follows. In Section 2, we discuss related work, while in Section 3 we formally state the tackled problem. In Sections 4 and 5, we describe the proposed tag recommendation methods and our evaluation methodology, respectively. We present experimental results in Section 6. Finally, we discuss conclusions and directions for future work in Section 7.

Section snippets

Related work

We start by covering related work on tag recommendation (Section 2.1), our proposed solution for poor product descriptions. Next, we cover work on automatic query expansion (Section 2.2), which is a somewhat related problem, followed by prior studies on e-commerce search (Section 2.3), which are the basis for several of our newly proposed tag attributes.

Problem statement

Here, we propose to address the goal of improving the quality of tags for describing products in e-commerce platforms by relying on tag recommendation techniques. The tag recommendation task consists of generating a list of candidate tags sorted according to their estimated relevance to a target product. Relevance refers to the extent to which a candidate tag is related to or describes the target product. Solutions to this problem can be divided into two steps: (1) the generation of a list of

Tag recommendation methods

In this section, we present novel tag recommendation methods that exploit search result data as well as tag quality metrics to extract and rank candidate tags. Some of the methods are based on defined neighborhoods of the target product (Section 4.1). Others exploit learning-to-rank techniques, together with the defined search-oriented and tag quality attributes, to learn a recommendation function (Section 4.2).

Experimental setup

In this section, we present our experimental methodology. First, we present our dataset and evaluation methodology (Section 5.1). Next, we describe the baselines (Section 5.2) and the parameterization of all methods (Section 5.3).

Relevance of tags and queries (RQ1, RQ2)

We tackle our first two research questions, as to the relevance of existing tags (RQ1) and associated queries (RQ2) to products, by analyzing the results of the manual inspection of keywords (tags and queries) associated with Elo7 products. We aim at comparing the amount of noise (irrelevant terms) contained in the original tags and in the queries. For each product-keyword pair, we measure the fraction of positive judgments, i.e., the fraction of volunteers who marked the given pair as relevant.
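The measurement described above can be sketched as follows. The judgment data is invented for illustration, and the per-product noise measure follows the “considered irrelevant by at least one volunteer” criterion reported in the introduction; the function names are hypothetical.

```python
def fraction_positive(votes):
    """Fraction of volunteers who marked a product-keyword pair as relevant."""
    return sum(votes) / len(votes)


def noise_fraction(keyword_votes):
    """Fraction of a product's keywords judged irrelevant by at least one volunteer."""
    noisy = sum(1 for votes in keyword_votes.values() if not all(votes))
    return noisy / len(keyword_votes)


# Hypothetical judgments for one product: True means "relevant".
keyword_votes = {
    "cactus": [True, True, True],
    "minecraft": [False, False, True],
    "souvenir": [True, False, True],
    "succulent": [True, True, True],
}
minecraft_score = fraction_positive(keyword_votes["minecraft"])
product_noise = noise_fraction(keyword_votes)
```

Computing `noise_fraction` separately for a product's seller tags and for its associated queries gives the two per-product distributions that the comparison between RQ1 and RQ2 relies on.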

Conclusions and future work

Poor product descriptions, mainly regarding tags, can affect the quality of information services such as search and recommendation offered by e-commerce platforms. The original analyses performed in this article to answer our first research question have demonstrated that the problem is real and non-negligible. On the other hand, the analyses performed to answer our second research question have demonstrated that queries and clicks can offer useful information for recommending quality tags for

CRediT authorship contribution statement

Fabiano M. Belém: Conceptualization, Writing - review & editing, Methodology, Validation. Rodrigo M. Silva: Conceptualization, Writing - review & editing, Methodology. Claudio M.V. de Andrade: Conceptualization, Writing - review & editing, Methodology. Gabriel Person: Methodology. Felipe Mingote: Methodology, Validation. Raphael Ballet: Data curation. Helton Alponti: Data curation, Validation. Henrique P. de Oliveira: Data curation, Validation. Jussara M. Almeida: Conceptualization, Writing -

Acknowledgments

This research was partially supported by CAPES, CNPq, FAPEMIG, EMBRAPII, the Elo7 E-commerce Enterprise and the DCC/UFMG EMBRAPII unit.

References (51)

  • F.M. Belém et al. Beyond relevance: Explicitly promoting novelty and diversity in tag recommendation. ACM Transactions on Intelligent Systems and Technology (2016)
  • D.M. Blei et al. Latent Dirichlet allocation. Journal of Machine Learning Research (2003)
  • C.J.C. Burges. From RankNet to LambdaRank to LambdaMART: An overview. Technical Report (2010)
  • H. Cao et al. Context-aware query suggestion by mining click-through and session data. Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining (2008)
  • C. Carpineto et al. A survey of automatic query expansion in information retrieval. ACM Computing Surveys (2012)
  • J. Chen et al. Inferring tag co-occurrence relationship across heterogeneous social networks. Applied Soft Computing (2017)
  • K. Cho. Understanding dropout: Training multi-layer perceptrons with auxiliary independent stochastic neurons. CoRR (2013)
  • Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for...
  • F. Figueiredo et al. Assessing the quality of textual features in social media. Information Processing & Management (2012)
  • J. Fleiss. Measuring nominal scale agreement among many raters (1971)
  • P. Geurts et al. Learning to rank with extremely randomized trees. Journal of Machine Learning Research (2011)
  • R. Graham et al. Exploring feedback models in interactive tagging. International conference on web intelligence and intelligent agent technology (2008)
  • Guo, J., Fan, Y., Pang, L., Yang, L., Ai, Q., Zamani, H., Wu, C., Croft, W. B., & Cheng, X. (2019). A deep look into...
  • C.-K. Huang et al. Relevant term suggestion in interactive web search based on contextual information in query session logs. Journal of the American Society for Information Science and Technology (2003)
  • Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic...