“Fixing the curse of the bad product descriptions” – Search-boosted tag recommendation for E-commerce products

doi:10.1016/j.ipm.2020.102289

Information Processing & Management

Volume 57, Issue 5, September 2020, 102289

https://doi.org/10.1016/j.ipm.2020.102289

Highlights

• We perform a study on the tagging behavior of sellers in an e-commerce platform.
• We design new tag quality attributes that exploit the collective behavior of users.
• Our attributes exploit the synergy between search and quality of textual content.
• Queries and clicks can offer useful data for recommending quality tags for products.
• Our best method, a deep L2R framework, greatly outperforms state-of-the-art methods.

Abstract

Various e-commerce platforms allow sellers to register, describe and organize their own products, using tags and other textual metadata. The quality of these textual descriptors is essential for the effectiveness of e-commerce information services such as search and product recommendation, and thus, for the ability of consumers to find desired products. In this paper, we focus on a particular, widely used type of textual descriptor of products: tags. We argue that sellers may not be the “best” providers of tag information for products, either because they are unable to provide it (they were not “trained” for that) or because they explicitly intend to fool the system in order to promote their products with inadequate or imprecise tags (tag spam). To deal with these issues, we may rely on automatic tag recommendation techniques to improve the quality of the tags suggested to describe a given product. In this context, the main novel contribution of our work is a set of new tag recommendation techniques that take advantage of product search result data (in particular, the search queries and the product clicks from these queries) to improve the quality of the recommended tags. Our main hypothesis is that the set of queries collectively issued by the consumers of the e-marketplace, along with the corresponding clicks, reflects a more trustworthy view of the products; thus, those queries and clicks can be exploited as a source of high-quality (e.g., more diverse) tags to describe the products. We propose new solutions, including some based on deep learning, that translate this main hypothesis into new features and methods for recommending tags for products. Our manual and automatic evaluations, using real data from one of the largest e-commerce sites in Brazil, show that tags created by sellers indeed contain a lot of noise. On the other hand, our proposed search-boosted tag recommenders are highly effective in suggesting relevant tags, with gains of more than 16% in recommendation effectiveness against the state-of-the-art. Moreover, our experiments show that the suggested tags provide a potentially better data source for e-commerce search than the original tags assigned by product sellers.

Introduction

E-marketplace sites, such as Etsy and eBay, are characterized by the active participation of users in the creation, description, categorization and rating of product-related pages. On these platforms, a user can be a seller, a buyer, or both, performing different actions according to their interests. These platforms face the complex challenge of providing functionality that allows buyers to find sellers and their products through search engines, recommendation systems and other tools. One such e-marketplace is Elo7, the largest Brazilian site for buying and selling creative personalized products. On Elo7, customers buy directly from hundreds of thousands of artisans, artists and designers spread all over the country who turn creative ideas into unique products. The platform, which has been around for over a decade, has already exceeded the mark of 8 million users, with over 13 million registered products and more than 30 million unique sessions per month.

Product pages in an e-marketplace platform contain various data fields related to the product, here referred to as its features. In particular, textual features (e.g., title, description, categories and tags) play a substantial role in these systems since they not only provide effective data sources for information retrieval (IR) services (Figueiredo, Belém, Pinto, Almeida, & Gonçalves, 2012), but also greatly influence the behavior of consumers and, consequently, the success in sales and revenue (Pryzant, joo Chung, & Jurafsky, 2017). Thus, the quality of these textual features is essential for the effectiveness of e-commerce search and product recommendation engines, allowing consumers to easily find what they are looking for or browse the catalog for new ideas and products.

However, there is no guarantee that these features, especially tags, are of good quality for the purpose of supporting effective retrieval. As they are usually created by the sellers, these textual features reflect the sellers' individual views, backgrounds and interests in relation to their products. One might argue that the sellers are those who should best know how to describe their products. However, a biased view may leave out some important keywords that others may find useful in referring to a particular product. Furthermore, in a self-promotion attempt, sellers may generate tag spam (Koutrika, Effendi, Gyöngyi, Heymann, & Garcia-Molina, 2008) (e.g., popular keywords that are not necessarily related to the associated content) to increase the chances that their products will appear in search result pages. These issues may not only negatively impact the effectiveness of IR services but also frustrate and chase away current and potential customers.

In this context, we tackle the problem of poor (tag-based) product descriptions by building new services for automatically recommending high-quality tags to describe products in e-marketplaces. We are driven by the hypothesis that, unlike seller-generated tags, the set of queries issued by the consumers of the e-marketplace, along with the corresponding clicks, reflects a collective and more trustworthy view of the products. Thus, those queries and clicks can be exploited as a source of high-quality (e.g., more diverse) keywords to describe the products. The rationale is that if many users click on a product A after issuing, for instance, the query “wedding invitation”, it is very likely that “wedding invitation” is a relevant keyword to describe A.
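The rationale above can be illustrated with a minimal sketch, assuming a click log of (query, product) pairs. The function name `candidate_tags_from_clicks`, the `min_clicks` threshold and the toy log are hypothetical and are not taken from the paper; the sketch only counts clicks per query and is not the authors' recommender.

```python
from collections import Counter, defaultdict


def candidate_tags_from_clicks(click_log, min_clicks=2):
    """Map each product to the queries that led to clicks on it, ranked by click count.

    click_log: iterable of (query, product_id) pairs, one per click (assumed format).
    min_clicks: hypothetical threshold used to drop rarely clicked queries.
    """
    clicks_per_product = defaultdict(Counter)
    for query, product_id in click_log:
        # Normalize case and whitespace so trivially different spellings are merged.
        clicks_per_product[product_id][query.strip().lower()] += 1

    ranked = {}
    for product_id, counts in clicks_per_product.items():
        kept = [(q, c) for q, c in counts.items() if c >= min_clicks]
        # Sort by descending click count, then alphabetically for determinism.
        kept.sort(key=lambda item: (-item[1], item[0]))
        ranked[product_id] = kept
    return ranked


# Toy click log (illustrative data only).
click_log = [
    ("wedding invitation", "A"),
    ("Wedding Invitation", "A"),
    ("wedding invitation", "A"),
    ("rustic invitation", "A"),
    ("rustic invitation", "A"),
    ("birthday souvenir", "A"),
    ("birthday souvenir", "B"),
    ("birthday souvenir", "B"),
]
suggestions = candidate_tags_from_clicks(click_log)
print(suggestions["A"])
```

In this toy log, "wedding invitation" collects three clicks for product A and is ranked first, while the single click for "birthday souvenir" falls below the assumed threshold and is dropped.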

Take the example in Fig. 1, which shows part of a real Elo7 product page (translated to English). Clearly, there are various unrelated (popular) tags, such as “plant x zombie”, “Mario Bros” and “Minecraft”, which may have been chosen by the seller to promote her product. Yet, after analyzing query logs, we learned that many users clicked on the webpage of that product after searching for “mini-garden cactus succulent” and “birthday souvenir”, which, though clearly related to the product, are not present in its list of tags. Thus, both keywords could be good tag recommendations.

Our goals in investigating the aforementioned hypothesis are twofold. On one hand, we aim at helping sellers to improve the quality of the tags associated with their products by “learning from others”. As a consequence, this may help improve the effectiveness of product search by promoting diversity, demoting noise, and helping retrieve more relevant products in the top-ranked positions that will actually be seen by buyers. On the other hand, the user interactions with a better product search engine may produce higher-quality input data (e.g., queries and product clicks) that can be exploited to recommend more relevant tags for products. Thus, these two tasks (search and tag recommendation) can reinforce each other, improving the experience of both groups of users – sellers and consumers – and, ultimately, the revenue obtained by the sellers and the e-commerce company. To our knowledge, despite the rich literature on tag recommendation (Belém, Almeida, Gonçalves, 2016, Kowald, 2018, Qiao, Zhang, Wei, Chen, 2017, Xu, 2018) and search improvement (Pryzant, joo Chung, Jurafsky, 2017, Santu, Sondhi, Zhai, 2017), we are the first to propose a solution that explicitly connects the two tasks in a virtuous cycle of interdependence, specifically targeted at the e-commerce realm.

Specifically, we aim at answering four research questions (RQs):

(RQ1) How much noise do seller-created tags contain in an e-marketplace?

(RQ2) Are search queries a better source of keywords to describe products?

(RQ3) How can we improve tag recommendation while exploiting product search data?

(RQ4) Can enhanced tag recommendation improve the quality of e-commerce search services?

To answer RQ1 and RQ2, we invited a group of volunteers to manually inspect the quality of keywords (tags, search queries and tag recommendations) associated with a sample of Elo7 products. Our results show that seller tags indeed contain a great amount of noise (non-relevant terms): for 70% of the products, more than 68% of the previously assigned tags were considered irrelevant by at least one volunteer. This largely exceeds the amount of noise in the queries associated with products: for 70% of the products, fewer than 30% of the associated queries were considered irrelevant by at least one volunteer. Thus, search data can be exploited to generate candidate tags and to estimate their relevance to products in e-marketplaces.

Motivated by the previous results, we tackle RQ3 by proposing new tag quality attributes that exploit search result data (i.e., the search queries and the product clicks from these queries) to generate and rank candidate tags. These attributes exploit different definitions of the neighborhood of the target product and use information such as product similarity, co-clicks (products clicked by users after issuing the same query) and the search keywords themselves to extract potential tag candidates. From an algorithmic point of view, we also introduce innovations, as we are, to the best of our knowledge, the first to explore deep learning methods for building tag recommenders based on the combination of tag quality/informativeness with search-oriented attributes. More specifically, we propose to use a Deep Multilayer Perceptron architecture – DMLP (Cho, 2013, Lecun, Bengio, Hinton, 2015, Srivastava, Hinton, Krizhevsky, Sutskever, Salakhutdinov, 2014) – to combine the newly proposed search-oriented attributes with state-of-the-art ones based on tag quality (Belém, Almeida, Gonçalves, 2016, Belém, Batista, Santos, Almeida, Gonçalves, 2016, Belém, Heringer, Almeida, Gonçalves, 2019) for the tag recommendation task. The motivation for choosing such a deep learning architecture is that the derived models are non-linear, just like the current state-of-the-art tag quality based solution, which exploits Random Forests (RFs) as the recommendation algorithm (Belém et al., 2019). We also compare this deep rank architecture, which learns from our set of proposed tag quality attributes, with some advanced state-of-the-art architectures that focus on automatic attribute extraction, namely, bi-directional transformers (Devlin, Chang, Lee, & Toutanova, 2018) and stacked denoising autoencoders (Wang, Shi, & Yeung, 2015).
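The co-click notion of neighborhood mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the attributes defined in the paper: the function names, the weighting by number of shared queries and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict


def co_click_neighbors(click_log):
    """Return, per product, a Counter of products clicked after the same queries."""
    products_per_query = defaultdict(set)
    for query, product_id in click_log:
        products_per_query[query].add(product_id)

    neighbors = defaultdict(Counter)
    for products in products_per_query.values():
        for product in products:
            for other in products:
                if other != product:
                    neighbors[product][other] += 1  # one shared query = one vote
    return neighbors


def candidates_from_neighbors(target, neighbors, tags_by_product):
    """Score tags of co-clicked products, weighted by the number of shared queries.

    The weighting scheme is an assumption made for this sketch.
    """
    own_tags = set(tags_by_product.get(target, ()))
    scores = Counter()
    for other, shared_queries in neighbors[target].items():
        for tag in tags_by_product.get(other, ()):
            if tag not in own_tags:  # only suggest tags the product lacks
                scores[tag] += shared_queries
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data (illustrative only).
click_log = [
    ("cactus souvenir", "P1"),
    ("cactus souvenir", "P2"),
    ("mini garden", "P1"),
    ("mini garden", "P2"),
    ("mini garden", "P3"),
]
tags_by_product = {
    "P1": ["souvenir"],
    "P2": ["cactus", "succulent", "souvenir"],
    "P3": ["succulent", "terrarium"],
}
neighbors = co_click_neighbors(click_log)
candidates = candidates_from_neighbors("P1", neighbors, tags_by_product)
print(candidates)
```

Here P2 shares two queries with P1 and P3 shares one, so tags carried by P2 weigh more; a learned ranker such as the DMLP described above would consume scores of this kind as input attributes rather than use them directly.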

Our experimental evaluation, using real data from the Elo7 website, shows the benefits of exploiting click-based, neighborhood-based and tag quality scores with DMLP. In the experiments, our solution produced the overall best results, with gains of more than 16% in recommendation effectiveness (precision, recall and NDCG) against the state-of-the-art tag recommender that does not exploit our new contributions.
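For readers unfamiliar with the three effectiveness measures named above, the following minimal sketch computes them for a single ranked list of recommended tags. It assumes binary relevance and a cutoff `k`; the exact definitions and cutoffs used in the paper are not given in this excerpt, and the example tags are illustrative only.

```python
import math


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommended tags that are relevant."""
    hits = sum(1 for tag in ranked[:k] if tag in relevant)
    return hits / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant tags that appear in the top-k."""
    if not relevant:
        return 0.0
    hits = sum(1 for tag in ranked[:k] if tag in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Normalized discounted cumulative gain with binary gains and log2 discount."""
    dcg = sum(
        1.0 / math.log2(position + 2)  # position 0 gets discount log2(2) = 1
        for position, tag in enumerate(ranked[:k])
        if tag in relevant
    )
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(position + 2) for position in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Illustrative ranking and relevance judgments (not data from the paper).
ranked = ["cactus", "minecraft", "succulent", "souvenir", "zombie"]
relevant = {"cactus", "succulent", "souvenir", "mini-garden"}

p3 = precision_at_k(ranked, relevant, 3)
r3 = recall_at_k(ranked, relevant, 3)
n3 = ndcg_at_k(ranked, relevant, 3)
print(round(p3, 3), round(r3, 3), round(n3, 3))
```

Unlike precision and recall, NDCG rewards placing relevant tags earlier in the list, which is why the irrelevant tag at rank 2 costs more than one at rank 3 would.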

Finally, to address RQ4, we analyze the impact of the tags recommended by our best strategy on product search. We found that: (1) the recommended tags, in isolation, produce significantly better search quality than the seller tags in a direct comparison, considering all evaluation metrics; (2) the title is the most effective textual feature in isolation, which is reinforced by the fact that the system uses clickthrough as a proxy for relevance and the title is the only textual feature that users see in the search result pages of the Elo7 system; and (3) the title is outperformed in terms of Recall when we exploit (i) all existing textual features, (ii) the concatenation of title and seller tags, or (iii) the concatenation of title and recommended tags. Thus, we argue that the recommended tags provide a higher potential for improvements when compared to the original tags.

In sum, our main (novel) contributions include:

1. A study on the tagging behavior of sellers in a significant e-commerce platform along with the proposal of a solution for the found problems based on tag recommendation services;
2. The design of new tag recommendation attributes that exploit the collective behavior of customers and the synergy between quality of (descriptive) content and search services;
3. The exploitation of DMLP as a learning-to-rank framework for tag recommendation, producing results that outperform state-of-the-art methods based on Random Forests and advanced neural network architectures.

The rest of this article is organized as follows. In Section 2, we discuss related work, while in Section 3 we formally state the tackled problem. In Sections 4 and 5, we describe the proposed tag recommendation methods and our evaluation methodology, respectively. We present experimental results in Section 6. Finally, we discuss conclusions and directions for future work in Section 7.

Section snippets

Related work

We start by covering related work on tag recommendation (Section 2.1), our proposed solution for poor product descriptions. Next, we cover work on automatic query expansion (Section 2.2), which is a somewhat related problem, followed by prior studies on e-commerce search (Section 2.3), which are the basis for several of our newly proposed tag attributes.

Problem statement

We here propose to address the goal of improving the quality of tags for describing products in e-commerce platforms by relying on tag recommendation techniques. The tag recommendation task consists of generating a list of candidate tags sorted according to their estimated relevance to a target product. Relevance refers to the extent to which a candidate tag is related to or describes the target product. Solutions of this problem can be divided into two steps: (1) the generation of a list of

Tag recommendation methods

In this section, we present novel tag recommendation methods that exploit search result data as well as tag quality metrics to extract and rank candidate tags. Some of the methods are based on defined neighborhoods of the target product (Section 4.1). Others exploit learning-to-rank techniques, together with the defined search-oriented and tag quality attributes, to learn a recommendation function (Section 4.2).

Experimental setup

In this section, we present our experimental methodology. First, we present our dataset and evaluation methodology (Section 5.1). Next, we describe the baselines (Section 5.2) and the parameterization of all methods (Section 5.3).

Relevance of tags and queries (RQ1, RQ2)

We tackle our first two research questions, as to the relevance of existing tags (RQ1) and associated queries (RQ2) to products, by analyzing the results of the manual inspection of keywords (tags and queries) associated with Elo7 products. We aim at comparing the amount of noise (irrelevant terms) contained in the original tags and in the queries. For each product-keyword pair, we measure the fraction of positive judgments, i.e., the fraction of volunteers who marked the given pair as relevant.
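The measurement described above can be sketched as follows, assuming each product-keyword pair comes with a list of boolean votes (True meaning "relevant"). The data layout, function names and toy judgments are assumptions for illustration. The second function follows the noise statistic quoted in the Introduction (keywords considered irrelevant by at least one volunteer).

```python
from collections import defaultdict


def positive_fraction(judgments):
    """Fraction of volunteers who marked each (product, keyword) pair as relevant."""
    return {
        pair: sum(votes) / len(votes)
        for pair, votes in judgments.items()
        if votes  # skip pairs with no judgments
    }


def noise_per_product(judgments):
    """Per product, fraction of keywords judged irrelevant by at least one volunteer."""
    flagged = defaultdict(int)
    total = defaultdict(int)
    for (product, _keyword), votes in judgments.items():
        if not votes:
            continue
        total[product] += 1
        if not all(votes):  # at least one volunteer voted "irrelevant"
            flagged[product] += 1
    return {product: flagged[product] / total[product] for product in total}


# Toy judgments from three hypothetical volunteers (illustrative only).
judgments = {
    ("P1", "cactus"): [True, True, True],
    ("P1", "minecraft"): [False, False, True],
    ("P1", "mario bros"): [False, False, False],
    ("P1", "souvenir"): [True, True, False],
}
fractions = positive_fraction(judgments)
noise = noise_per_product(judgments)
print(fractions[("P1", "minecraft")], noise["P1"])
```

Running the same two functions once on seller tags and once on clicked queries would allow the two sources to be compared on equal terms, which is the comparison the section sets out to make.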

Conclusions and future work

Poor product descriptions, mainly regarding tags, can affect the quality of information services such as search and recommendation offered by e-commerce platforms. The original analyses performed in this article to answer our first research question have demonstrated that the problem is real and non-negligible. On the other hand, the analyses performed to answer our second research question have demonstrated that queries and clicks can offer useful information for recommending quality tags for

CRediT authorship contribution statement

Fabiano M. Belém: Conceptualization, Writing - review & editing, Methodology, Validation. Rodrigo M. Silva: Conceptualization, Writing - review & editing, Methodology. Claudio M.V. de Andrade: Conceptualization, Writing - review & editing, Methodology. Gabriel Person: Methodology. Felipe Mingote: Methodology, Validation. Raphael Ballet: Data curation. Helton Alponti: Data curation, Validation. Henrique P. de Oliveira: Data curation, Validation. Jussara M. Almeida: Conceptualization, Writing -

Acknowledgments

This research was partially supported by CAPES, CNPq, FAPEMIG, EMBRAPII, the Elo7 E-commerce Enterprise and the DCC/UFMG EMBRAPII unit.

References (51)

F. Belém et al. Personalized and object-centered tag recommendation methods for web 2.0 applications. Information Processing & Management (2014)
F. Belém et al. Exploiting syntactic and neighbourhood attributes to address cold start in tag recommendation. Information Processing & Management (2019)
R. Krestel et al. Personalized topic-based tag recommendation. Neurocomputing (2012)
D. Qiao et al. Finding competitive keywords from query logs to enhance search engine advertising. Information & Management (2017)
Y. Wu et al. Guiding supervised topic modeling for content based tag recommendation. Neurocomputing (2018)
C. Xu. A novel recommendation method based on social network using matrix factorization technique. Information Processing & Management (2018)
Q. Ai et al. Unbiased learning to rank: Theory and practice. Proceedings of the 2018 ACM SIGIR international conference on theory of information retrieval (2018)
R. Baeza-Yates et al. Modern information retrieval (2011)
F. Belém et al. A survey on tag recommendation methods. Journal of the Association for Information Science and Technology (2016)
F. Belém et al. Image aesthetics and its effects on product clicks in e-commerce search. Proceedings of the SIGIR 2019 workshop on ecommerce, co-located with the 42nd international ACM SIGIR conference on research and development in information retrieval, ecom@SIGIR 2019, Paris, France, July 25, 2019 (2019)
F.M. Belém et al. Beyond relevance: Explicitly promoting novelty and diversity in tag recommendation. ACM Transactions on Intelligent Systems and Technology (2016)
D.M. Blei et al. Latent Dirichlet allocation. Journal of Machine Learning Research (2003)
C.J.C. Burges. From RankNet to LambdaRank to LambdaMART: An overview. Technical Report (2010)
H. Cao et al. Context-aware query suggestion by mining click-through and session data. Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining (2008)
C. Carpineto et al. A survey of automatic query expansion in information retrieval. ACM Computing Surveys (2012)
J. Chen et al. Inferring tag co-occurrence relationship across heterogeneous social networks. Applied Soft Computing (2017)
K. Cho. Understanding dropout: Training multi-layer perceptrons with auxiliary independent stochastic neurons. CoRR (2013)
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for...
F. Figueiredo et al. Assessing the quality of textual features in social media. Information Processing & Management (2012)
J. Fleiss. Measuring nominal scale agreement among many raters (1971)
P. Geurts et al. Learning to rank with extremely randomized trees. Journal of Machine Learning Research (2011)
R. Graham et al. Exploring feedback models in interactive tagging. International conference on web intelligence and intelligent agent technology (2008)
Guo, J., Fan, Y., Pang, L., Yang, L., Ai, Q., Zamani, H., Wu, C., Croft, W. B., & Cheng, X. (2019). A deep look into...
C.-K. Huang et al. Relevant term suggestion in interactive web search based on contextual information in query session logs. Journal of the American Society for Information Science and Technology (2003)
Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic...
