Identifying comparable entities with indirectly associative relations and word embeddings from web search logs

doi:10.1016/j.dss.2020.113465

Decision Support Systems

Volume 141, February 2021, 113465

https://doi.org/10.1016/j.dss.2020.113465

Highlights

• A new perspective on comparable entity identification, based on indirectly associative relation analysis, is proposed.
• A method named ICE is proposed to seek comparable entities for a given focal entity.
• Data experiments show that ICE outperforms baseline methods in identification accuracy, broadness, and appropriate ranking.

Abstract

Comparable entity identification plays an essential role in the decision making of both consumers and firms in a competitive environment. In contrast to traditional cooccurrence approaches, this paper proposes a novel method, namely, ICE (identifying comparable entities), for effectively identifying comparable entities from web search logs, which are online user-generated contents that reflect users' attention and preferences. ICE consists of two stages: (1) the formulation of directly and indirectly associative relations, followed by a generative procedure designed to derive a broad set of candidate entities that are indirectly associative with a specified focal entity; and (2) a deep-learning-based semantic analysis with a word embedding procedure that measures the similarities between entity profiles so as to target comparable entities from the candidate set. Extensive experiments show that ICE outperforms several baseline methods in the identification of accurate, broad and novel comparable entities with suitable rankings.

Introduction

The comparison of products or services plays a significant role in consumers' purchasing decision-making process, where they often resort to webpages, online reviews, and/or social media to obtain information regarding comparable entities. However, due to bounded cognitive ability [22] and information overload [29], consumers cannot effectively access the entire set of comparable entities. More importantly, such a comparison typically requires consumers to have high-level domain knowledge. Thus, comparable entity identification, which aims at identifying a comprehensive and accurate set of entities that are comparable to a specified entity, is deemed desirable for helping consumers identify alternative products for consideration in their decision-making process [13].

Comparable entity identification is also vital for businesses in strategic and marketing management. Typically, managers can identify comparable entities by matching similarities and differences of entity categories and characteristics in their minds [22]. Due to limited cognitive abilities, they may only be aware of a small number of comparable entities, and entities that are out of sight will not be considered [3]. Although firms sometimes utilize paid profile services such as Hoovers (www.hoovers.com) and Mergent (www.mergentonline.com) to collect information regarding comparable entities, those services are provided by professionals for specified domains; thus, they may be costly and often suffer from scalability problems [32]. Moreover, such professional-based services cannot reach consumers' minds and fail to examine comparable entities from the perspective of users.

To overcome these limitations, some recent efforts have been made to automatically identify comparable entities or mine comparative relations from online user-generated contents (UGC) [1,29]. For comparable entity identification from the user perspective, extant methods are mainly conditioned on the premise that comparable entities have much higher cooccurrence in the same statements. However, this premise cannot be well applied in various types of UGC, such as web search logs, online product reviews, and tweets, where comparable entities appear less frequently in cooccurrence patterns [30], thereby leading to degraded performance.

To extend the premise, this study proposes a new perspective on comparable entity identification in terms of indirectly associative relation analysis. In various types of UGC, comparable entities not only directly appear in the same statements but also appear in an indirect form. The proposed indirectly associative relation analysis is a useful extension of previous efforts. It can improve consumers' and managers' exploration of various types of UGC and help them capture comparable entities that are ignored by extant methods. In consideration of indirectly associative relations, a typical type of UGC, namely, web search logs, is selected as the research data of this study. In web search logs, comparable entities seldom appear in cooccurrence patterns [30]. This type of UGC demands novel methods for identifying indirectly associative relations, which deviate from the traditional cooccurrence premise.

In this study, a method, namely, ICE (identifying comparable entities), is proposed for identifying entities that are comparable to a specified entity from web search logs. Entities in ICE refer to the objects (e.g., companies, products, and persons) that users care about and then query through search engines [1,23]. Comparable entities, such as BMW and Mercedes-Benz, are entities that share a common utility and meet similar needs of consumers [29]. In ICE, the specified entity for which comparable entities must be identified is called a focal entity [1,23]. For example, Ford is selected as a focal entity for managers in Ford Motor Company, and Ausnutria will be selected as a focal entity for consumers who want to buy milk powder. Two key issues must be addressed. First, as previously discussed, most comparable entities do not frequently appear concurrently in the same web search logs [30]. Second, due to data noise and short queries in web search logs [6], the accuracy of the identification results depends not only on the cooccurrence positions where entities appear, so it is necessary to incorporate an effective semantic analysis between entities into the identification process. Therefore, ICE consists of two stages: the derivation of a broad set of candidate entities that are indirectly associative with the focal entity, which are linked by their related aspects, and the measurement of the similarities between the candidate entities and the focal entity to target comparable entities from the obtained candidate set. Data experiments are conducted to evaluate the performance of ICE in comparison to several baseline methods.

The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 introduces the proposed method, whose algorithmic details are presented in Section 4. The experimental results are provided in Section 5. Finally, the work is concluded in Section 6.

Section snippets

Related work

Since the focus of this study is on identifying comparable entities, the mainstream studies of comparable entity identification are reviewed in Section 2.1. The second stage of ICE detects comparable entities from the set of candidate entities identified in the first stage, which is relevant to comparative relation mining. Thus, similar methods of comparative relation mining are reviewed in Section 2.2.

The ICE method

In this study, a two-stage method ICE is proposed for identifying comparable entities based on indirectly associative relations from the perspective of consumers. ICE is composed of two stages: the first stage is the discovery of a broad set of candidate entities based on indirectly associative relations of keywords in web search logs, and the second stage is a semantic analysis that is implemented based on keyword and document representations for the detection of comparable entities from the
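The second stage described above (scoring candidates by the similarity of their profiles to the focal entity) can be illustrated with a minimal sketch. It is not the authors' implementation: the hand-made two-dimensional vectors in `embeddings`, the keyword lists in `profiles`, and the choice of averaging keyword vectors and comparing them by cosine similarity are all assumptions made for illustration, standing in for the learned keyword and document representations the paper refers to.

```python
import math

def mean_vector(vectors):
    """Average a non-empty list of equal-length vectors component-wise."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_candidates(focal, candidates, profiles, embeddings):
    """Rank candidates by similarity of their profile vector to the focal one.

    Assumption: an entity's profile vector is the mean of the embeddings of
    the keywords in its profile. The paper's actual representation may differ.
    """
    def profile_vector(entity):
        return mean_vector([embeddings[word] for word in profiles[entity]])

    focal_vec = profile_vector(focal)
    scored = [(c, cosine(focal_vec, profile_vector(c))) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy data: real embeddings would be learned from the search logs.
embeddings = {"price": [1.0, 0.0], "engine": [0.9, 0.1], "recipe": [0.0, 1.0]}
profiles = {"bmw": ["price", "engine"], "audi": ["engine"], "cookbook": ["recipe"]}

ranking = rank_candidates("bmw", ["cookbook", "audi"], profiles, embeddings)
for entity, score in ranking:
    print(entity, round(score, 3))
```

In this toy setting the candidate whose profile keywords lie close to those of the focal entity is ranked first, which is the filtering role the second stage plays for the broad candidate set.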

The algorithm

In this section, the algorithmic details and time complexity of ICE are analyzed to show the main factors affecting its computation time.

Algorithm 1 provides the pseudo-code of ICE. In the first stage, a HashMap data structure is adopted to map the directly associative relation by traversing the web search log set Q once. Supposing there are NQ queries in total in the web search log set Q, the time complexity for generating candidate entities is O(NQ) (lines 3–5), which means the time
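As a rough illustration of the single-pass hash map idea in the first stage, the following sketch builds a map from aspect keywords to the entities queried with them, then collects the entities that share at least one aspect with the focal entity. It is not Algorithm 1 itself: the assumption that every query has already been parsed into an `(entity, aspect)` pair, the function names, and the toy log are all hypothetical.

```python
from collections import defaultdict

def build_aspect_index(queries):
    """Map each aspect keyword to the set of entities queried with it.

    The log is traversed exactly once, so this step is linear in the
    number of queries.
    """
    aspect_to_entities = defaultdict(set)
    for entity, aspect in queries:
        aspect_to_entities[aspect].add(entity)
    return aspect_to_entities

def candidate_entities(queries, focal):
    """Return entities indirectly linked to `focal` through a shared aspect."""
    index = build_aspect_index(queries)
    candidates = set()
    for entities in index.values():
        if focal in entities:
            candidates |= entities
    candidates.discard(focal)  # the focal entity is not its own candidate
    return candidates

# Hypothetical, already-parsed search log of (entity, aspect) pairs.
toy_log = [
    ("bmw", "price"),
    ("mercedes-benz", "price"),
    ("bmw", "fuel consumption"),
    ("audi", "fuel consumption"),
    ("ausnutria", "milk powder reviews"),
]

print(sorted(candidate_entities(toy_log, "bmw")))
```

Note that none of the toy queries mentions two entities together; the candidates are found only through shared aspects, which is the indirect association the method relies on. A generic aspect such as "price" would link many unrelated entities in a real log, which is one reason a second, similarity-based stage is needed.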

Data experiments

This section aims to test whether ICE can identify accurate comparable entities and effectively cover the entities that users might compare. In addition, the effectiveness of ICE in comparable entity ranking is also validated.

Conclusions

Comparable entity identification is desirable for both consumers and managers in their decision-making processes. In this paper, a novel two-stage ICE method for comparable entity identification that is based on web search logs has been proposed from the perspective of consumers. In the first stage, a candidate entity generation process has been designed based on the indirectly associative relation of comparable entities that are linked by their shared related aspect information. Furthermore,

Acknowledgements

The work was partly supported by the National Natural Science Foundation of China (71772177, 72072177), the MOE Project of Key Research Institute of Humanities and Social Sciences at Universities (17JJD630006), and the joint PhD scholarship of Renmin Business School.

Liye Wang is currently pursuing her PhD degree in the Department of Management Science and Engineering, School of Business, Renmin University of China. Her research interests include competitive intelligence, e-commerce and text mining. Her work has been published in the journal Frontiers of Business Research in China.

References (33)

M. Kraus et al. Decision support from financial disclosures with deep neural networks and transfer learning. Decis. Support. Syst. (2017)
Y. Liu et al. Assessing product competitive advantages from the perspective of customers by mining user-generated content on social media. Decis. Support. Syst. (2019)
S.L. Lo et al. Ranking of high-value social audiences on twitter. Decis. Support. Syst. (2016)
Z. Ma et al. Mining competitor relationships from online news: a network-based approach. Electron. Commer. Res. Appl. (2011)
D. Qiao et al. Finding competitive keywords from query logs to enhance search engine advertising. Inf. Manag. (2017)
K. Xu et al. Mining comparative opinions from customer reviews for competitive intelligence. Decis. Support. Syst. (2011)
S. Bao et al. Competitor mining with the web. IEEE Trans. Knowl. Data Eng. (2008)
J. Choi et al. A novel method for identifying competitors using a financial transaction network. IEEE Trans. Eng. Manag. (2019)
B.H. Clark et al. Managerial identification of competitors. J. Mark. (1999)
S. Gregor et al. Positioning and presenting design science research for maximum impact. MIS Q. (2013)
A. Jain et al. How do they compare? Automatic identification of comparable entities on the web
Z. Jiang et al. Learning open-domain comparable entity graphs from user search queries
N. Jindal et al. Mining comparative sentences and relations
N. Lathia et al. Temporal diversity in recommender systems
Q. Le et al. Distributed representations of sentences and documents
T.Y. Lee et al. Automated marketing research using online customer reviews. J. Mark. Res. (2011)

Jin Zhang is an associate professor in the Department of Management Science and Engineering, School of Business, Renmin University of China. He received his PhD degree in the Department of Management Science and Engineering from the School of Economics and Management at Tsinghua University. His current research interests include data mining, business intelligence, and web search. His work has been published in journals such as MIS Quarterly, INFORMS Journal on Computing, Decision Support Systems, Information & Management, and IEEE Transactions on Neural Networks and Learning Systems.

Guoqing Chen received his PhD from the Catholic University of Leuven (K.U. Leuven, Belgium) and now is Professor of Information Systems at the School of Economics and Management, Tsinghua University, Beijing, China. His research interests include information systems management, business analytics and decision support systems. His work has been published in journals such as MIS Quarterly, Journal of Management Information Systems, Journal of Association for Information Systems, Decision Sciences, INFORMS Journal on Computing, Decision Support Systems, ACM Transactions on Knowledge Discovery from Data, etc.

Dandan Qiao is an assistant professor in the Department of Information Systems and Analytics at the National University of Singapore (NUS). Prior to joining NUS, she earned her Ph.D. in the Department of Information Systems from Tsinghua University. Her current research interests lie in the intersection of information systems, behavioural science, and data mining. Her work has been published in journals such as MIS Quarterly, Information Systems Research, Information & Management, and ACM Transactions on Knowledge Discovery from Data.