Abstract
Click farming has become common and harms both online shopping platforms and consumers. To identify click farming on Taobao, the largest online shopping platform in China, we first construct several features from both goods and online shops. We then use positive-unlabeled (PU) learning to find reliable negative instances in the unlabeled set and to output, for every shop, a click-farming identification ranked by probability. Next, a weighted logit model is used to investigate the role of the extracted features in dissecting click farming. The empirical findings show that the extracted features are effective for identifying and explaining click farming. The results also show that click farming may not necessarily depend on the state of the shop. Our study can help online consumers reduce the risk of being deceived and help the platform improve its capacity to regulate click farming.
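The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how reliable negatives are selected or how the logit weights are set. So the sketch assumes a common two-step PU heuristic (score all shops with a positive-versus-unlabeled classifier, keep the lowest-scoring unlabeled shops as reliable negatives) and class-balancing weights. The synthetic data, the two features, and the names `fit_weighted_logit`, `pu_rank`, and `neg_fraction` are all hypothetical.

```python
import numpy as np

def _design(X):
    # Prepend an intercept column to the feature matrix.
    return np.hstack([np.ones((X.shape[0], 1)), X])

def predict_proba(X, beta):
    # Logistic link; the linear predictor is clipped for numerical safety.
    z = np.clip(_design(X) @ beta, -30.0, 30.0)
    return 1.0 / (1.0 + np.exp(-z))

def fit_weighted_logit(X, y, w, lr=0.5, steps=2000):
    # Gradient descent on the weighted logistic log-loss.
    Xb = _design(X)
    beta = np.zeros(Xb.shape[1])
    for _ in range(steps):
        z = np.clip(Xb @ beta, -30.0, 30.0)
        p = 1.0 / (1.0 + np.exp(-z))
        beta -= lr * (Xb.T @ (w * (p - y))) / w.sum()
    return beta

def pu_rank(X, labeled_pos, neg_fraction=0.3):
    # Step 1 (assumed heuristic): treat every unlabeled shop as a
    # provisional negative and score all shops.
    y1 = labeled_pos.astype(float)
    beta1 = fit_weighted_logit(X, y1, np.ones(len(y1)))
    score1 = predict_proba(X, beta1)

    # Keep the lowest-scoring unlabeled shops as "reliable negatives".
    unl_idx = np.flatnonzero(~labeled_pos)
    k = max(1, int(neg_fraction * len(unl_idx)))
    reliable_neg = unl_idx[np.argsort(score1[unl_idx])[:k]]

    # Step 2: weighted logit on known positives vs. reliable negatives.
    # Class-balancing weights are an assumption, not taken from the paper.
    pos_idx = np.flatnonzero(labeled_pos)
    train = np.concatenate([pos_idx, reliable_neg])
    y2 = np.concatenate([np.ones(len(pos_idx)), np.zeros(len(reliable_neg))])
    w2 = np.where(y2 == 1,
                  len(train) / (2.0 * len(pos_idx)),
                  len(train) / (2.0 * len(reliable_neg)))
    beta2 = fit_weighted_logit(X[train], y2, w2)

    # Probability for every shop, plus a most-suspicious-first ranking.
    prob = predict_proba(X, beta2)
    return prob, np.argsort(-prob), beta2

# Synthetic illustration: 400 shops, 2 made-up features, about 30% of shops
# truly click farming, and only some of those carrying a positive label.
rng = np.random.default_rng(0)
n = 400
true = rng.random(n) < 0.3
X = rng.normal(size=(n, 2)) + 2.0 * true[:, None]
labeled_pos = true & (rng.random(n) < 0.4)

prob, ranking, beta = pu_rank(X, labeled_pos)
print("coefficients (intercept first):", np.round(beta, 2))
print("ten most suspicious shop indices:", ranking[:10])
```

In this sketch the fitted coefficients play the role the abstract assigns to the weighted logit model: their sign and size indicate how each feature relates to the estimated probability of click farming.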
Acknowledgements
The authors would like to thank the Editor-in-Chief, the Associate Editor, and the three anonymous referees for their helpful comments and constructive guidance. The authors gratefully acknowledge financial support from the National Natural Science Foundation of China (71671056, 91846201), the Humanity and Social Science Foundation of the Ministry of Education of China (19YJA790035), and the National Statistical Science Research Projects of China (2019LD05).
Cite this article
Jiang, C., Zhu, J. & Xu, Q. Dissecting click farming on the Taobao platform in China via PU learning and weighted logistic regression. Electron Commer Res 22, 157–176 (2022). https://doi.org/10.1007/s10660-020-09418-z
DOI: https://doi.org/10.1007/s10660-020-09418-z