Dissecting click farming on the Taobao platform in China via PU learning and weighted logistic regression

Jiang, Cuixia; Zhu, Jun; Xu, Qifa

doi:10.1007/s10660-020-09418-z

Published: 02 June 2020

Volume 22, pages 157–176 (2022)

Electronic Commerce Research

Abstract

Click farming has become common and harms both online shopping platforms and consumers. To identify click farming on Taobao, the largest online shopping platform in China, we first create several features from both goods and online shops. We then use positive-unlabeled (PU) learning to find reliable negative instances in the unlabeled set and to output, for every shop, an identification of click farming ranked by probability. Next, a weighted logit model is used to investigate the role of the extracted features in dissecting click farming. The empirical findings show that the extracted features are effective in identifying and explaining click farming. The results also show that click farming does not necessarily depend on the state of the shop. Our study can help online consumers reduce the risk of being deceived and help the platform improve its capacity to regulate click farming.
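The pipeline the abstract outlines (find reliable negatives in the unlabeled set, then fit a weighted logit model and rank shops by probability) can be illustrated with a minimal sketch. This is not the authors' implementation: the two synthetic features, the 40% cut-off for reliable negatives, the class-balancing weights, and all function names below are assumptions chosen only for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_weighted_logit(X, y, w, lr=0.5, epochs=2000):
    # Gradient descent on the weighted negative log-likelihood.
    d = len(X[0])
    beta = [0.0] * (d + 1)  # beta[0] is the intercept
    total = sum(w)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi, wi in zip(X, y, w):
            p = sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], xi)))
            err = wi * (p - yi)
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        beta = [b - lr * g / total for b, g in zip(beta, grad)]
    return beta


def predict(beta, X):
    return [sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], xi))) for xi in X]


def draw(center, n):
    # Hypothetical two-feature shops drawn around a common center.
    return [[random.gauss(center, 1.0), random.gauss(center, 1.0)] for _ in range(n)]


random.seed(0)
P = draw(2.0, 30)       # shops labeled as click farming (positives)
U_pos = draw(2.0, 20)   # hidden positives inside the unlabeled set
U_neg = draw(-2.0, 50)  # true negatives inside the unlabeled set
U = U_pos + U_neg
hidden = len(U_pos)     # unlabeled indices below this are hidden positives

# Step 1: treat every unlabeled shop as negative and score the unlabeled set.
X1 = P + U
y1 = [1] * len(P) + [0] * len(U)
beta1 = fit_weighted_logit(X1, y1, [1.0] * len(X1))
scores_U = predict(beta1, U)

# Keep the lowest-scoring 40% of unlabeled shops as reliable negatives
# (the 40% threshold is an assumption of this sketch).
k = int(0.4 * len(U))
order = sorted(range(len(U)), key=lambda i: scores_U[i])
RN = [U[i] for i in order[:k]]

# Step 2: weighted logit on positives vs. reliable negatives, with
# class-balancing weights (an assumed weighting scheme).
X2 = P + RN
y2 = [1] * len(P) + [0] * len(RN)
w2 = [len(X2) / (2 * len(P))] * len(P) + [len(X2) / (2 * len(RN))] * len(RN)
beta2 = fit_weighted_logit(X2, y2, w2)

# Rank all unlabeled shops by estimated click-farming probability.
probs = predict(beta2, U)
ranking = sorted(range(len(U)), key=lambda i: probs[i], reverse=True)
```

In this sketch the signs and magnitudes of `beta2[1:]` play the role of the feature effects a weighted logit model would be used to interpret, and `ranking` orders the unlabeled shops from most to least suspicious.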

Acknowledgements

The authors would like to thank the Editor-in-Chief, the Associate Editor, and the three anonymous referees for their helpful comments and constructive guidance. The authors gratefully acknowledge financial support from the National Natural Science Foundation of China (71671056, 91846201), the Humanity and Social Science Foundation of the Ministry of Education of China (19YJA790035), and the National Statistical Science Research Projects of China (2019LD05).

Author information

Authors and Affiliations

School of Management, Hefei University of Technology, Hefei, 230009, Anhui, China
Cuixia Jiang, Jun Zhu & Qifa Xu

Corresponding author

Correspondence to Qifa Xu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Jiang, C., Zhu, J. & Xu, Q. Dissecting click farming on the Taobao platform in China via PU learning and weighted logistic regression. Electron Commer Res 22, 157–176 (2022). https://doi.org/10.1007/s10660-020-09418-z

Issue Date: March 2022