Abstract
As with many social media websites, Community Question Answering (CQA) portals have become a target for spammers who disseminate promotional information. Previous work mainly focuses on identifying low-quality answers or detecting spam information in question-answer (QA) pairs. However, these approaches suffer from long delays: they all rely on information about answers or answerers, which becomes available only after the questions have been displayed on the website for some time and have already attracted user traffic. In fact, spammers on CQA platforms also act as questioners and embed promotional information in their questions. If such questions can be detected as early as possible, they will neither appear on the website nor affect legitimate users. In this paper, we design a framework for early detection of promotion campaigns in CQA based only on question information and questioner profiles. First, we propose a novel sampling method for identifying the questions that contain promotional information, which make up the positive dataset. We also sample an unlabeled dataset of unsolved questions from a certain period of time. Then, we compare the characteristics of question information and user profiles between the two datasets; these characteristics are also used as features in the learning process. Finally, we apply and compare several PU (Positive and Unlabeled examples) learning algorithms to find positive examples in the unlabeled dataset. Our approach needs no answer-side information, so it can detect spamming activity as soon as a question is posted. Experimental results based on about 0.7 million questions derived from a popular Chinese CQA portal indicate that our approach can detect questions related to promotion campaigns as effectively as, but more efficiently than, state-of-the-art QA-pair-level detection methods.
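To make the PU learning step in the abstract more concrete, the following is a minimal sketch of a generic two-step PU procedure. It is not the algorithm evaluated in the paper: the function names, the centroid-based classifier, the `negative_fraction` parameter, and the two toy features (number of links in a question, normalized questioner account age) are all assumptions chosen for illustration.

```python
from math import dist
from statistics import fmean


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [fmean(col) for col in zip(*rows)]


def two_step_pu(positive, unlabeled, negative_fraction=0.3, rounds=10):
    """Return one boolean per unlabeled row: True if it looks positive.

    Step 1: treat the unlabeled rows farthest from the positive centroid
            as "reliable negatives".
    Step 2: assign each unlabeled row to the nearer centroid and
            re-estimate the negative centroid until the labels stop changing.
    """
    pos_center = centroid(positive)
    ranked = sorted(unlabeled, key=lambda x: dist(x, pos_center), reverse=True)
    n_neg = max(1, int(len(unlabeled) * negative_fraction))
    neg_center = centroid(ranked[:n_neg])

    labels = []
    for _ in range(rounds):
        new_labels = [dist(x, pos_center) < dist(x, neg_center) for x in unlabeled]
        if new_labels == labels:
            break
        labels = new_labels
        negatives = [x for x, is_pos in zip(unlabeled, labels) if not is_pos]
        if negatives:
            neg_center = centroid(negatives)
    return labels


# Hypothetical features: [links in question, normalized account age].
positive = [[3.0, 0.1], [4.0, 0.2], [5.0, 0.1]]
unlabeled = [[0.0, 0.9], [0.0, 0.8], [4.0, 0.15], [1.0, 0.7], [0.0, 1.0]]
print(two_step_pu(positive, unlabeled))
```

In this toy run only the third unlabeled question, which resembles the positive examples, is flagged. Because the inputs are question and questioner features alone, such a classifier could in principle be applied at posting time, which is the property the abstract emphasizes.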
This work was supported by the Natural Science Foundation of China (61672311, 61622208, 61532011, 61472206) and the National Key Basic Research Program (2015CB358700).
Copyright information
© 2016 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Li, X., Liu, Y., Zhang, M., Ma, S. (2016). Early Detection of Promotion Campaigns in Community Question Answering. In: Li, Y., Xiang, G., Lin, H., Wang, M. (eds) Social Media Processing. SMP 2016. Communications in Computer and Information Science, vol 669. Springer, Singapore. https://doi.org/10.1007/978-981-10-2993-6_15
DOI: https://doi.org/10.1007/978-981-10-2993-6_15
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-2992-9
Online ISBN: 978-981-10-2993-6
eBook Packages: Computer Science, Computer Science (R0)