A novel framework for internet of knowledge protection in social networking services
Introduction
In the knowledge economy, knowledge is a primary resource. The internet derives its power from its connectivity. In the internet of things (IoT), it connects passive objects so that they can exhibit smart features. Similarly, the internet of knowledge is described as the connection of knowledge points, originally spread over diverse places, so that they can be represented with high value. As the internet of knowledge develops, knowledge is increasingly found at the network level, for example in social networking services (SNSs). Recently, SNSs have become an essential and prominent medium for communicating and sharing knowledge. Typically, SNS users are responsible for sharing knowledge in the network and are the basic elements of the network structure. Communities that consist of families, groups of friends, and acquaintances are the next basic element of the network structure. In SNSs, users can share knowledge by posting links to their favorite webpages, files, photos, and videos. Furthermore, the structure of SNS communities generates a network of credibility and trust [1,2].
Facebook and Twitter are leading SNSs. According to a report from Statista [3], the total number of Facebook and Twitter users stood at 1968 million and 319 million, respectively, as of April 2017. With the escalating number of users, a huge amount of heterogeneous knowledge is also being produced every day on these two SNSs [4]. Most of it is multimedia knowledge (in the form of text, audio files, videos, and images), which is produced, stored, and transferred in huge amounts. The multimedia knowledge posted on these SNSs is mostly accompanied by user likes, comments, tags, hashtags, and so on. Multimedia has become an essential part of SNSs. According to Zephoria Digital Marketing’s report [5], approximately 136,000 photos are shared, 293,000 statuses are updated, and 510,000 comments are posted every 60 s on Facebook. This report also reveals that the average content sharing rate on Facebook was 4.75 billion pieces of content per day as of May 2013, which was 94% more than the content sharing rate as of August 2012. The Statista report [6] states that Facebook and Twitter play a significant role in global content sharing activities, and that 57% and 18% of social content sharing activities occurred via Facebook and Twitter, respectively, as of the second quarter of 2016. The statistics above and our survey on SNS security [7] demonstrate that the amount of multimedia knowledge shared on Facebook and Twitter is increasing every day. With this increase in multimedia, however, these two SNSs have become a preferred target of spammers for spreading spam. A statistic from Nexgate’s report showed an estimated 355% escalation in social media spam during the first half of 2013 [8].
Spammers use numerous techniques for posting spam messages on these SNSs. The posted spam messages act as marketing advertisements and scams, help transmit malware via enclosed malicious URLs, or are used to perform phishing attacks [9]. In addition, spammers can follow unknown users and send unwanted messages containing malicious URLs to obtain high exposure. The masqueraded URLs can spread threats in the form of drive-by downloads (which install malware) and infect the host machine, allowing the installed malware to tap into the host’s confidential information [10]. Furthermore, the infected host machine may also participate in malicious botnet activities, such as taking part in Distributed Denial of Service (DDoS) attacks, or become a source of email spam. Spammers may use embedded links to organize a phishing attack, wherein they target a legitimate user to collect his or her confidential information. Usually, spammers on Facebook and Twitter pose as legitimate users. Therefore, identifying them and differentiating them from legitimate users is a difficult task. In the past, spammers on these SNSs were typically simple, with a clear appearance that helped in differentiating them from legitimate users. Now, however, spammers can use cheap automated methods to gain credibility and trust, which makes them difficult to detect in the large population of SNS users.
Spammer detection in SNSs is a classification problem wherein legitimate users are distinguished from spammers based on their respective features. Note, however, that this classification problem is associated with several significant challenges. These challenges include the high dimensionality of features, biased and limited training sets, high computational complexity of classification, and the public unavailability of training sets. To overcome these challenges and carry out spammer classification effectively, several traditional machine-learning methods, such as Support Vector Machine [11,12], Decision Tree [13], JRip [14], Bayesian Network [15], Random Forest [16], and k-Nearest Neighbors [17], have been proposed. Due to the high dimensionality of features, however, finding the optimum parameters for parametric supervised algorithms is very time-consuming and difficult [18]. Therefore, existing models suffer from numerous issues, such as poor generalization performance, longer training time, and higher false positive rates. Moreover, many approaches, such as [19], use biased datasets containing far fewer spam profiles than legitimate ones and therefore provide inaccurate classification results.
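To make the effect of a biased dataset concrete, the following minimal sketch computes accuracy, recall, and false positive rate for a toy labeling. The data, the function names, and the 95/5 class split are illustrative assumptions and are not taken from the paper's datasets.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN, where 1 = spammer and 0 = legitimate user."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def rates(y_true, y_pred):
    """Return accuracy, recall on spammers, and false positive rate."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return {"accuracy": accuracy, "recall": recall, "false_positive_rate": fpr}


# Hypothetical biased sample: 95 legitimate users and 5 spammers.
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that labels every account as legitimate.
y_pred = [0] * 100

result = rates(y_true, y_pred)
print(result)
# accuracy is 0.95 although recall on spammers is 0.0
```

In this toy case the classifier never flags a spammer, yet its accuracy looks high because legitimate users dominate the sample, which is why accuracy alone can mislead on biased datasets.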
Recently, the Extreme Learning Machine (ELM) [20] has been introduced as an effective machine learning classifier. It is considered more promising than traditional classifiers, such as the Support Vector Machine and Naïve Bayesian, for the following reasons: 1) it has better computational efficiency; 2) it provides similar or better generalization performance compared with traditional machine learning algorithms; 3) it does not require adjusting any additional parameters except the predefined network architecture; and 4) it can use all piecewise continuous functions as activation functions, such as radial basis functions, triangular basis functions, and sigmoid functions. A significant number of academic studies show the capability of ELM to classify accurately at high learning speed [21,22,23]. Nevertheless, ELM has certain limitations. For instance, the arbitrary selection of input biases and weights in ELM can make the problem ill-posed, wherein the classification returns more than one solution, which in turn lowers generalization performance [24]. Spammer detection in SNSs, however, requires both good generalization performance and short training time. In particular, shorter training time can be obtained with ELM [18], and generalization performance can be enhanced using ensemble learning [24], wherein a strong learner is created by combining multiple weak learners [25]. Ensemble learning has been used with many weak learners, such as Decision Tree and Neural Networks, to create a strong learner with better generalization performance. Thus, like other weak learners, an ELM can be used as a weak learner, and its limitations can be overcome by employing ensemble learning, wherein a strong ELM is created by combining multiple ELMs.
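As an illustration of why ELM training is fast, the sketch below implements a basic single-hidden-layer ELM in NumPy: input weights and biases are drawn at random and never tuned, and only the output weights are solved in closed form with a pseudoinverse. This is a generic textbook-style sketch, not the authors' implementation; the class name `ELM`, the hidden-layer size, the sigmoid activation, and the toy two-cluster data are all assumptions made for the example.

```python
import numpy as np


class ELM:
    """Minimal binary ELM sketch (hypothetical helper, not from the paper)."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activation of the randomly projected inputs.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        # Random input weights and biases: chosen once, never trained.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Encode labels as -1/+1 and solve the least-squares problem
        # H @ beta = T with the Moore-Penrose pseudoinverse.
        T = np.where(y == 1, 1.0, -1.0)
        self.beta = np.linalg.pinv(H) @ T
        return self

    def predict(self, X):
        return (self._hidden(X) @ self.beta > 0).astype(int)


def make_toy(n, rng):
    """Two Gaussian clusters standing in for legitimate (0) and spam (1)."""
    X = np.vstack([rng.normal(-2.0, 1.0, (n, 2)), rng.normal(2.0, 1.0, (n, 2))])
    y = np.array([0] * n + [1] * n)
    return X, y


data_rng = np.random.default_rng(42)
X_train, y_train = make_toy(100, data_rng)
X_test, y_test = make_toy(50, data_rng)

elm = ELM(n_hidden=50, seed=0).fit(X_train, y_train)
test_accuracy = (elm.predict(X_test) == y_test).mean()
print(f"held-out accuracy: {test_accuracy:.3f}")
```

Because the random draw of `W` and `b` differs with each seed, two ELMs trained on the same data can disagree on borderline samples, which is the instability that combining several ELMs is meant to reduce.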
In this paper, we propose a Bagging ELM-based spammer detection framework wherein multiple ELMs are employed to distinguish spammers from legitimate users on SNSs. Bagging [26] is used as an ensemble learning method to combine multiple ELMs. The main contributions of this paper are as follows:
- We analyze spam activities on the two most popular SNSs, Facebook and Twitter, and propose novel feature sets to facilitate spammer detection for both SNSs. Compared with other existing feature sets [11,14], our proposed feature sets consist of recent and selective features that are responsible for spam on both SNSs.
- The novelty of this paper lies primarily in a framework that uses a new Bagging ELM approach to detect SNS spammers. This method provides higher generalization performance for spammer detection with shorter training time.
- Since Facebook and Twitter datasets are not publicly available, we constructed a labeled dataset for both SNSs to evaluate the performance of our framework.
- We also provide a comparative analysis of the performance of our framework against other existing frameworks in order to validate its effectiveness in detecting SNS spammers.
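The Bagging ELM idea named in the contributions can be sketched as follows: each ELM is trained on a bootstrap sample (drawn with replacement) of the training set, and the ensemble predicts by majority vote. This is a minimal sketch under stated assumptions rather than the paper's actual method; the function names, the number of estimators, the hidden-layer size, and the toy data are hypothetical choices for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_elm(X, y, n_hidden, rng):
    """Train one ELM: random hidden layer, least-squares output weights."""
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    T = np.where(y == 1, 1.0, -1.0)
    beta = np.linalg.pinv(sigmoid(X @ W + b)) @ T
    return W, b, beta


def predict_elm(model, X):
    W, b, beta = model
    return (sigmoid(X @ W + b) @ beta > 0).astype(int)


def train_bagging_elm(X, y, n_estimators=11, n_hidden=20, seed=0):
    """Train each ELM on its own bootstrap sample of the training set."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_estimators):
        idx = rng.integers(0, len(X), size=len(X))  # sample with replacement
        models.append(train_elm(X[idx], y[idx], n_hidden, rng))
    return models


def predict_bagging_elm(models, X):
    """Majority vote; an odd number of estimators avoids ties."""
    votes = np.stack([predict_elm(m, X) for m in models])
    return (votes.mean(axis=0) > 0.5).astype(int)


def make_toy(n, rng):
    """Two Gaussian clusters standing in for legitimate (0) and spam (1)."""
    X = np.vstack([rng.normal(-2.0, 1.0, (n, 2)), rng.normal(2.0, 1.0, (n, 2))])
    y = np.array([0] * n + [1] * n)
    return X, y


data_rng = np.random.default_rng(7)
X_train, y_train = make_toy(100, data_rng)
X_test, y_test = make_toy(50, data_rng)

models = train_bagging_elm(X_train, y_train)
pred = predict_bagging_elm(models, X_test)
print(f"ensemble held-out accuracy: {(pred == y_test).mean():.3f}")
```

Each base ELM stays cheap to train because only its output weights are solved, so the ensemble cost grows roughly linearly with the number of estimators, and the members can be trained independently of one another.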
The rest of this paper is organized as follows: Section 2 discusses various existing techniques to mitigate the issue of spam in SNSs and the ELM approach for classification. Section 3 describes our proposed framework and its components, including the feature sets, dataset construction, and Bagging ELM. Section 4 provides an experimental evaluation of our proposed framework and compares it with other existing techniques for detecting SNS spammers. Finally, we conclude our paper in Section 5.
Related work
In this section, we discuss various existing machine learning models for detecting spammers in SNSs. Then, we describe our ELM approach for classification.
Proposed framework
The overview of our framework is shown in Fig. 2. Our framework relies on the Bagging ELM for spammer detection on Facebook and Twitter. It is composed of Feature identification, Dataset construction, and Bagging ELM. For each SNS, we identify separate feature sets that are responsible for spamming. Based on the identified feature sets, the two different datasets from Facebook and Twitter were prepared by using the dataset construction component. Each dataset contains a significant number of
Experiments and evaluation
In this section, we explain how we evaluated the performance of our proposed framework in detecting spammers in SNSs. The Bagging ELM method was employed on both Twitter and Facebook datasets as described in Subsection 3.2 to evaluate the performance of our proposed framework. The ELM, Adaboost ELM [39], and Majority Voting ELM (MV-ELM) [40] methods were also employed on the same dataset to validate the performance of Bagging ELM. The implementation and evaluation of all of the methods were run
Conclusion
In this paper, we proposed a Bagging ELM-based spammer detection framework for SNSs. Our proposed framework has three major contributions in this area. First, it identifies account- and object-specific features to facilitate spammer detection in SNSs. Second, it constructs a novel dataset of the two most popular SNSs, i.e., Twitter and Facebook. Finally, it introduces a Bagging ELM classifier and applies this classifier to the dataset that we constructed from Twitter and Facebook. Our
Acknowledgements
This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2017-2014-0-00720-002) supervised by the IITP (Institute for Information & communications Technology Promotion).
References (44)
- et al. Social network security: issues, challenges, threats, and solutions. Inf. Sci. (2017)
- et al. Efficient spam detection across online social networks. International Conference on Big Data Analysis (ICBDA), IEEE (2016)
- et al. Detecting spammers on social networks. Neurocomputing (2015)
- et al. A generic statistical approach for spam detection in Online Social Networks. Comput. Commun. (2013)
- et al. Detecting Smart Spammers On Social Network: A Topic Model Approach (2016)
- et al. Twitter spammer detection using data stream clustering. Inf. Sci. (2014)
- et al. Extreme learning machine: theory and applications. Neurocomputing (2006)
- et al. Extreme learning machine based transfer learning for data classification. Neurocomputing (2016)
- et al. Predictive analytics using statistical, learning, and ensemble methods to support real-time exploration of discrete event simulations. Future Gener. Comput. Syst. (2016)
- et al. Multiclass AdaBoost ELM and its application in LBP based face recognition. Math. Prob. Eng. (2015)
- Voting based extreme learning machine. Inf. Sci.
- Personalizing information using users' online social networks: a case study of CiteULike. J. Inf. Process. Syst.
- Optimization of sentiment analysis using machine learning classifiers. Hum.-centric Comput. Inf. Sci.
- Statista. Most Famous Social Network Sites Worldwide as of April 2017, Ranked by Number of Active Users (in Millions)
- Multilevel learning based modeling for link prediction and users’ consumption preference in Online Social Networks. Future Gener. Comput. Syst.
- The Top 20 Valuable Facebook Statistics
- Statista. Distribution of Global Social Content Sharing Activities as of 2nd Quarter 2016, by Social Network
- Nexgate. Research Report 2013 State of Social Media Spam
- XSSClassifier: an efficient XSS attack detection approach based on machine learning classifier on SNSs. J. Inf. Process. Syst.
- Detecting spammers on twitter. Collaboration, Electronic Messaging, Anti-abuse and Spam Conference (CEAS)
- Cats: characterizing automation of twitter spammers. Fifth International Conference on Communication Systems and Networks (COMSNETS), IEEE
- SpamSpotter: an efficient spammer detection framework based on intelligent decision support system on facebook. Appl. Soft Comput.
Shailendra Rathore is a PhD student in the Department of Computer Science at Seoul National University of Science and Technology (SeoulTech), Seoul, South Korea. He currently works in the Ubiquitous Computing Security (UCS) Lab under the supervision of Prof. Jong Hyuk Park. His broad research interests include Information and Cyber Security, Artificial Intelligence, Social Networking Service Security, and IoT. Prior to joining the PhD program at SeoulTech, he received his M.E. in Information Security from Thapar University, Patiala, India. He is also a reviewer for IEEE Systems Journal and IEEE Transactions on Industrial Informatics.
Dr. Arun Kumar Sangaiah received his Master of Engineering (ME) degree in Computer Science and Engineering from the Government College of Engineering, Tirunelveli, Anna University, India. He received his Doctor of Philosophy (PhD) degree in Computer Science and Engineering from VIT University, Vellore, India. He is presently working as an Associate Professor in the School of Computer Science and Engineering, VIT University, India. His areas of interest include software engineering, computational intelligence, wireless networks, bio-informatics, and embedded systems. He has authored more than 100 publications in journals and conferences of national and international repute. His current research work includes global software development, wireless ad hoc and sensor networks, machine learning, cognitive networks, and advances in mobile computing and communications. He has also registered one Indian patent in the area of Computational Intelligence. In addition, Prof. Sangaiah serves as an Editorial Board Member/Associate Editor of various international journals.
Dr. James J. (Jong Hyuk) Park received Ph.D. degrees from the Graduate School of Information Security, Korea University, Korea, and the Graduate School of Human Sciences, Waseda University, Japan. From December 2002 to July 2007, Dr. Park was a research scientist at the R&D Institute, Hanwha S&C Co., Ltd., Korea. From September 2007 to August 2009, he was a professor at the Department of Computer Science and Engineering, Kyungnam University, Korea. He is now a professor at the Department of Computer Science and Engineering and the Department of Interdisciplinary Bio IT Materials, Seoul National University of Science and Technology (SeoulTech), Korea. Dr. Park has published about 200 research papers in international journals and conferences. He has served as chair, program committee member, or organizing committee chair for many international conferences and workshops. He is a steering chair of the international conferences MUE, FutureTech, CSA, CUTE, UCAWSN, and World IT Congress-Jeju. He is editor-in-chief of Human-centric Computing and Information Sciences (HCIS) by Springer, The Journal of Information Processing Systems (JIPS) by KIPS, and Journal of Convergence (JoC) by KIPS CSWRG. He is Associate Editor/Editor of 14 international journals, including JoS, JNCA, SCN, and CJ. In addition, he has served as a Guest Editor for international journals from publishers including Springer, Elsevier, John Wiley, Oxford Univ. Press, Emerald, Inderscience, and MDPI. He received best paper awards from the ISA-08 and ITCS-11 conferences and outstanding leadership awards from IEEE HPCC-09, ICA3PP-10, IEEE ISPA-11, PDCAT-11, and IEEE AINA-15. Furthermore, he received the outstanding research award from SeoulTech in 2014. His research interests include IoT, Human-centric Ubiquitous Computing, Information Security, Digital Forensics, Vehicular Cloud Computing, and Multimedia Computing. He is a member of the IEEE, IEEE Computer Society, KIPS, and KMMS.