A novel framework for internet of knowledge protection in social networking services

https://doi.org/10.1016/j.jocs.2017.12.010

Highlights

  • Spam activities on Social Networking Services (SNSs) are studied.

  • A novel feature set is introduced to the task of spammer detection on SNSs.

  • We propose a new Bagging Extreme Learning Machine approach to detect SNS spammers.

Abstract

With the increasing number of users on Social Networking Services (SNSs), the Internet of knowledge shared on them is also increasing. As the Internet of knowledge on SNSs grows, the likelihood of spammers spreading across them is also increasing day by day. Several traditional machine-learning methods, such as support vector machines and naïve Bayes, have been proposed to detect spammers on SNSs. These methods are inefficient, however, owing to issues such as lower generalization performance and longer training time. An Extreme Learning Machine (ELM) is an efficient classification method that can provide good generalization performance at high training speed. Nonetheless, it suffers from overfitting and the ill-posed problem, which can degrade its generalization performance. In this paper, we propose a Bagging ELM-based spammer detection framework that identifies spammers in SNSs with the help of multiple ELMs combined using the bagging method. We constructed a labeled dataset of the two most prominent SNSs -- Twitter and Facebook -- to evaluate the performance of our framework. The evaluation results show that our framework achieved a higher generalization performance of 99.01% for the Twitter dataset and 99.02% for the Facebook dataset, while requiring a shorter training time of 1.17 s and 1.10 s, respectively.

Introduction

In the knowledge economy, knowledge is a primary resource. The power of the internet lies in its connectivity. In the internet of things (IoT), it connects passive objects so that they can exhibit smart features. Similarly, the internet of knowledge is described as the connection of knowledge points originally spread over diverse places, so that they can be represented with high value. As the internet of knowledge expands, knowledge is increasingly found at the network level, such as on SNSs. Recently, SNSs have become an essential and prominent medium for communication and knowledge sharing. Typically, SNS users are responsible for sharing knowledge in the network and are the basic elements of the network structure. Communities that consist of families, groups of friends, and acquaintances are the next basic element of the network structure. In SNSs, users can share knowledge by posting links to their favorite webpages, files, photos, and videos. Furthermore, the structure of SNS communities generates a network of credibility and trust [1,2].

Facebook and Twitter are leading SNSs. According to a report from Statista [3], the total number of Facebook and Twitter users stood at 1968 million and 319 million, respectively, as of April 2017. With the escalating number of users, a huge amount of heterogeneous knowledge is also being produced every day on these two SNSs [4]. Mainly, multimedia knowledge (in the form of text, audio files, videos, and images) is produced, stored, and transferred in a huge amount. The multimedia knowledge posted on these SNSs is mostly accompanied by user likes, comments, tags, hashtags, and so on. Multimedia has become an essential part of SNSs. According to Zephoria Digital Marketing’s report [5], approximately 136,000 photos are shared, 293,000 statuses are updated, and 510,000 comments are posted every 60 s on Facebook. This report also reveals that the average content sharing rate on Facebook was 4.75 billion pieces of content per day as of May 2013, which was 94% more than the content sharing rate as of August 2012. The Statista report [6] states that Facebook and Twitter play a significant role in global content sharing activities, and that 57% and 18% of social content sharing activities occurred via Facebook and Twitter, respectively, as of the second quarter of 2016. The statistics above and our survey on SNS security [7] demonstrate that the amount of multimedia knowledge shared on Facebook and Twitter is increasing every single day. With this increase in multimedia, however, these two SNSs have become the desired target of spammers for spreading spam. A statistic from Nexgate’s report showed an estimated 355% escalation in social media spam during the first half of 2013 [8].

Spammers use numerous techniques for posting spam messages on these SNSs. The posted spam messages act as marketing advertisements and scams and help transmit malware via enclosed malicious URLs, or they are used to perform phishing attacks [9]. In addition, spammers can follow unknown users and send unwanted messages containing malicious URLs to obtain high exposure. The masqueraded URLs can spread threats in the form of drive-by downloads (which install malware) and infect the host machine, allowing the installed malware to tap into the host's confidential information [10]. Furthermore, the infected host machine may also participate in malicious botnet activities, such as Distributed Denial of Service (DDoS) attacks, or become a source of email spam. Spammers may use embedded links to organize a phishing attack, wherein they target a legitimate user to collect his or her confidential information. Usually, spammers on Facebook and Twitter pose as legitimate users. Therefore, identifying them and differentiating them from legitimate users is a difficult task. In the past, spammers on these SNSs were typically simple, with a clear appearance that helped in differentiating them from legitimate users. Nevertheless, spammers can now use cheap automated methods to gain credibility and trust, making themselves difficult to detect in the large population of SNS users.

Spammer detection in SNSs is a classification problem wherein legitimate users are distinguished from spammers based on their respective features. This classification problem, however, poses several significant challenges. These include the high dimensionality of features, biased and limited training sets, high computational classification complexity, and the public unavailability of training sets. To overcome these challenges and carry out spammer classification effectively, several traditional machine-learning methods have been proposed, such as Support Vector Machine [11,12], Decision Tree [13], JRip [14], Bayesian Network [15], Random Forest [16], and k-Nearest Neighbors [17]. Due to the high dimensionality of features, however, finding the optimum parameters for parametric supervised algorithms is very time-consuming and difficult [18]. Therefore, existing models suffer from numerous issues, such as poor generalization performance, longer training time, and higher false positive rates. Moreover, many approaches, such as [19], use a biased dataset containing far fewer spam profiles than legitimate ones and thus provide inaccurate classification results.
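To illustrate why a biased dataset can give misleading classification results, the following minimal sketch evaluates a trivial classifier that labels every profile as legitimate. The 95/5 class split and the helper name `rates` are assumptions made purely for illustration; they are not taken from the datasets discussed in this paper.

```python
def rates(y_true, y_pred):
    """Return (accuracy, recall, false_positive_rate) with 1 = spammer, 0 = legitimate."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return accuracy, recall, fpr


# Hypothetical biased dataset: 95 legitimate profiles, 5 spam profiles.
y_true = [0] * 95 + [1] * 5
# A trivial classifier that labels every profile as legitimate.
y_pred = [0] * 100

accuracy, recall, fpr = rates(y_true, y_pred)
print(accuracy, recall, fpr)  # 0.95 0.0 0.0
```

The accuracy looks high even though no spammer is detected, which is why accuracy alone is a weak indicator when spam profiles are scarce in the training and test data.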

Recently, the Extreme Learning Machine (ELM) [20] has been introduced as an effective machine learning classifier. It is considered more effective and promising than traditional classifiers, such as the Support Vector Machine and Naïve Bayes, for the following reasons: 1) it has better computational efficiency; 2) it provides similar or better generalization performance compared with traditional machine learning algorithms; 3) it does not require adjusting any additional parameters except the predefined network architecture; and 4) it can use any piecewise continuous function as an activation function, such as radial basis functions, triangular basis functions, sigmoid functions, etc. A significant number of academic studies demonstrate the ability of ELM to classify accurately at high learning speed [21], [22], [23]. Nevertheless, ELM has certain limitations. For instance, the arbitrary selection of input biases and weights in ELM can make the problem ill-posed, wherein the classification returns more than one solution, which in turn lowers generalization performance [24]. Spammer detection in SNSs, however, requires better generalization performance and shorter training time. In particular, shorter training time can be obtained with ELM [18], and generalization performance can be enhanced using ensemble learning [24], wherein a strong learner is created by combining multiple weak learners [25]. Ensemble learning has been used with many weak learners, such as Decision Trees and Neural Networks, to create a strong learner with better generalization performance. Thus, like other weak learners, an ELM can be used as a weak learner, and its limitations can be overcome by employing ensemble learning, wherein a strong ELM is created by combining multiple ELMs.
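The core idea of an ELM described above can be sketched in a few lines of NumPy. This is a minimal illustration of the standard formulation (random, untrained hidden layer plus closed-form output weights), not the implementation used in this paper. The class name `ELM`, the hidden-layer size, the small ridge term, and the synthetic toy data are all assumptions introduced for the example.

```python
import numpy as np


class ELM:
    """Single-hidden-layer ELM: random hidden weights, closed-form output weights."""

    def __init__(self, n_hidden=50, ridge=1e-3, seed=0):
        self.n_hidden = n_hidden
        self.ridge = ridge  # small regularizer (our assumption) to stabilize the solve
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activation; other piecewise continuous functions could be used.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        n_features = X.shape[1]
        # Input weights and biases are drawn at random and never trained.
        self.W = self.rng.normal(size=(n_features, self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        T = np.where(y == 1, 1.0, -1.0)  # targets in {-1, +1}
        # Output weights: beta = (H^T H + ridge * I)^-1 H^T T
        A = H.T @ H + self.ridge * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ T)
        return self

    def predict(self, X):
        return (self._hidden(X) @ self.beta > 0).astype(int)


# Toy data (synthetic, for illustration only): two Gaussian blobs.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
model = ELM(n_hidden=20).fit(X, y)
train_accuracy = (model.predict(X) == y).mean()
```

Training reduces to one linear solve, which is where the short training time comes from. Because `W` and `b` are random, different seeds yield different classifiers, which is the instability that ensemble methods aim to average out.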

In this paper, we propose a Bagging ELM-based spammer detection framework wherein multiple ELMs are employed to distinguish spammers from legitimate users on SNSs. Bagging [26] is used as an ensemble learning method to combine multiple ELMs. The main contributions of this paper are as follows:

  • We analyze spam activities on the two most popular SNSs, Facebook and Twitter, and propose novel feature sets to facilitate spammer detection for both SNSs. Compared with other existing feature sets [11,14], our proposed feature sets consist of recent and selective features that are responsible for spam on both SNSs.

  • The novelty of this paper primarily lies in the fact that it offers a framework that uses a new Bagging ELM approach to detecting SNS spammers. This method provides higher generalization performance for spammer detection at shorter training time.

  • Since Facebook and Twitter datasets are not publicly available, we constructed a labeled dataset of both SNSs to evaluate the performance of our framework.

  • We also provide a comparative analysis of the performance of our framework against other existing frameworks in order to validate its effectiveness in detecting SNS spammers.
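The Bagging ELM idea named in the contributions can be sketched as follows: each base ELM is trained on a bootstrap sample of the training set, and the ensemble predicts by majority vote. This is a hedged illustration only. The function names, the `tanh` activation, the number of estimators, the ridge term, and the synthetic data are assumptions, and majority voting is the usual bagging rule for classification rather than a detail confirmed by the text above.

```python
import numpy as np


def elm_fit(X, y, n_hidden, rng, ridge=1e-3):
    """Train one ELM: random hidden layer, regularized least-squares output weights."""
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)
    T = np.where(y == 1, 1.0, -1.0)
    beta = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ T)
    return W, b, beta


def elm_predict(model, X):
    W, b, beta = model
    return (np.tanh(X @ W + b) @ beta > 0).astype(int)


def bagging_elm_fit(X, y, n_estimators=11, n_hidden=20, seed=0):
    """Train each ELM on a bootstrap sample (drawn with replacement)."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_estimators):
        idx = rng.integers(0, len(X), size=len(X))
        models.append(elm_fit(X[idx], y[idx], n_hidden, rng))
    return models


def bagging_elm_predict(models, X):
    """Combine the base ELMs by majority vote."""
    votes = np.stack([elm_predict(m, X) for m in models])  # (n_estimators, n_samples)
    return (votes.mean(axis=0) > 0.5).astype(int)


# Synthetic stand-in for a labeled dataset (1 = spammer, 0 = legitimate).
data_rng = np.random.default_rng(7)
X = np.vstack([data_rng.normal(-2, 1, (100, 3)), data_rng.normal(2, 1, (100, 3))])
y = np.array([0] * 100 + [1] * 100)

ensemble = bagging_elm_fit(X, y)
predictions = bagging_elm_predict(ensemble, X)
ensemble_accuracy = (predictions == y).mean()
```

An odd number of estimators avoids tied votes in the binary case. Since each ELM trains with a single linear solve, the ensemble remains cheap to train while averaging away much of the variance caused by random hidden weights.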

The rest of this paper is organized as follows: Section 2 discusses various existing techniques to mitigate the issue of spam in SNSs and the ELM approach for classification. Section 3 describes our proposed framework and its components, including the features set, dataset construction, and bagging ELM. Section 4 provides an experimental evaluation of our proposed framework and our comparison of it with other existing techniques for detecting SNS spammers. Finally, we conclude our paper in Section 5.

Related work

In this section, we discuss various existing machine learning models for detecting spammers in SNSs. Then, we describe our ELM approach for classification.

Proposed framework

The overview of our framework is shown in Fig. 2. Our framework relies on the Bagging ELM for spammer detection on Facebook and Twitter. It is composed of Feature identification, Dataset construction, and Bagging ELM. For each SNS, we identify separate feature sets that are responsible for spamming. Based on the identified feature sets, the two different datasets from Facebook and Twitter were prepared by using the dataset construction component. Each dataset contains a significant number of

Experiments and evaluation

In this section, we explain how we evaluated the performance of our proposed framework in detecting spammers in SNSs. The Bagging ELM method was employed on both Twitter and Facebook datasets as described in Subsection 3.2 to evaluate the performance of our proposed framework. The ELM, Adaboost ELM [39], and Majority Voting ELM (MV-ELM) [40] methods were also employed on the same dataset to validate the performance of Bagging ELM. The implementation and evaluation of all of the methods were run

Conclusion

In this paper, we proposed a Bagging ELM-based spammer detection framework for SNSs. Our proposed framework has three major contributions in this area. First, it identifies account- and object-specific features to facilitate spammer detection in SNSs. Second, it constructs a novel dataset of the two most popular SNSs, i.e., Twitter and Facebook. Finally, it introduces a Bagging ELM classifier and applies this classifier to the dataset that we constructed from Twitter and Facebook. Our

Acknowledgements

This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2017-2014-0-00720-002) supervised by the IITP (Institute for Information & communications Technology Promotion).

References (44)

  • J. Cao et al., Voting based extreme learning machine, Inf. Sci. (2012)
  • D.H. Lee, Personalizing information using users' online social networks: a case study of CiteULike, J. Inf. Process. Syst. (2015)
  • J. Singh et al., Optimization of sentiment analysis using machine learning classifiers, Hum.-centric Comput. Inf. Sci. (2017)
  • Statista, Most Famous Social Network Sites Worldwide as of April 2017, Ranked by Number of Active Users (in Millions) (2017)
  • P. Sharma et al., Multilevel learning based modeling for link prediction and users' consumption preference in Online Social Networks, Future Gener. Comput. Syst. (2017)
  • Zephoria Digital Marketing, The Top 20 Valuable Facebook Statistics (2017)
  • Statista, Distribution of Global Social Content Sharing Activities as of 2nd Quarter 2016, by Social Network (2017)
  • Nexgate, Research Report 2013 State of Social Media Spam (2017)
  • S. Rathore et al., XSSClassifier: an efficient XSS attack detection approach based on machine learning classifier on SNSs, J. Inf. Process. Syst. (2017)
  • F. Benevenuto et al., Detecting spammers on twitter, Collaboration, Electronic Messaging, Anti-abuse and Spam Conference (CEAS) (2010)
  • A.A. Amleshwaram et al., Cats: characterizing automation of twitter spammers, Fifth International Conference on Communication Systems and Networks (COMSNETS), IEEE (2013)
  • S. Rathore et al., SpamSpotter: an efficient spammer detection framework based on intelligent decision support system on facebook, Appl. Soft Comput. (2017)
Shailendra Rathore is a PhD student in the Department of Computer Science at Seoul National University of Science and Technology (SeoulTech), Seoul, South Korea. Currently, he is working in the Ubiquitous Computing Security (UCS) Lab under the supervision of Prof. Jong Hyuk Park. His broad research interests include Information and Cyber Security, Artificial Intelligence, Social Networking Service Security, and IoT. Prior to joining the PhD program at SeoulTech, he received his M.E. in Information Security from Thapar University, Patiala, India. He is also a reviewer for IEEE Systems Journal and IEEE Transactions on Industrial Informatics.

Dr. Arun Kumar Sangaiah received his Master of Engineering (ME) degree in Computer Science and Engineering from the Government College of Engineering, Tirunelveli, Anna University, India. He received his Doctor of Philosophy (PhD) degree in Computer Science and Engineering from VIT University, Vellore, India. He is presently working as an Associate Professor in the School of Computer Science and Engineering, VIT University, India. His areas of interest include software engineering, computational intelligence, wireless networks, bio-informatics, and embedded systems. He has authored more than 100 publications in journals and conferences of national and international repute. His current research work includes global software development, wireless ad hoc and sensor networks, machine learning, cognitive networks, and advances in mobile computing and communications. He has also registered one Indian patent in the area of Computational Intelligence. In addition, Prof. Sangaiah serves as an Editorial Board Member/Associate Editor of various international journals.

Dr. James J. (Jong Hyuk) Park received Ph.D. degrees from the Graduate School of Information Security at Korea University, Korea, and the Graduate School of Human Sciences at Waseda University, Japan. From December 2002 to July 2007, Dr. Park was a research scientist at the R&D Institute, Hanwha S&C Co., Ltd., Korea. From September 2007 to August 2009, he was a professor at the Department of Computer Science and Engineering, Kyungnam University, Korea. He is now a professor at the Department of Computer Science and Engineering and the Department of Interdisciplinary Bio IT Materials, Seoul National University of Science and Technology (SeoulTech), Korea. Dr. Park has published about 200 research papers in international journals and conferences. He has served as chair, program committee chair, or organizing committee chair for many international conferences and workshops. He is a steering chair of the international conferences MUE, FutureTech, CSA, CUTE, UCAWSN, and World IT Congress-Jeju. He is editor-in-chief of Human-centric Computing and Information Sciences (HCIS) by Springer, The Journal of Information Processing Systems (JIPS) by KIPS, and Journal of Convergence (JoC) by KIPS CSWRG. He is Associate Editor/Editor of 14 international journals, including JoS, JNCA, SCN, and CJ. In addition, he has served as a Guest Editor for international journals from several publishers: Springer, Elsevier, John Wiley, Oxford Univ. Press, Emerald, Inderscience, and MDPI. He received best paper awards at the ISA-08 and ITCS-11 conferences and outstanding leadership awards from IEEE HPCC-09, ICA3PP-10, IEEE ISPA-11, PDCAT-11, and IEEE AINA-15. Furthermore, he received the outstanding research award from SeoulTech in 2014. His research interests include IoT, Human-centric Ubiquitous Computing, Information Security, Digital Forensics, Vehicular Cloud Computing, Multimedia Computing, etc. He is a member of the IEEE, IEEE Computer Society, KIPS, and KMMS.
