A novel framework for internet of knowledge protection in social networking services

doi:10.1016/j.jocs.2017.12.010

Journal of Computational Science

Volume 26, May 2018, Pages 55-65

https://doi.org/10.1016/j.jocs.2017.12.010

Highlights

• Spam activities on Social Networking Services (SNSs) are studied.
• A novel feature set is introduced to the task of spammer detection on SNSs.
• We propose a new Bagging Extreme Learning Machine approach to detect SNS spammers.

Abstract

With the increasing number of users on Social Networking Services (SNSs), the Internet of knowledge shared on them is also increasing. As the Internet of knowledge on SNSs grows, the likelihood that spammers spread across them also rises day by day. Several traditional machine-learning methods, such as support vector machines and naïve Bayes, have been proposed to detect spammers on SNSs. However, these methods are inefficient because of issues such as lower generalization performance and longer training time. An Extreme Learning Machine (ELM) is an efficient classification method that can provide good generalization performance at high training speed. Nonetheless, it suffers from overfitting and ill-posed problems that can degrade its generalization performance. In this paper, we propose a Bagging ELM-based spammer detection framework that identifies spammers in SNSs with the help of multiple ELMs combined using the bagging method. We constructed a labeled dataset of the two most prominent SNSs, Twitter and Facebook, to evaluate the performance of our framework. The evaluation results show that our framework obtained a higher generalization performance of 99.01% for the Twitter dataset and 99.02% for the Facebook dataset, while requiring a lower training time of 1.17 s and 1.10 s, respectively.

Introduction

In the knowledge economy, knowledge is a primary resource. The power of the internet lies in its connectivity. In the internet of things (IoT), it is used to connect passive objects so that they can exhibit smart features. Similarly, the internet of knowledge is described as the connection of knowledge points originally spread over diverse places so that they can be represented with high value. With the growth of the internet of knowledge, knowledge is increasingly found at the network level, such as on SNSs. Recently, SNSs have become an essential and prominent medium for communication and knowledge sharing. Typically, SNS users are responsible for sharing knowledge in the network and are the basic elements of the network structure. Communities, which consist of families, groups of friends, and acquaintances, are the next basic element in the network structure. In SNSs, users can share knowledge by posting links to their favorite webpages, files, photos, and videos. Furthermore, the structure of SNS communities generates a network of credibility and trust [1,2].

Facebook and Twitter are leading SNSs. According to a report from Statista [3], the total number of Facebook and Twitter users stood at 1968 million and 319 million, respectively, as of April 2017. With the escalating number of users, a huge amount of heterogeneous knowledge is also being produced every day on these two SNSs [4]. Mainly, multimedia knowledge (in the form of text, audio files, videos, and images) is produced, stored, and transferred in a huge amount. The multimedia knowledge posted on these SNSs is mostly accompanied by user likes, comments, tags, hashtags, and so on. Multimedia has become an essential part of SNSs. According to Zephoria Digital Marketing’s report [5], approximately 136,000 photos are shared, 293,000 statuses are updated, and 510,000 comments are posted every 60 s on Facebook. This report also reveals that the average content sharing rate on Facebook was 4.75 billion pieces of content per day as of May 2013, which was 94% more than the content sharing rate as of August 2012. The Statista report [6] states that Facebook and Twitter play a significant role in global content sharing activities, and that 57% and 18% of social content sharing activities occurred via Facebook and Twitter, respectively, as of the second quarter of 2016. The statistics above and our survey on SNS security [7] demonstrate that the amount of multimedia knowledge shared on Facebook and Twitter is increasing every single day. With this increase in multimedia, however, these two SNSs have become the desired target of spammers for spreading spam. A statistic from Nexgate’s report showed an estimated 355% escalation in social media spam during the first half of 2013 [8].

Spammers use numerous techniques for posting spam messages on these SNSs. The posted spam messages act as marketing advertisements and scams and help transmit malware via enclosed malicious URLs, or they are used to perform phishing attacks [9]. In addition, spammers can follow unknown users and send unwanted messages containing malicious URLs to obtain high exposure. The masqueraded URLs can spread threats in the form of drive-by downloads (which install malware) and infect the host machine, allowing the installed malware to tap into the host’s confidential information [10]. Furthermore, the infected host machine may also participate in malicious botnet activities, such as taking part in Distributed Denial of Service (DDoS) attacks, or become a source of email spam. Spammers may use embedded links to organize a phishing attack, wherein they target a legitimate user to collect his or her confidential information. Usually, spammers on Facebook and Twitter pose as legitimate users. Therefore, identifying them and differentiating them from legitimate users is a difficult task. In the past, spammers on these SNSs were typically simple, with a clear appearance that helped in differentiating them from legitimate users. Today, however, spammers can use cheap automated methods to gain credibility and trust, which makes them difficult to detect in the large population of SNS users.

Spammer detection in SNSs is a classification problem wherein legitimate users are distinguished from spammers based on their respective features. Such classification problems, however, come with several significant challenges: the high dimensionality of features, biased and limited training sets, high computational classification complexity, and the public unavailability of training sets. To overcome these challenges and carry out spammer classification effectively, several traditional machine-learning methods, such as Support Vector Machine [11,12], Decision Tree [13], JRip [14], Bayesian Network [15], Random Forest [16], and k-Nearest Neighbors [17], have been proposed. Due to the high dimensionality of features, however, finding the optimum parameters for parametric supervised algorithms is very time-consuming and difficult [18]. Therefore, existing models suffer from numerous issues, such as poor generalization performance, longer training time, and higher false positive rates. Moreover, many approaches, such as [19], use biased datasets containing a much smaller number of spam profiles than legitimate ones and therefore provide inaccurate classification results.
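To see why a dataset with far fewer spam profiles than legitimate ones can give misleading results, consider the following minimal sketch. The class sizes (950 legitimate profiles, 50 spammers) and the `evaluate` helper are hypothetical and chosen only for illustration; they are not taken from the paper's datasets.

```python
def evaluate(y_true, y_pred):
    """Return accuracy, spam recall, and false positive rate.

    Label convention (an assumption of this sketch): 1 = spammer, 0 = legitimate.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # spammers caught
    tn = sum(t == 0 and p == 0 for t, p in pairs)  # legitimate users passed
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # legitimate users flagged
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # spammers missed
    return {
        "accuracy": (tp + tn) / len(pairs),
        "spam_recall": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


# Hypothetical imbalanced dataset: 950 legitimate profiles, 50 spammers.
y_true = [0] * 950 + [1] * 50
# A degenerate classifier that labels every profile as legitimate.
y_pred = [0] * 1000

print(evaluate(y_true, y_pred))
```

In this toy setting the degenerate classifier reaches 95% accuracy while detecting no spammers at all (spam recall of 0), which is why accuracy measured on a heavily skewed dataset says little about spammer detection quality.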

Recently, the Extreme Learning Machine (ELM) [20] has been introduced as an effective machine learning classifier. It is considered more effective and promising than traditional classifiers, such as the Support Vector Machine and Naïve Bayes, for the following reasons: 1) it has better computational efficiency; 2) it provides similar or better generalization performance compared with the traditional machine learning algorithms; 3) it does not require adjusting any additional parameters except the predefined network architecture; and 4) it can use any piecewise continuous function as an activation function, such as radial basis functions, triangular basis functions, sigmoid functions, etc. A significant number of academic studies show the competence of ELM in accurate classification at high learning speed [21,22,23]. Nevertheless, ELM has certain limitations. For instance, the arbitrary selection of input biases and weights in ELM can make the problem ill-posed, wherein the classification returns more than one solution, which in turn lowers generalization performance [24]. Spammer detection in SNSs, however, requires better generalization performance and shorter training time. In particular, shorter training time can be obtained by ELM [18], and generalization performance can be enhanced using ensemble learning [24], wherein a strong learner is created by combining multiple weak learners [25]. Ensemble learning has been used with many weak learners, such as Decision Tree and Neural Networks, to create a strong learner, and it gives better generalization performance. Thus, similar to other weak learners, an ELM can be used as a weak learner, and its limitations can be overcome by employing the strategy of ensemble learning, wherein a strong ELM is created by combining multiple ELMs.
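The basic ELM idea described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a sigmoid activation, binary labels in {0, 1}, standard-normal random input weights and biases, and output weights obtained from the Moore-Penrose pseudo-inverse. The function names `train_elm` and `predict_elm` and the default of 50 hidden nodes are hypothetical.

```python
import numpy as np


def train_elm(X, y, n_hidden=50, seed=0):
    """Fit a single-hidden-layer ELM for binary labels y in {0, 1}."""
    rng = np.random.default_rng(seed)
    # Input weights and biases are drawn at random and never updated;
    # this is the source of ELM's short training time.
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))  # sigmoid hidden-layer output
    # Encode targets as -1/+1 and solve H @ beta = T in the least-squares
    # sense using the Moore-Penrose pseudo-inverse.
    T = np.where(y == 1, 1.0, -1.0)
    beta = np.linalg.pinv(H) @ T
    return W, b, beta


def predict_elm(model, X):
    """Return predicted labels in {0, 1} for the rows of X."""
    W, b, beta = model
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return (H @ beta > 0).astype(int)
```

Because the hidden layer is random, two runs with different seeds generally produce different models. That variability is what the ensemble strategy discussed above tries to average out.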

In this paper, we propose a Bagging ELM-based spammer detection framework wherein multiple ELMs are employed to distinguish spammers from legitimate users on SNSs. Bagging [26] is used as an ensemble learning method to combine multiple ELMs. The main contributions of this paper are as follows:

• We analyze spam activities on the two most popular SNSs, Facebook and Twitter, and propose novel feature sets to facilitate spammer detection for both SNSs. Compared with other existing feature sets [11,14], our proposed feature sets consist of recent and selective features that are responsible for spam on both SNSs.
• The novelty of this paper primarily lies in the fact that it offers a framework that uses a new Bagging ELM approach to detect SNS spammers. This method provides higher generalization performance for spammer detection at shorter training time.
• Since Facebook and Twitter datasets are not publicly available, we constructed a labeled dataset of both SNSs to evaluate the performance of our framework.
• We also provide a comparative analysis of the performance of our framework with other existing frameworks in order to validate the effectiveness of our framework in detecting SNS spammers.
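The idea of combining multiple ELMs through bagging can be sketched as follows. This is an illustrative sketch only. The excerpt does not specify the framework's hyperparameters or aggregation rule, so the code assumes bootstrap resampling of the training rows, a sigmoid ELM as the base learner, and majority voting as the combiner. All function names and default values are hypothetical.

```python
import numpy as np


def _hidden(X, W, b):
    """Sigmoid hidden-layer output of an ELM."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))


def fit_elm(X, y, n_hidden, rng):
    """Train one ELM: random hidden layer, least-squares output weights."""
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    T = np.where(y == 1, 1.0, -1.0)  # encode labels {0, 1} as -1/+1
    beta = np.linalg.pinv(_hidden(X, W, b)) @ T
    return W, b, beta


def fit_bagging_elm(X, y, n_estimators=10, n_hidden=30, seed=0):
    """Train n_estimators ELMs, each on its own bootstrap sample."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    models = []
    for _ in range(n_estimators):
        # Bootstrap: draw n row indices uniformly with replacement.
        idx = rng.integers(0, n, size=n)
        models.append(fit_elm(X[idx], y[idx], n_hidden, rng))
    return models


def predict_bagging_elm(models, X):
    """Combine the base ELMs by majority vote (1 = spammer, 0 = legitimate)."""
    votes = np.stack(
        [(_hidden(X, W, b) @ beta > 0).astype(int) for W, b, beta in models]
    )
    # Ties, possible when the number of models is even, go to class 1 here.
    return (votes.mean(axis=0) >= 0.5).astype(int)
```

Each base ELM sees a different resampled view of the data and a different random hidden layer, so their individual errors are partly decorrelated. Voting over them is one common way to reduce the variance of a single randomly initialized ELM.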

The rest of this paper is organized as follows: Section 2 discusses various existing techniques to mitigate the issue of spam in SNSs and the ELM approach for classification. Section 3 describes our proposed framework and its components, including the features set, dataset construction, and bagging ELM. Section 4 provides an experimental evaluation of our proposed framework and our comparison of it with other existing techniques for detecting SNS spammers. Finally, we conclude our paper in Section 5.

Section snippets

Related work

In this section, we discuss various existing machine learning models for detecting spammers in SNSs. Then, we describe our ELM approach for classification.

Proposed framework

The overview of our framework is shown in Fig. 2. Our framework relies on the Bagging ELM for spammer detection on Facebook and Twitter. It is composed of Feature identification, Dataset construction, and Bagging ELM. For each SNS, we identify separate feature sets that are responsible for spamming. Based on the identified feature sets, the two different datasets from Facebook and Twitter were prepared by using the dataset construction component. Each dataset contains a significant number of

Experiments and evaluation

In this section, we explain how we evaluated the performance of our proposed framework in detecting spammers in SNSs. The Bagging ELM method was employed on both Twitter and Facebook datasets as described in Subsection 3.2 to evaluate the performance of our proposed framework. The ELM, Adaboost ELM [39], and Majority Voting ELM (MV-ELM) [40] methods were also employed on the same dataset to validate the performance of Bagging ELM. The implementation and evaluation of all of the methods were run

Conclusion

In this paper, we proposed a Bagging ELM-based spammer detection framework for SNSs. Our proposed framework has three major contributions in this area. First, it identifies account- and object-specific features to facilitate spammer detection in SNSs. Second, it constructs a novel dataset of the two most popular SNSs, i.e., Twitter and Facebook. Finally, it introduces a Bagging ELM classifier and applies this classifier to the dataset that we constructed from Twitter and Facebook. Our

Acknowledgements

This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2017-2014-0-00720-002) supervised by the IITP (Institute for Information & communications Technology Promotion).

Shailendra Rathore is a PhD student in the Department of Computer Science at Seoul National University of Science and Technology (SeoulTech), Seoul, South Korea. Currently, he is working in the Ubiquitous Computing Security (UCS) Lab under the supervision of Prof. Jong Hyuk Park. His broad research interests include Information and Cyber Security, Artificial Intelligence, Social Networking Service Security, and IoT. Prior to joining the PhD program at SeoulTech, he received his M.E. in Information

References (44)

S. Rathore et al., Social network security: issues, challenges, threats, and solutions, Inf. Sci. (2017)
H. Xu et al., Efficient spam detection across online social networks, International Conference on Big Data Analysis (ICBDA), IEEE (2016)
X. Zheng et al., Detecting spammers on social networks, Neurocomputing (2015)
F. Ahmed et al., A generic statistical approach for spam detection in Online Social Networks, Comput. Commun. (2013)
L. Liu et al., Detecting Smart Spammers On Social Network: A Topic Model Approach (2016)
Z. Miller et al., Twitter spammer detection using data stream clustering, Inf. Sci. (2014)
G.B. Huang et al., Extreme learning machine: theory and applications, Neurocomputing (2006)
X. Li et al., Extreme learning machine based transfer learning for data classification, Neurocomputing (2016)
W. Budgaga et al., Predictive analytics using statistical, learning, and ensemble methods to support real-time exploration of discrete event simulations, Future Gener. Comput. Syst. (2016)
Y. Jiang et al., Multiclass AdaBoost ELM and its application in LBP based face recognition, Math. Prob. Eng. (2015)
J. Cao et al., Voting based extreme learning machine, Inf. Sci. (2012)
D.H. Lee, Personalizing information using users' online social networks: a case study of CiteULike, J. Inf. Process. Syst. (2015)
J. Singh et al., Optimization of sentiment analysis using machine learning classifiers, Hum.-centric Comput. Inf. Sci. (2017)
Statista, Most Famous Social Network Sites Worldwide as of April 2017, Ranked by Number of Active Users (in Millions) (2017)
P. Sharma et al., Multilevel learning based modeling for link prediction and users’ consumption preference in Online Social Networks, Future Gener. Comput. Syst. (2017)
Zephoria Digital Marketing, The Top 20 Valuable Facebook Statistics (2017)
Statista, Distribution of Global Social Content Sharing Activities as of 2nd Quarter 2016, by Social Network (2017)
Nexgate, Research Report 2013 State of Social Media Spam (2017)
S. Rathore et al., XSSClassifier: an efficient XSS attack detection approach based on machine learning classifier on SNSs, J. Inf. Process. Syst. (2017)
F. Benevenuto et al., Detecting spammers on twitter, Collaboration, Electronic Messaging, Anti-abuse and Spam Conference (CEAS) (2010)
A.A. Amleshwaram et al., Cats: characterizing automation of twitter spammers, Fifth International Conference on Communication Systems and Networks (COMSNETS), IEEE (2013)
S. Rathore et al., SpamSpotter: an efficient spammer detection framework based on intelligent decision support system on facebook, Appl. Soft Comput. (2017)

Dr. Arun Kumar Sangaiah received his Master of Engineering (ME) degree in Computer Science and Engineering from the Government College of Engineering, Tirunelveli, Anna University, India. He received his Doctor of Philosophy (PhD) degree in Computer Science and Engineering from VIT University, Vellore, India. He is presently working as an Associate Professor in the School of Computer Science and Engineering, VIT University, India. His areas of interest include software engineering, computational intelligence, wireless networks, bio-informatics, and embedded systems. He has authored more than 100 publications in journals and conferences of national and international repute. His current research work includes global software development, wireless ad hoc and sensor networks, machine learning, cognitive networks, and advances in mobile computing and communications. He has also registered one Indian patent in the area of Computational Intelligence. In addition, Prof. Sangaiah serves as an Editorial Board Member/Associate Editor of various international journals.

Dr. James J. (Jong Hyuk) Park received Ph.D. degrees from the Graduate School of Information Security, Korea University, Korea, and the Graduate School of Human Sciences, Waseda University, Japan. From December 2002 to July 2007, Dr. Park was a research scientist at the R&D Institute, Hanwha S&C Co., Ltd., Korea. From September 2007 to August 2009, he was a professor at the Department of Computer Science and Engineering, Kyungnam University, Korea. He is now a professor at the Department of Computer Science and Engineering and the Department of Interdisciplinary Bio IT Materials, Seoul National University of Science and Technology (SeoulTech), Korea. Dr. Park has published about 200 research papers in international journals and conferences. He has been serving as chair, program committee, or organizing committee chair for many international conferences and workshops. He is a steering chair of the international conferences MUE, FutureTech, CSA, CUTE, UCAWSN, and World IT Congress-Jeju. He is editor-in-chief of Human-centric Computing and Information Sciences (HCIS) by Springer, The Journal of Information Processing Systems (JIPS) by KIPS, and Journal of Convergence (JoC) by KIPS CSWRG. He is Associate Editor/Editor of 14 international journals, including JoS, JNCA, SCN, and CJ. In addition, he has been serving as a Guest Editor for international journals from several publishers: Springer, Elsevier, John Wiley, Oxford Univ. Press, Emerald, Inderscience, and MDPI. He received best paper awards from the ISA-08 and ITCS-11 conferences and outstanding leadership awards from IEEE HPCC-09, ICA3PP-10, IEEE ISPA-11, PDCAT-11, and IEEE AINA-15. Furthermore, he received outstanding research awards from SeoulTech in 2014. His research interests include IoT, Human-centric Ubiquitous Computing, Information Security, Digital Forensics, Vehicular Cloud Computing, Multimedia Computing, etc. He is a member of the IEEE, IEEE Computer Society, KIPS, and KMMS.
