GARS: Real-time system for identification, assessment and control of cyber grooming attacks

doi:10.1016/j.cose.2013.12.004

Computers & Security

Volume 42, May 2014, Pages 177-190

https://doi.org/10.1016/j.cose.2013.12.004

Abstract

In this paper, the Grooming Attack Recognition System (GARS) is presented. The main objectives of GARS are the real-time identification, assessment and control of cyber grooming attacks in favor of child protection. The system uses document classification, personality recognition, user history and exposure time recording to calculate the specific risks children are exposed to during chat conversations. These processes are repeated after each new message, and three of them feed corresponding fuzzy logic controllers that provide particular but homogenized risk values as outputs. The weighted sum of the particular risk values yields a total value that indicates the current cyber grooming risk the child is exposed to as the conversation evolves. Depending on predefined thresholds, the total risk value can be used to trigger alarms for various recipients (children, parents, etc.). The practical use of GARS is demonstrated with a case study based on real grooming dialogs. Furthermore, the proposed approach is evaluated through a discussion of applicability and performance results.

Introduction

The growing problem of cyber predators is a major concern for parents in their effort to protect children against malicious acts on the Internet. Moreover, the rapid development of Internet communications has brought new means for predators to approach potential victims, such as social networks, instant messaging applications and other sources (Armagh and Battaglia, 2006). The results of a successful attack on children can be numerous and catastrophic, and include psychological and physical effects such as abnormal psychology, anxiety, depression, aversion to school and social activities, behavioral and learning difficulties, a tendency towards drug and alcohol abuse and deliberate self-harm incidents (Noll et al., 2003).

In fact, there is a variety of hazards that children are exposed to. First of all, sexual exploitation attacks, known as child grooming, are primarily hazardous due to their catastrophic consequences. The term ‘child grooming’ refers to actions performed by predators with the aim of establishing a sexual or emotional relationship with children (Olson et al., 2007). Moreover, cyberbullying and cyberstalking attacks refer to threatening acts carried out through online communications. On the one hand, cyberbullying can be defined as an aggressive action performed via electronic media, such as mobile phone text messages and the Internet (Bauman, 2007). Recent trends involve the use of social networks, like Facebook or MySpace, where predators take advantage of the anonymity and increasing penetration of these networks to trap and attack the victim. On the other hand, cyberstalking involves activities such as harassing e-mail, flaming (online verbal abuse), mass unsolicited e-mail and forging profiles on social networks. All of these activities tend to threaten victims and affect their personalities (Schmalleger and Pittaro, 2009).

In terms of parental control, parents' limited knowledge of computing and related activities, as well as physical time constraints, limits the observation of Internet use and therefore child protection. This limitation created the need for parental control software tools which, however, have their own restrictions. First, security filters can be bypassed easily by sophisticated users, and simple matching of keywords in HTML text or detection of improper image exchange is not enough to conclude whether a communication has malicious intent and thus whether Internet access should be restricted (Think Digit, 2009). Second, most tools lack the intelligence to understand how risk progresses throughout a conversation. This further limits their applicability, since it leads to false warnings (Winder, 2013). Besides, each grooming incident is unique in nature due to time-related and physical factors. The picture becomes more complex when different personalities interact online using different styles of language. For example, when typing common expressions, Internet users may use idioms and abbreviations instead of formal language. Therefore, dialog analysis should overcome these challenges and recognize potential grooming threats even if the captured text is not in a formal language format.

Parental software tools demand system resources in order to process and save data, which in turn diminishes the performance of computerized systems (Think Digit, 2009). Moreover, the lack of customization according to a user profile limits the applicability of such tools. This is because default settings usually prohibit access to web or conversation content recognized as malicious at first sight. Although these settings may prevent a grooming attack from being realized, they rarely explain why a particular site or conversation has been banned. This implies that neither the parent nor the child is informed about their Internet habits (Amanda et al., 2012).

Olson et al. (2007) have analyzed the communicative process predators follow to entrap their victims into a sexual relationship. This process, known as the ‘cycle of entrapment’, refers to the time and effort a predator uses to deceive the victim into accepting sexual proposals. Similar research on the nature of grooming attacks reveals that predators follow specific strategies to catch and maintain the attention of victims. For example, a predator may build a high social profile to get acquainted with a victim and turn into a friendly profile during the conversation. Interestingly, it is quite common that after the attack, predators delete cookies so as to erase their history and thus reduce the risk of being caught by the authorities (O’Connell, 2003).

Despite the development of parental control software, grooming attacks are still on the rise. Therefore, there is a need, from a technological perspective, to develop effective defenses against grooming attacks. Such defenses should be transparent and should not restrict access to any content, so as not to disturb the minor user. Besides, the defenses should not store any communication data, so as not to violate any legislation. Moreover, the parent should be warned in real time about the potential grooming threat. It is worth noting that controlling grooming attacks prior to their realization is of fundamental importance. Early recognition is vital for the parent to take all necessary actions to eliminate the threat before the grooming attack becomes irreversible.

This paper presents the Grooming Attack Recognition System (GARS) as a tool to identify, assess and control grooming attacks in real time, in favor of child protection. GARS operation is dedicated to calculating the running total risk value, which reflects the grooming threat that the child is exposed to as the conversation evolves. This total risk value is the synthesis of four (4) distinct risk values combined with a weight balancing method. Three processes provide inputs to corresponding fuzzy logic controllers, which calculate respective risk values as outputs. At the same time, the exposure time process calculates the corresponding risk value as a function of the time the child is exposed to possible threats. Whenever the total risk value exceeds predefined thresholds, the alarm mechanism is triggered. The alarm mechanism sends an instant warning signal to the parent, and the child is also warned, by means of a colored signal, about the criticality of the risk the conversation poses.
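The weighted synthesis and threshold check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the weights, the threshold values, the alarm level names and the function names `total_risk` and `alarm_level` are hypothetical and do not come from the paper, and each particular risk value is assumed to be normalized to the range [0, 1].

```python
# Hypothetical weights for the four processes (DC, PR, UH, ET).
# They are illustrative only; the paper's actual weights are not given here.
WEIGHTS = {"dc": 0.4, "pr": 0.2, "uh": 0.2, "et": 0.2}

# Hypothetical thresholds, checked from highest to lowest.
THRESHOLDS = [(0.7, "red"), (0.4, "yellow")]


def total_risk(risks: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Return the weighted sum of the particular risk values (each assumed in [0, 1])."""
    if set(risks) != set(weights):
        raise ValueError("risks and weights must cover the same processes")
    return sum(weights[name] * risks[name] for name in weights)


def alarm_level(risk: float) -> str:
    """Map a total risk value to an assumed alarm level."""
    for threshold, level in THRESHOLDS:
        if risk >= threshold:
            return level
    return "green"


# Recomputed after each new message as the conversation evolves.
current = total_risk({"dc": 0.9, "pr": 0.6, "uh": 0.3, "et": 0.5})
print(alarm_level(current))
```

Because the weights sum to 1, the total stays in [0, 1] whenever the inputs do, which keeps a fixed set of thresholds meaningful across conversations.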

Related work on intelligent systems for grooming attack identification includes ChatCoder 2.0 (Kontostathis et al., 2009). This system identifies and analyzes predatory posts using machine learning algorithms and a set of 15 attributes related to grooming attacks. However, limitations such as the number of false positive/negative alarms require further tuning of the system (Kontostathis et al., 2012). In parallel, the recognition of abnormal behavior and effective risk assessment during online communication have been considered a critical process in most information systems (Polemi et al., 2013). In terms of grooming recognition, the major challenges relate to the security issues associated with the predator's gain of trust. Related research has revealed serious security threats that arise from the insider, in cases where the predator gains the victim's trust (Kandias et al., 2010). Similarly, recently published research has acknowledged that forensic data from smartphones can be used towards the protection of minors (Mylonas et al., 2013).

The paper is organized as follows: Section 2 explains the proposed framework. Section 3 describes the reasoning underlying fuzzy logic and weight balancing. Section 4 demonstrates the applicability of GARS with an implementation example and further discusses the results of performance tests. Section 5 concludes the paper and outlines ideas for future work.

Section snippets

The proposed framework

The GARS system operates as a linear one (Fig. 1) having as input four (4) particular processes, namely: Document Classification (DC), Personality Recognition (PR), User History (UH) and Exposure Time (ET). The output of each process, except ET, becomes input in a corresponding fuzzy logic controller (FL_dc, FL_pr, FL_uh) which calculates the particular risk value. The risk value from ET is calculated using statistical interval prediction (InP) (Papoulis and Pillai, 2002). The total risk value is

Fuzzy logic

GARS’ functionality is based on the synthesis of the particular risk values, calculated by discrete processes. In fact, the outputs of the processes have specific characteristics: a) heterogeneity, as they come from different processes and data sources, b) uncertainty, based on the functionality of each process. Actually, the basic challenge for GARS is the transformation of processes' outputs into particular risk values. In reaching this objective, we use fuzzy logic to express the behavior of
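As a rough illustration of how a fuzzy logic controller can turn a heterogeneous process output into a homogenized risk value, the sketch below uses triangular membership functions and a zero-order Sugeno-style weighted average. This is a generic textbook technique, not the controller design from the paper, which is not shown in this text. The linguistic terms, membership parameters, rule outputs and the names `triangular` and `fuzzy_risk` are all assumptions made for the example, and the process output is assumed to be scaled to [0, 1].

```python
def triangular(x: float, a: float, b: float, c: float) -> float:
    """Membership degree of x in a triangle with feet a and c and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Assumed linguistic terms over a process output scaled to [0, 1].
# The feet extend beyond [0, 1] so that the end terms peak at 0 and 1.
TERMS = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}

# Assumed rule base: each input term maps to one crisp risk value.
RULE_OUTPUT = {"low": 0.1, "medium": 0.5, "high": 0.9}


def fuzzy_risk(x: float) -> float:
    """Fuzzify x, fire one rule per term, and defuzzify by weighted average."""
    x = min(max(x, 0.0), 1.0)  # clamp so at least one term always fires
    degrees = {term: triangular(x, *abc) for term, abc in TERMS.items()}
    total = sum(degrees.values())
    return sum(degrees[term] * RULE_OUTPUT[term] for term in degrees) / total


print(fuzzy_risk(0.25))  # partly "low", partly "medium"
```

Running one such controller per process would give outputs on a common scale, which is what allows them to be combined into a single total afterwards.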

Implementation aspects

GARS can be implemented with two topologies: i) the local, where all functionalities are concentrated in the local system and ii) the remote, where dispersed clients send dialog parts for analysis to the server. The main advantage of the remote topology is that there is no need to install analysis software on clients. This is of particular importance for mobile devices with limited system resources. Besides, in the remote topology updates related to grooming recognition patterns can be

Experimental results

For a better understanding of the effectiveness of GARS in recognizing grooming attacks, we implemented a test based on the remote topology. The implementation has been tested with clients installed on several PCs, in order to examine the system's accuracy in terms of grooming recognition and to run performance tests for the evaluation of system resource consumption. All supported functionalities on one host, called GARS server, were based on Ubuntu 12.04.1 LTS with a static IP. In

Conclusions

The proposed GARS system has been developed with the aim to limit grooming attacks. The added value of GARS over related work can be focused on the fact that a number of combined methods are used to result in a single risk value. Specifically, a number of processes and sensors are utilized for analyzing dialogs and assessing the progress of a grooming attack in a conversation. In parallel, GARS does not store any dialog parts, but rather analyzes them in real time respecting the communication

Dimitrios Michalopoulos is a PhD candidate at the Department of Applied Informatics of the University of Macedonia, Greece. He holds a Diploma degree in Electrical and Computer Engineering from the Aristotle University of Thessaloniki, Greece and an M.Sc. degree in Information Systems Security from the University of Plymouth, United Kingdom. His research interests include information security, machine learning and computer networks.

References (37)

A. Mylonas et al.
Smartphone sensor data as digital evidence
Comput & Secu (Special Issue: Cybercrime in the Digital Economy)
(oct 2013)
R. Riggio et al.
Personality and deceptionability
Personality Individ Differ
(1988)
L. Amanda et al.
Social media and young adults
(2012)
S. Armagh et al.
Use of computers in the sexual exploitation of children. Portable guides to investigating child abuse
(2006)
S. Bauman
CyberBullying: a virtual menace
(2007)
S. Dennison et al.
The big 5 dimensional personality approach to understanding sex offenders
Psychol Crime Law
(2001)
M. Er et al.
Automatic generation of fuzzy inference systems via unsupervised learning
Neural Netw
(2008)
M. François et al.
Using linguistic cues for the automatic recognition of personality in conversation and text
J Artif Intell Res (JAIR)
(2007)
I. Hirschman et al.
Extreme Eigen values of Toeplitz operators
(1977)
W. Ho Chung et al.
Interpreting TF-IDF term weights as making relevance decisions
ACM Trans Inf Syst
(2008)
M. Kandias et al.
Which side are you on? A new Panopticon vs. privacy
M. Kandias et al.
An insider threat prediction model
Y. Kitamura
Empirical likelihood methods in econometrics: theory and practice
(2006)
A. Kontostathis et al.
Text mining and cybercrime
A. Kontostathis et al.
Identifying predators using ChatCoder 2.0-Notebook for PAN at CLEF 2012
C. Lee
Fuzzy logic in control systems: fuzzy logic controllers—parts I and II
IEEE Trans Syst Man Cybern
(1990)
A. McCallum
Bow: a toolkit for statistical language modeling, text retrieval, classification and clustering
(1996)
D. Michalopoulos et al.
A method to calculate social networking hazard probability in definite time
Inf Manag Comput Secur
(2013)


Ioannis Mavridis is an Associate Professor of Information Systems Security at the Department of Applied Informatics of the University of Macedonia, Greece. He holds a Diploma in Computer Engineering and Informatics from the University of Patras, Greece and a Doctor's degree (Ph.D.) in Mobile Computing Security from the Aristotle University of Thessaloniki, Greece. He has published over 100 articles in international and national scientific journals and conferences, mainly on information security related topics. His research interests include the areas of information security, cyber-crime, access control, intrusion detection and security management.

Marija Jankovic is a Doctoral Researcher at the Department of Information Systems of the University of Belgrade, Serbia. She received her B.Sc. and M.Sc. degrees in Information Systems from the Faculty of Organizational Sciences, University of Belgrade, Serbia. She was a guest researcher at the National Institute of Standards and Technology (NIST), USA in 2009–2010. Her research interests include business process modeling, interoperability and information security.
