1 Introduction

Cloud computing based technologies such as e-mail, social networking, storage, application, web hosting, and TV services are now part of everyday life. At present, knowledge-based user authentication/verification methods are applied in cloud-based technology. A more secure solution is needed because of the risks associated with this technique, such as shoulder surfing, dictionary, and brute-force attacks. Keystroke dynamics, a method in which people are identified through their way of typing, could be combined with knowledge-based authentication as a practical security solution. It has been established that the habitual typing pattern is a behavioral biometric trait that can be used for user identification/authentication. Being nonintrusive and cost effective, keystroke dynamics is a strong alternative to other biometric modalities and can be integrated into any existing knowledge-based user authentication system with minor alteration.

The accuracies obtained in previous studies are impressive but not acceptable in practice because of the high Failure to Enroll Rate (FER) and intra-class variation. Behavioral biometrics are generally less accurate than morphological biometric modalities, and an acceptable accuracy is very hard to achieve. The European standard for access control systems mandates a false-alarm rate below 1% and a miss rate of no more than 0.001% [1]. Among behavioral biometric characteristics, keystroke dynamics suffers from a high rate of intra-class variation (aging, mental state while typing, hand injury, tiredness, …) and from differences in data acquisition (cross-device validation, timing resolution of the system, feature selection, keyboard position, hand(s) used, …), both of which increase the error rate of keystroke dynamics user authentication.

Recent keystroke dynamics studies found that personal traits such as age, gender, dominant hand, hand(s) used, emotional state, and typing skill can be inferred from the typing pattern on a computer keyboard [6]. A few studies found that only age and gender can be inferred from behavior on a touch screen. The rationale is that a user's physical structure, hand weight, fingertip size, and neurophysiological and neuropsychological factors are reflected in the way the keyboard is used, which makes typing patterns distinguishable.

These personal traits affect the typing characteristics and, consequently, the classification performance. For instance, the touch area of a child might be smaller than that of an adult. Similarly, finger length might differ between female and male users. A right-handed user might type digraphs consisting of keys from the right side of the keyboard more quickly than digraphs consisting of keys from the left side. For a digraph with one key on the left side and the other on the right side of the keyboard, the typing pattern might differ between users typing with one hand and users typing with both hands. If the user is distracted or frustrated by unexpected behavior of the computer, the typing pattern changes substantially. These soft biometric features therefore affect both the typing characteristics and the classification performance. The type and length of the text, the clock resolution of the system, the number and type of running programs, and the keyboard type, size, and layout also affect performance. As a result, experimental results are impressive in lab-based environments, but keystroke dynamics is less accurate when used in practical web-based applications. Including personal traits such as age, gender, handedness, hand(s) used, and typing skill is a new direction of keystroke dynamics research aimed at improving performance. Predicting such traits not only increases keystroke dynamics user recognition performance; it also has separate advantages for applications ranging from social network sites to e-business. The accuracy with which the personal traits are identified is crucial: if the traits are predicted poorly, classification performance might decrease instead.

Some studies have sought to improve the prediction of personal traits so that the technique can be used in real-life, web-based applications. Some studies identified user traits from the typing pattern on a computer keyboard but did not provide sufficient evidence that the approach works on touchscreen smartphones, which are the most common and popular electronic devices. Social networking is becoming more popular as a way to keep in touch with a large and diverse body of people and groups. The number of social network users under 18 is increasing rapidly. They can easily reach content that they are not supposed to access or that is not suitable for them. They share personal information with strangers, and most of them have a large number of strangers in their friend lists. Social networking administrators delete thousands of profiles of people who do not meet the age requirement or who behave unnaturally on the site. However, no effective method has been applied to identify age group and gender automatically from the user's typing pattern on a computer keyboard or touchscreen smartphone, instead of accepting self-reported age and gender on trust. Beyond age, the technique can also be used to detect fraudulent claims of handedness and typing skill.

Research on keystroke dynamics started in 1980. Over these three decades, more than 500 papers have been published as journal articles, conference proceedings, and theses, yet the accuracy of the technique has still not reached its goal. More research is needed before the technique can reach that goal and be used in practice. Ancillary information can significantly improve the recognition performance of biometric systems.

The studies in the literature summarized in Table 1 were conducted in lab-based environments. They use AZERTY and/or QWERTY keyboards as the sensing device. The text patterns vary across studies: some texts are short and others long; some are simple common words and others are logically complex. Some studies evaluated performance with 5-fold cross-validation, whereas others used a training-testing split. The number of examples also differs between studies, and some of them maintained sessions during data acquisition. It is clear from Table 1 that these studies were conducted on different datasets with the aim of extracting personal traits from typing patterns rather than improving the performance of trait identification.

Table 1. Success achieved by researchers to recognize the soft biometric traits on keystroke dynamics datasets

Machine learning classification is common to all the studies listed in Table 1. Selecting an appropriate method is an important issue in the keystroke dynamics domain, where accuracy can jump from 65% to 90% depending on the method. In our study, we applied fuzzy-rough nearest neighbor (FRNN) classification with vaguely quantified rough sets (VQRS). The performance of our approach is impressive, consistent, and significantly better than that of the previously leading machine learning method, the support vector machine (SVM). In this study, we compare our approach with an SVM with a radial basis function (RBF) kernel.

The main objective of this study is to develop a model that identifies the gender, age group, handedness, and typing skill of users from their typing pattern on a keyboard or touch screen for a predefined text, and to improve accuracy by using this soft biometric information as extra features in keystroke dynamics user authentication.

The contributions of this paper are listed below:

  • We provide an efficient approach to recognizing ancillary information through the typing pattern. Its performance is comparable with that of other approaches in the literature.

  • We evaluate the performance of leading machine learning approaches in determining soft biometric information.

  • We evaluate and compare the performance of 9 leading anomaly detectors with and without the soft biometric approach.

We used the authentic, shared CMU keystroke dynamics dataset [8] along with a dataset collected through an Android handheld device [9]. The details of the datasets are described in Table 1.

2 Static and Shared Keystroke Dynamics Datasets

Many keystroke dynamics datasets have been created in the last 30 years, but only some of them are available on the Internet or can be obtained on request. Details of the publicly available datasets are summarized in Table 2.

Table 2. Details of static and shared datasets on keystroke dynamics through keyboard

Soft biometric information is not included in every dataset. The datasets created by Killourhy et al. [2], Idrus et al. [7], Yuzun et al. [5] and El-Abed et al. [11] provide soft biometric information together with keystroke dynamics data and are therefore the most suitable for our experiment. We assigned a name to each dataset according to its text type and data acquisition method so that each dataset can be identified easily throughout this paper. Details are given in Table 3.

Table 3. Details of static and shared datasets used in this study

M, F, L, R, T, and O represent Male, Female, Left-hander, Right-hander, Touch typist, and Other typing style, respectively.

3 Research Methodology

The proposed methodology is described in the following subsections. The first objective is to identify personal traits from typing patterns in datasets collected in different environments, and then to improve keystroke dynamics recognition performance by including these personal traits as additional features. To solve this problem, we followed the steps below.

We used the following equations to extract the features from the selected datasets. Because some of the features are not present in the datasets, we recalculated all 8 features with these equations.

The timing features of the keystroke dynamics are as follows:

$$ \text{Key Duration (KD)} = R_{i} - P_{i} $$
(1)
$$ \text{UpUp Key Latency (UU)} = R_{i + 1} - R_{i} $$
(2)
$$ \text{DownDown Key Latency (DD)} = P_{i + 1} - P_{i} $$
(3)
$$ \text{UpDown Key Latency (UD)} = P_{i + 1} - R_{i} $$
(4)
$$ \text{DownUp Key Latency (DU)} = R_{i + 1} - P_{i} $$
(5)
$$ \text{TotalTime Key Latency (t)} = R_{n} - P_{1} $$
(6)
$$ \text{Tri-graph Latency (T)} = R_{i + 2} - P_{i} $$
(7)
$$ \text{Four-graph Latency (F)} = R_{i + 3} - P_{i} $$
(8)

Here P and R denote the press and release times of the i-th key event, and n is the number of keys in the typed text. We tried different combinations of features to find the best feature subset; we did not apply any filter or wrapper approach for feature selection. We normalized all datasets to the range [−1, +1] to speed up processing. We used two leading machine learning approaches: SVM and FRNN. The fuzzy-rough nearest neighbor (FRNN) [17] classification algorithm is an alternative to Sarkar’s fuzzy-rough ownership function (FRNN-O) approach [18].
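
As a minimal sketch of Eqs. (1)–(8) and of the scaling to [−1, +1], the following Python code assumes that the press and release timestamps of one typed sample are available as two equally long lists. The function names `timing_features` and `scale_to_unit_range`, the sample timestamps, and the choice of min-max scaling are assumptions made for illustration; the paper does not state which normalization formula was used.

```python
from typing import Dict, List


def timing_features(press: List[float], release: List[float]) -> Dict[str, List[float]]:
    """Compute the timing features of Eqs. (1)-(8) for one typed sample.

    press[i] and release[i] are the press and release times of the i-th key
    (0-based here, 1-based in the equations).
    """
    if len(press) != len(release) or not press:
        raise ValueError("press and release must be non-empty and equally long")
    n = len(press)
    return {
        "KD": [release[i] - press[i] for i in range(n)],            # Eq. (1)
        "UU": [release[i + 1] - release[i] for i in range(n - 1)],  # Eq. (2)
        "DD": [press[i + 1] - press[i] for i in range(n - 1)],      # Eq. (3)
        "UD": [press[i + 1] - release[i] for i in range(n - 1)],    # Eq. (4)
        "DU": [release[i + 1] - press[i] for i in range(n - 1)],    # Eq. (5)
        "t": [release[n - 1] - press[0]],                           # Eq. (6)
        "T": [release[i + 2] - press[i] for i in range(n - 2)],     # Eq. (7)
        "F": [release[i + 3] - press[i] for i in range(n - 3)],     # Eq. (8)
    }


def scale_to_unit_range(column: List[float]) -> List[float]:
    """Min-max scale one feature column (one value per sample) to [-1, +1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant column: map to the midpoint
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in column]


# Hypothetical timestamps (seconds) for a four-key sequence.
press = [0.00, 0.20, 0.45, 0.70]
release = [0.10, 0.32, 0.55, 0.85]
features = timing_features(press, release)

# Hypothetical total typing times of three samples, scaled column-wise.
total_times = [0.85, 1.10, 0.95]
scaled_totals = scale_to_unit_range(total_times)
```

Scaling is applied per feature column across samples, so that every feature contributes on the same scale to distance-based classifiers such as nearest neighbor methods.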

Several anomaly detection algorithms were applied to keystroke dynamics patterns with personal traits included manually as additional features. The results show that including personal traits improves the performance of the keystroke dynamics user recognition system.
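
One way to picture the inclusion of personal traits as additional features is to append encoded trait values to each timing vector. The sketch below is only an illustration: the numeric codes, the trait names, and the function `add_soft_biometrics` are hypothetical, since the paper does not specify how the traits were encoded. The labels M, F, L, R, T, and O follow the abbreviations used for the datasets.

```python
from typing import Dict, List

# Hypothetical numeric codes, chosen to lie in the same [-1, +1] range as the
# scaled timing features; the paper does not specify an encoding.
TRAIT_CODES: Dict[str, Dict[str, float]] = {
    "gender": {"M": -1.0, "F": 1.0},
    "handedness": {"L": -1.0, "R": 1.0},
    "typing_skill": {"T": -1.0, "O": 1.0},
}


def add_soft_biometrics(timing: List[float], traits: Dict[str, str]) -> List[float]:
    """Append encoded soft biometric traits to a timing feature vector.

    Traits are appended in alphabetical order of their names so that every
    augmented vector has the same layout.
    """
    extra = [TRAIT_CODES[name][traits[name]] for name in sorted(TRAIT_CODES)]
    return list(timing) + extra


sample = [-0.4, 0.1, 0.7]  # hypothetical scaled timing features
augmented = add_soft_biometrics(
    sample, {"gender": "F", "handedness": "R", "typing_skill": "T"}
)
```

The augmented vector can then be passed to an anomaly detector in place of the timing-only vector.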

4 Experimental Results

Two leading machine learning algorithms were applied to each dataset to predict the soft biometric information, and the accuracies obtained with 10-fold cross-validation are listed in Tables 4, 5, 6, 7, 8 and 9. According to the results, FRNN with VQRS proved to be the more suitable learning method in both desktop and Android environments. Accuracies were recorded with Weka 3.7.4 using default parameter values.
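
The evaluation protocol, accuracy with standard deviation over 10 folds, can be sketched as follows. This is a plain, unstratified illustration written with the Python standard library, not a reproduction of Weka's evaluation, which may draw folds differently. The 1-nearest-neighbor classifier is only a stand-in for FRNN-VQRS and SVM, and the data are hypothetical.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Classifier = Callable[[List[Vector], List[str], Vector], str]


def nearest_neighbour(train_x: List[Vector], train_y: List[str], x: Vector) -> str:
    """Stand-in classifier (1-NN, Manhattan distance); NOT FRNN-VQRS or SVM."""
    distances = [sum(abs(a - b) for a, b in zip(row, x)) for row in train_x]
    return train_y[distances.index(min(distances))]


def cross_validate(xs: List[Vector], ys: List[str], classify: Classifier,
                   folds: int = 10, seed: int = 0) -> Tuple[float, float]:
    """Return (mean accuracy, standard deviation) over the folds, in percent."""
    if len(xs) != len(ys) or len(xs) < folds:
        raise ValueError("need at least one labelled example per fold")
    order = list(range(len(xs)))
    random.Random(seed).shuffle(order)
    accuracies = []
    for k in range(folds):
        test_idx = order[k::folds]  # every folds-th shuffled example, offset k
        held_out = set(test_idx)
        train_idx = [i for i in order if i not in held_out]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        hits = sum(classify(train_x, train_y, xs[i]) == ys[i] for i in test_idx)
        accuracies.append(100.0 * hits / len(test_idx))
    return statistics.mean(accuracies), statistics.pstdev(accuracies)


# Hypothetical, well-separated two-class data: 20 examples per class.
rng = random.Random(1)
xs = [[rng.uniform(-1.0, -0.5)] for _ in range(20)] + \
     [[rng.uniform(0.5, 1.0)] for _ in range(20)]
ys = ["M"] * 20 + ["F"] * 20
mean_acc, std_acc = cross_validate(xs, ys, nearest_neighbour)
```

Each example is used for testing exactly once, and the standard deviation across folds indicates how consistent a method is.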

Table 4. Accuracy with standard deviation in identifying gender on different datasets
Table 5. Accuracy with standard deviation in identifying age group (< 30/30 +) on different datasets
Table 6. Accuracy with standard deviation in identifying age group (< 18/18 +) on different datasets
Table 7. Accuracy with standard deviation in identifying handedness on different datasets
Table 8. Accuracy with standard deviation in identifying typing skill (touch/others) on different datasets
Table 9. Accuracy with standard deviation in identifying hand(s) used on different datasets

Tables 10, 11 and 12 present the performance of the 9 anomaly detectors described in [8] after soft biometric information is taken into account. We observed that using multiple soft biometric traits, instead of gender alone, decreases the EER significantly. Here, we take the median of the samples; in the keystroke dynamics domain, the median is a better measure of proximity than the mean.
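
To make the evaluation metric concrete, the sketch below builds a per-user template from the median of the enrollment samples and approximates the Equal Error Rate (EER) from genuine and impostor anomaly scores. The Manhattan-distance score is a simple stand-in for the detectors described in [8], and all function names and numbers are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def median_template(samples: List[Sequence[float]]) -> List[float]:
    """Per-feature median of a user's enrolment samples."""
    return [statistics.median(column) for column in zip(*samples)]


def manhattan_score(template: Sequence[float], probe: Sequence[float]) -> float:
    """Anomaly score: larger means less similar to the template."""
    return sum(abs(t - p) for t, p in zip(template, probe))


def equal_error_rate(genuine: List[float], impostor: List[float]) -> Tuple[float, float]:
    """Return (EER in percent, threshold) for anomaly scores.

    A probe is rejected when its score exceeds the threshold. The EER is
    approximated at the candidate threshold where the false rejection rate
    and the false acceptance rate are closest.
    """
    if not genuine or not impostor:
        raise ValueError("both score lists must be non-empty")
    best_gap, best_eer, best_thr = float("inf"), 100.0, 0.0
    for thr in sorted(set(genuine) | set(impostor)):
        frr = sum(s > thr for s in genuine) / len(genuine)      # genuine rejected
        far = sum(s <= thr for s in impostor) / len(impostor)   # impostor accepted
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_eer, best_thr = gap, 100.0 * (far + frr) / 2.0, thr
    return best_eer, best_thr


# Hypothetical enrolment samples; the third one contains an outlier (0.9).
template = median_template([[0.1, 0.5], [0.2, 0.4], [0.9, 0.6]])

# Hypothetical anomaly scores for genuine and impostor attempts.
eer, threshold = equal_error_rate([0.1, 0.2, 0.3, 0.9], [0.25, 0.8, 1.0, 1.2])
```

In this example the median keeps the first template feature at 0.2, whereas the mean would be pulled to 0.4 by the single outlying sample, which illustrates why the median can be the more robust choice.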

Table 10. Comparative analysis of anomaly detectors with inclusion of personal traits by using performance metric Equal Error Rate (EER) in % on Dataset A
Table 11. Comparative analysis of anomaly detectors with inclusion of personal traits by using performance metric Equal Error Rate (EER) in % on Dataset B
Table 12. Comparative analysis of anomaly detectors with inclusion of personal traits by using performance metric Equal Error Rate (EER) in % on Dataset C

5 Conclusion

Our experiments show, with impressive results, that it is possible to predict the gender, age group, handedness, hand(s) used, and typing skill of a user from the way they type. Because typing is a common, measurable activity in web-based applications, keystroke dynamics is well suited to gender and age group prediction. Activity on the keyboard and touch screen is a behavioral biometric characteristic that could be used to predict gender and age group, helping social networking sites deal with fake accounts and build a more genuine and loyal user base. Automatically identified traits can also serve as additional soft biometric information to reduce the error rate of a keystroke dynamics user recognition system. The technique could also help e-commerce sites reach the right clients and, similarly, avoid showing unsuitable products on the basis of gender and age group. It can also be very useful in web-based environments for automatic user profiling. The results also show that users under 18 can be identified from their typing pattern, which can be used to protect children from Internet threats.

We used two leading machine learning methods to predict personal traits on multiple publicly available, authentic datasets. Our proposed approach, FRNN-VQRS, a new variant of FRNN, achieved impressive results that are significantly better than those of the previously used SVM with an RBF kernel in determining personal traits in desktop and Android environments. This is a very positive outcome for keystroke dynamics systems based on a single predefined text: the predicted traits can be used as additional soft biometric features in user identification/authentication, which decreases the EER from 10.69 to 2.53 on the CMU keystroke dynamics dataset. This is a modest yet efficient approach to keystroke dynamics user authentication that could be used in cloud computing based technologies.