Abstract
Keystroke dynamics has been used in user authentication for quite some time. The technique holds considerable promise because it is easy to implement and unobtrusive. Researchers have been working to improve the accuracy of user identification, and such techniques are validated on standard data-sets. The quality of the input data turns out to be crucial for generating an accurate usage-pattern vector. In this paper, we present an application for data collection. Besides creating a user data-set, the application also generates a signature-vector database.
1 Introduction
In this era, the extensive use of automated systems, together with cloud-based services, gives end users far greater scope for storing and accessing data efficiently. However, it also poses a major challenge for security and authentication. Before secured data can be accessed, the authenticity of the user must be verified; establishing whether a user is entitled to the data is the foremost goal of authentication. Many advanced systems, across different applications, work with distributed workstations (servers) deployed in different geographic regions. Users and their data are even more vulnerable over wireless media, where no dedicated link or security method is specified. A foolproof measure against unauthorized access to computer resources and data is therefore needed. Traditional authentication techniques depend mostly on passwords, and they fail to provide enough protection for user data. This has prompted researchers to turn to a new area of authentication known as biometrics, which includes fingerprints, palm veins, face recognition, DNA, palm prints, hand geometry, iris recognition, and patterns of human behavior such as typing rhythm. Keystroke dynamics [1, 6], or typing dynamics, is a behavioral biometric: an automated method of identifying or confirming the identity of an individual based on the manner and rhythm of his/her typing on a keyboard. Keystroke techniques are of two types: keystroke static authentication (KSA) and keystroke dynamic authentication (KDA). In KSA, the user is authenticated at a particular time instance. The continuous/dynamic method (KDA) is more effective than KSA, and it requires verification to continue throughout the entire session of user interaction. The raw measurements used in keystroke dynamics are dwell time and flight time.
The rest of the paper is organized as follows. In Sect. 2, we briefly review various approaches in keystroke biometrics and analyze their error rates. In Sect. 3, we describe our proposed approach. Section 4 details the implementation and presents experimental results. Finally, Sect. 5 concludes the paper with suggestions for future work (Fig. 1).
2 Literature Review
Most existing approaches focus on static verification, where a user types a specific pre-enrolled string, e.g., a password during login, and the resulting keystroke features are analyzed for authentication [1]. Teh et al. [2] proposed a two-layer fusion approach for strengthening existing password-based authentication systems, achieving an equal error rate (EER) of 1.401 %. Using classification techniques based on template matching and Bayesian likelihood models, Monrose and Rubin [3] achieved an accuracy of 83.22–92.14 %. Zhong et al. [4] recommended a nearest-neighbor classifier with a new distance metric that identifies a legitimate user with respect to a threshold value; their system achieved an EER of 8.7 %. Revett et al. [5] achieved 95 % accuracy in user authentication with a software-based module in which the combination of typing speed and the first and last few characters of the login ID is enough to identify an authentic user. Wang et al. [7] introduced a keystroke-dynamics authentication approach consisting of a training stage and an authentication stage, which showed better performance in terms of the false acceptance rate (FAR) and false rejection rate (FRR). Babaeizadeh et al. [8] suggested a KDA-based system for verifying a user who requests services via a cloud service provider (CSP) in a mobile cloud computing (MCC) environment; combining elliptic curve cryptography (ECC) with a keystroke-duration attribute, the scheme was shown to resist 97.33 % of impostor attempts. The data quality, uniqueness, and consistency of typing patterns can be improved by using artificial rhythms and tempo cues [9].
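The EER values quoted above summarize the trade-off between FAR and FRR as the decision threshold moves. The following is a minimal sketch, not taken from any of the cited works, assuming similarity scores where a higher score means "more likely genuine" and a sample is accepted when its score is at least the threshold. The function names `far_frr` and `approximate_eer` and the score lists are hypothetical and purely illustrative.

```python
from typing import List, Tuple


def far_frr(genuine: List[float], impostor: List[float],
            threshold: float) -> Tuple[float, float]:
    """Return (FAR, FRR) when a sample is accepted iff score >= threshold."""
    # FAR: fraction of impostor attempts that are wrongly accepted.
    far = sum(s >= threshold for s in impostor) / len(impostor)
    # FRR: fraction of genuine attempts that are wrongly rejected.
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def approximate_eer(genuine: List[float],
                    impostor: List[float]) -> Tuple[float, float]:
    """Try every observed score as a threshold and return (threshold, EER estimate)
    at the point where FAR and FRR are closest."""
    best_gap, best_t, best_eer = float("inf"), 0.0, 0.0
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        if abs(far - frr) < best_gap:
            best_gap, best_t, best_eer = abs(far - frr), t, (far + frr) / 2
    return best_t, best_eer


genuine_scores = [0.9, 0.8, 0.75, 0.6]    # made-up illustrative scores
impostor_scores = [0.2, 0.3, 0.65, 0.5]
print(approximate_eer(genuine_scores, impostor_scores))  # (0.65, 0.25)
```

Scanning only the observed scores keeps the sketch simple; with real data, FAR and FRR rarely cross exactly, so the EER is usually reported as the average of the two at their closest point, as done here.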
3 The Proposed System
Biometric authentication systems usually have two phases: an enrolment phase and an authentication phase. In the enrolment phase, user data is gathered, processed, and stored in a database; this becomes the reference template for the subsequent authentication phase. In the authentication phase, the user data is acquired and processed, and a matching process checks the authenticity of the user against his/her pre-stored reference templates.
Our fundamental objective is to generate a unique signature for each individual by analyzing his/her typing behavior. The proposed system captures user data on a continuous basis and uses the concept of free text (i.e., the user does not have to type any dedicated text in order to create an individual profile). In brief, the characteristics of the proposed model are:
1. Keystroke-based continuous authentication.
2. Dynamic data collection (from all text editors).
3. A unique signature vector for each user.
The proposed logic has three sub-phases for identifying a user's unique behavior: data collection, preprocessing of the stored data, and signature vector generation.
The proposed system, depicted in Fig. 2, focuses on capturing the unique typing behavior of each individual.
Here is a brief description of each sub-phase:
- Data Acquisition: Raw keystroke data of individuals is collected via various input devices, such as a normal computer keyboard, a customized pressure-sensitive keyboard, or a virtual keyboard [10, 11, 12]. The output of this phase is a text file recording an individual's typing behavior, with key dwell time and key hold time.
- Data Preparation: Pre-processing procedures such as feature selection, dimension reduction, and outlier detection [13] are applied to the collected samples prior to feature extraction to improve the quality of the feature data. A substantial number of data samples is collected for each individual.
- Signature Vector Generator: The output of Phase II (data preparation) is the input to this phase. It is used to generate a unique signature for each individual by applying rules to the identified features, and the signatures are stored in a database for future classification.
3.1 Data Acquisition
For this work, we designed a routine to collect user data (key-typing behavior). The routine collects the events generated by individuals (operators of computer systems) while they use a keyboard. At present, the system works on the MS-Windows platform and does not require any additional libraries. It runs continuously in the background and records the user's keyboard activity. The events are captured on the fly and saved in per-user text files as records [user_id, v_i] in a database. A sample of the collected input data is presented below.
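Input data is collected separately for each user, identified by user_id. Each key event is represented as a 5-tuple; within a session, the ith key-pressed event is represented by the vector v_i as follows:
\[ v_i = \{\mathit{Session\_ID},\ \mathit{key\_name}_i,\ \mathit{hold\_time}_i,\ \mathit{dwell\_time}_i,\ \mathit{sys\_time}_i\} \]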
where key_name_i is the name of the ith key-pressed event in the session identified by Session_ID, following the naming convention of the standard QWERTY keyboard; hold_time_i is the timestamp difference between the press and the release of that key; dwell_time_i is the timestamp difference between the release of the (i−1)th key and the press of the ith key; and sys_time_i is the system time, in hours and minutes, at which the event occurred.
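The definitions above can be illustrated with a minimal sketch that turns raw press/release timestamps into the 5-tuples v_i. It assumes (our assumption, not the paper's stated format) that the raw input for one session is a list of (key_name, press_ms, release_ms, "HH:MM") records already ordered by press time; the names `KeyEvent` and `build_key_events` are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class KeyEvent(NamedTuple):
    """One key event v_i = {Session_ID, key_name_i, hold_time_i, dwell_time_i, sys_time_i}."""
    session_id: str
    key_name: str
    hold_time: int             # release_i - press_i, in ms
    dwell_time: Optional[int]  # press_i - release_(i-1), in ms; None for the first key
    sys_time: str              # "HH:MM" when the key was pressed


# Assumed raw record: (key_name, press_ms, release_ms, "HH:MM"), ordered by press time.
RawKey = Tuple[str, int, int, str]


def build_key_events(session_id: str, raw: List[RawKey]) -> List[KeyEvent]:
    events: List[KeyEvent] = []
    prev_release: Optional[int] = None
    for key_name, press_ms, release_ms, hhmm in raw:
        hold = release_ms - press_ms
        # Negative when the next key is pressed before the previous one is released.
        dwell = None if prev_release is None else press_ms - prev_release
        events.append(KeyEvent(session_id, key_name, hold, dwell, hhmm))
        prev_release = release_ms
    return events


session = build_key_events("S1", [("H", 0, 95, "10:15"), ("I", 180, 260, "10:15")])
print(session[1])
# KeyEvent(session_id='S1', key_name='I', hold_time=80, dwell_time=85, sys_time='10:15')
```

The first key of a session has no predecessor, so its dwell time is left as `None` rather than forced to zero; overlapping keystrokes (rollover) produce negative dwell times, which a later outlier-detection step may need to handle.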
V = {v_1, v_2, …, v_n} is the composite vector of a session, where n is the total number of key presses in that session on a given day. In practice, we restrict the number of samples collected from each user; hence, our database is a collection SV = {V_1, V_2, …, V_m}, where m is the number of samples collected for each uid.
Additionally, we store the total number of BACKSPACE key presses in each session in which the user interacts with his/her machine. The per-session BACKSPACE samples for a single day are described by the vector TB = {TB_1, TB_2, …, TB_l}, where l is the number of sessions on that day and TB_j = {session_j, backspace_count_j}, with backspace_count_j the total number of times the BACKSPACE key is pressed in session_j. We then compute the average number of BACKSPACE key presses on each day and store it in the database with its day_id. The average number of BACKSPACE key presses (AB) on the kth day is calculated as follows:
\[ AB_k = \frac{1}{l} \sum_{j=1}^{l} \mathit{backspace\_count}_j \]
The values AB_k form the vectors A_k = {uid, day_k, AB_k}, each describing the average number of BACKSPACE key presses by user uid on the kth day (Table 1).
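The BACKSPACE statistics above can be sketched as follows, assuming that each session on a given day is available as a list of key names and that the BACKSPACE key is recorded under the label "BACKSPACE" (both are assumptions about the log format). The helper names `backspace_counts` and `daily_backspace_vector` are hypothetical.

```python
from typing import Dict, List, Tuple


def backspace_counts(sessions: Dict[str, List[str]]) -> List[Tuple[str, int]]:
    """Build TB = [(session_j, backspace_count_j), ...] for one day."""
    return [(sid, keys.count("BACKSPACE")) for sid, keys in sessions.items()]


def daily_backspace_vector(uid: str, day_id: str,
                           tb: List[Tuple[str, int]]) -> Tuple[str, str, float]:
    """Return A_k = (uid, day_k, AB_k), with AB_k the mean count over the day's l sessions."""
    n_sessions = len(tb)  # l in the text
    ab = sum(count for _, count in tb) / n_sessions if n_sessions else 0.0
    return uid, day_id, ab


day3 = {"S1": ["A", "BACKSPACE", "B"], "S2": ["BACKSPACE", "BACKSPACE", "C"], "S3": ["D"]}
tb = backspace_counts(day3)                          # [('S1', 1), ('S2', 2), ('S3', 0)]
print(daily_backspace_vector("user1", "day3", tb))   # ('user1', 'day3', 1.0)
```

A day without any recorded session is mapped to an average of 0.0 here; whether such days should instead be skipped is a design choice the text does not settle.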
3.2 Data Preparation
In this phase, we select the distinguishing features used to generate each individual's signature. For this, key_hold_time and key_dwell_time are selected for analysis, and we aim to derive a specific range of both features for every key event.
Our database stores the collected samples in the form of the vector SV = {V_1, V_2, …, V_m}, where m is the total number of sessions for each uid on a particular day. Preprocessing is performed on each V_i = {v_1, v_2, …, v_n}, where n is the number of keys pressed in the ith session.
We sort the key-pressed events in a session and measure the maximum and minimum hold times of each key event k. The range of hold times is stored and updated in subsequent sessions. Finally, for user u_id on day d, we obtain a list of key events with their ranges, stored in the database as vectors K_H = {day_id, key_event_k, max_hold_time_k, min_hold_time_k}, where max_hold_time_k and min_hold_time_k define the range of hold times for the kth key event on day day_id.
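The per-key range computation above can be sketched as follows, assuming each session of one day has been reduced to a list of (key_name, hold_time) pairs. The function name `hold_time_ranges` and the input layout are hypothetical; rows are returned in alphabetical key order, matching the sorting described in Sect. 4.

```python
from typing import Dict, List, Tuple


def hold_time_ranges(day_id: str,
                     sessions: List[List[Tuple[str, int]]]) -> List[Tuple[str, str, int, int]]:
    """Return K_H rows (day_id, key_event_k, max_hold_time_k, min_hold_time_k),
    with keys sorted alphabetically."""
    ranges: Dict[str, Tuple[int, int]] = {}  # key -> (min, max)
    for session in sessions:
        for key, hold in session:
            lo, hi = ranges.get(key, (hold, hold))
            # Widen the stored range if this event extends it.
            ranges[key] = (min(lo, hold), max(hi, hold))
    return [(day_id, key, hi, lo) for key, (lo, hi) in sorted(ranges.items())]


day1 = [[("A", 90), ("B", 110), ("A", 120)], [("A", 80), ("B", 100)]]
print(hold_time_ranges("d1", day1))  # [('d1', 'A', 120, 80), ('d1', 'B', 110, 100)]
```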
For the key_dwell_time feature, we pair adjacent keys (k, k + 1) and store the pair-wise dwell time. In each session, we select the same pairs and list all of their dwell_time values. In this way, a range is obtained for every observed key pair on a day, and it is stored per user as the vector K_D = {day_id, key_pair_j, max_dwell_time_j, min_dwell_time_j} (Tables 2 and 3).
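A corresponding sketch for key pairs follows. It assumes each session is a list of (key_name, dwell_time) pairs in typing order, where the dwell time of key k + 1 is attributed to the pair (k, k + 1) and pairs never span two sessions; these conventions and the name `dwell_time_ranges` are our assumptions for illustration.

```python
from typing import Dict, List, Optional, Tuple

KeyPair = Tuple[str, str]


def dwell_time_ranges(day_id: str,
                      sessions: List[List[Tuple[str, Optional[int]]]]
                      ) -> List[Tuple[str, KeyPair, int, int]]:
    """Return K_D rows (day_id, key_pair_j, max_dwell_time_j, min_dwell_time_j).

    Each session lists (key_name, dwell_time) in typing order; the dwell time of
    key k+1 is attributed to the adjacent pair (k, k+1).
    """
    ranges: Dict[KeyPair, Tuple[int, int]] = {}  # pair -> (min, max)
    for session in sessions:
        for (first, _), (second, dwell) in zip(session, session[1:]):
            if dwell is None:
                continue  # no measurable gap for this pair
            lo, hi = ranges.get((first, second), (dwell, dwell))
            ranges[(first, second)] = (min(lo, dwell), max(hi, dwell))
    return [(day_id, pair, hi, lo) for pair, (lo, hi) in sorted(ranges.items())]


day1 = [[("T", None), ("H", 70), ("E", 60)], [("T", None), ("H", 90)]]
print(dwell_time_ranges("d1", day1))
# [('d1', ('H', 'E'), 60, 60), ('d1', ('T', 'H'), 90, 70)]
```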
3.3 Signature Vector Generator
To generate a template for each individual uid, we construct a unique signature vector. Our feature space has three attributes (features): key_hold_time (kh), key_dwell_time (kd), and Backspace_key_count (bkc). For template creation, we consider the first two features.
After preprocessing, the input data is stored in our repository in the form of the K_H and K_D vectors, from which we generate a signature vector S_V for each user.
Avg_hold_time is derived from max_hold_time_k and min_hold_time_k \( \in \) K_H, \( \forall k \in Key \), where Key comprises all key events produced by the user over the entire sample-collection period. Similarly, Avg_dwell_time is obtained from max_dwell_time_j and min_dwell_time_j \( \in \) K_D, \( \forall kp \in Key\_Pair \).
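The text does not spell out how the averages are derived from the stored maxima and minima. The sketch below adopts one plausible reading, stated here as an assumption: the average for a key (or key pair) is the mean, over all days, of its daily mid-range (max + min) / 2. The names `average_times` and `signature_vector` and the dictionary layout of S_V are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Hashable, List, Tuple

Row = Tuple[str, Hashable, float, float]  # (day_id, key or key pair, max_t, min_t)


def average_times(rows: List[Row]) -> Dict[Hashable, float]:
    """ASSUMPTION: Avg time of an item = mean over all days of its daily
    mid-range (max_t + min_t) / 2."""
    midranges: Dict[Hashable, List[float]] = defaultdict(list)
    for _day_id, item, max_t, min_t in rows:
        midranges[item].append((max_t + min_t) / 2)
    return {item: mean(values) for item, values in midranges.items()}


def signature_vector(uid: str, k_h: List[Row], k_d: List[Row]) -> dict:
    """Build S_V from one user's K_H (hold-time) and K_D (dwell-time) rows."""
    return {"uid": uid,
            "avg_hold_time": average_times(k_h),
            "avg_dwell_time": average_times(k_d)}


k_h = [("d1", "A", 120, 80), ("d2", "A", 110, 70)]
k_d = [("d1", ("T", "H"), 90, 70)]
sv = signature_vector("user1", k_h, k_d)
print(sv["avg_hold_time"], sv["avg_dwell_time"])  # {'A': 95.0} {('T', 'H'): 80.0}
```

Averaging the per-day minima and maxima separately, to keep a range rather than a single value, would be an equally reasonable alternative under the same description.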
4 Experimental Results
We collected data-sets from 10 participants over 10 days. Table 4.1 shows the sample data-set collected for each individual, organized by session within a day. The users were asked to run our application in the background throughout their interaction with their dedicated machines. The samples were collected from different machines with different configurations.
Users were not required to type any dedicated text string, and no additional interface was used for capturing data. All active windows accessed by the users were taken into account when generating the sample data-set.
The collected samples for each user on a particular day were then sorted in alphabetical order of key events. The processed samples for the hold-time and dwell-time features are shown in Tables 4.2 and 4.3, respectively (Tables 5, 6, 7 and 8).
We differentiate user behavior based on the two features discussed so far, i.e., hold time and dwell time. Tables 4.2 and 4.3 present a comparative analysis of two users, USER1 and USER2, based on the key hold-time and key dwell-time features (Figs. 3 and 4).
5 Conclusion
We have observed that prevalent biometrics-based techniques for identifying a legitimate user often suffer from high false acceptance and false rejection rates (FAR and FRR), which degrade their accuracy. Our study shows that most of the developed applications require a dedicated text (mainly passwords of a specific format) to be typed by the user. However, such fixed-text samples fail to capture significant variations in individual typing because of the limited set of characters used. In this paper, we have used the free-text concept to solve this issue. The software for collecting user data is designed to be machine independent, and samples were collected from a varied set of computers. Our proposed signature vectors cover all possible key events, so that the aggregated behavior of the end user is stored in the repository. Our future work will concentrate on the classification and verification of individuals based on these stored templates.
References
Umphress, D., Williams, G.: Identity verification through keyboard characteristics. Int. J. Man-Mach. Stud. 23(3), 263–273 (1985)
Teh, P.S., Teoh, A.B.J., Tee, C., Ong, T.S.: Keystroke dynamics in password authentication enhancement. Expert Syst. Appl. 37, 8618–8627 (2010)
Monrose, F., Rubin, A.D.: Keystroke dynamics as a biometric for authentication. Future Gener. Comput. Syst. 16(4), 351–359 (2000)
Zhong, Y., Deng, Y., Jain, A.K.: Keystroke dynamics for user authentication. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 117–123 (2012)
Revett, K., de Magalhães, S.T., Santos, H.: Data mining a keystroke dynamics based biometrics database using rough sets. University of Technology of Compiegne (2005)
Karnan, M., Akila, M., Krishnaraj, N.: Biometric personal authentication using keystroke dynamics: a review. Appl. Soft Comput. 11(2), 1565–1573 (2011)
Wang, X., Guo, F., Ma, J.: User authentication via keystroke dynamics based on difference subspace and slope correlation degree. Digit. Signal Process. 22(5), 707–712 (2012)
Babaeizadeh, M., Bakhtiari, M., Maarof, M.A.: Authentication method through keystrokes measurement of mobile users in cloud environment. Int. J. Advance Soft Comp. Appl. 6(3) (2014)
Hwang, S.-S., Lee, H.-J., Cho, S.: Improving authentication accuracy using artificial rhythms and cues for keystroke dynamics-based authentication. Expert Syst. Appl. 36(7), 10649–10656 (2009)
Kotani, K., Horii, K.: Evaluation on a keystroke authentication system by keying force incorporated with temporal characteristics of keystroke dynamics. Behav. Inf. Technol. 24(4), 289–302 (2005)
Hwang, S.S., Cho, S., Park, S.: Keystroke dynamics-based authentication for mobile devices. Comput. Secur. 28(1–2), 85–93 (2009)
Nauman, M., Ali, T., Rauf, A.: Using trusted computing for privacy preserving keystroke-based authentication in smartphones. Telecommun. Syst. 52(4), 2149–2161 (2011)
Kaneko, Y., Kinpara, Y., Shiomi, Y.: A hamming distance-like filtering in keystroke dynamics. In: Proceedings of the 9th Annual International Conference on Privacy, Security and Trust (PST 2011), pp. 93–95 (2011)
Lee, W., Choi, S.-S., Moon, B.-R.: An evolutionary keystroke authentication based on ellipsoidal hypothesis space. In: Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation, pp. 2090–2097 (2007)
Hosseinzadeh, D., Krishnan, S.: Gaussian mixture modelling of keystroke patterns for biometric applications. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 38, 816–826 (2011)
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License (http://creativecommons.org/licenses/by-nc/2.5/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2016 IFIP International Federation for Information Processing
About this paper
Cite this paper
Mukherjee, P., Chaki, R. (2016). Towards Generating a Unique Signature for Remote User by Keystrokes Dynamics. In: Saeed, K., Homenda, W. (eds) Computer Information Systems and Industrial Management. CISIM 2016. Lecture Notes in Computer Science(), vol 9842. Springer, Cham. https://doi.org/10.1007/978-3-319-45378-1_57
DOI: https://doi.org/10.1007/978-3-319-45378-1_57
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45377-4
Online ISBN: 978-3-319-45378-1
eBook Packages: Computer Science, Computer Science (R0)