Understanding users’ behavior with software operation data mining
Introduction
Software usage concerns the utilization of a software product by its end-users. Software usage data may be collected while the end-users are using the software in the field (El-Ramly & Stroulia, 2004). Simmons (2006) points out that system requirements can be extracted from usage, which underlines the beneficial role of user experience in product innovation and differentiation. Software usage knowledge includes the awareness of how end-users use the software in the field, and how the software itself responds to their actions (Van der Schuur, Jansen, & Brinkkemper, 2010).
By tracking software usage, we can monitor which applications are most often used, which features are underutilized, and which functionalities could be expanded (Junco, 2013). This information could, for example, be used to highlight changes in the requirements engineering process. We may also gain insights into how users navigate through the user interface in order to perform an operation, with the goal of improving software usability or reengineering processes. Furthermore, by observing the usage behavior of different customer profiles, the software vendor can implement more targeted marketing or customized licensing (Germanakos et al., 2008; Van der Schuur et al., 2010). Improved customer satisfaction, and consequently customer retention and increased sales, are some of the business advantages that could be gained through an automated usage analysis based on real execution data.
Software usage knowledge may be extracted from software operation data, i.e. data that are collected during software operation in the field (Van der Schuur et al., 2010). A considerable amount of research has already addressed the recording of software operation data (Bowring et al., 2002; Nusayr and Cook, 2009). In practice, most vendors tend to handle the acquired data manually, or use general statistics and simple visualization techniques (Kristjansson & Van der Schuur, 2009). However, such analysis cannot reveal the interesting patterns that are hidden in large datasets (Kantardzic, 2002).
On the other hand, the web usage mining field has developed considerably (Cooley, Mobasher, & Srivastava, 1997). Although many lessons can be learned from it, analyzing how visitors use a website differs significantly from analyzing how users work with a software product. The techniques that are used in web usage mining (and other related domains) therefore need to be revised before they can be applied to mining usage from software operation data.
Usage knowledge is highly important for making good software products, and the rise of cloud computing and Software-as-a-Service (SaaS) applications (Park & Ryoo, 2013) creates an opportunity to mine data that are now easily acquired. Even though algorithms for such data analysis exist, they are hardly ever used for analyzing software usage. Following a meta-algorithmic approach, we will try to answer the research question:
How should we inspect software operation data, in order to gain knowledge about how the software is used by the end-users?
This research suggests how data mining techniques can be integrated to analyze software operation data in a uniform and automated way. Hence, it contributes to the domain of software usage analysis as well as to software operation knowledge and its use in software product management, development and maintenance processes (Van der Schuur et al., 2010). From a practical perspective, the method that we suggest for usage mining constitutes a reference process model that software vendors can follow to analyze how their customers use their products.
The remainder of this paper has the following structure: In Section 2 we review the research that has been performed in the area of extracting usage knowledge from system utilization. We briefly present our research design in Section 3. In Section 4 we present the method that has been constructed to extract usage knowledge. In Section 5 we describe the usage knowledge subjects that we suggest to extract, and the variables that should be inspected in software operation data in order to derive conclusions about how software operates in the field. Section 6 describes the data mining techniques that are suggested for mining software usage knowledge. In Section 7 we present the prototype that was constructed as an instantiation of the usage mining method. We evaluate the two artifacts in a case study in Section 8. Finally, in Section 9 we discuss the insights from this research and provide some general conclusions.
Related work
As far as specific research on software usage analysis is concerned, the extraction of in-the-field usage knowledge remains an area that needs considerable enrichment. Data analysis techniques have previously been applied in this field: for software reengineering purposes (El-Ramly et al., 2009; Leffingwell and Widrig, 2003), for program comprehension (Zaidman, Calders, Demeyer, & Paredaens, 2005), for re-documentation of use cases (Smit, Stroulia, & Wong, 2008), or for user interface learning agents (
Research design
The users’ shift to cloud computing applications (Park & Ryoo, 2013) creates the opportunity for software vendors to automatically collect vast amounts of usage data. Although several algorithms have been developed to analyze the behavior of website visitors, they are hardly ever used in the software products domain. This research aims to follow a meta-algorithmic approach, by incorporating the state-of-the-art data mining techniques in a method. Our goal is to show how the appropriate
Usage Mining Method
In this section we present the first design artifact that we constructed in this research. The Usage Mining Method suggests an ordered set of activities that should be followed to extract relevant usage knowledge from software operation data.
In order to provide guidance in analyzing software product users’ usage behavior, we propose the Usage Mining Method (Fig. 2). The method has been constructed with the Method Engineering approach, provided by van de Weerd and Brinkkemper (2008). The method
Software usage knowledge
In this section we suggest what types of knowledge should be extracted from software operation data to gain insights about how the end-users are using a software product. Subsequently, we present the fundamental variables that should be inspected during software operation, in order to gather the data that are necessary to analyze usage.
Based on the findings of our literature research in the domains of usage analysis in software systems (El-Ramly et al., 2009; Simmons, 2006) and web usage
Usage mining tasks and data mining techniques
In this section, we suggest which data mining techniques could be applied to software operation data in order to analyze software usage. More specifically, in order to produce the various usage knowledge types presented in the previous section, we suggest the following usage analysis tasks:
1. Classification Analysis, to understand the factors which influence the decisions that customers take, in the context of the software product utilization.
2. Users Profiling, i.e.
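As a rough illustration of the Classification Analysis task, the following Python sketch picks the usage attribute that best separates customers by a decision of interest, using weighted Gini impurity as in decision tree induction. It is not taken from the paper: the attribute names (`plan`, `used_reports`), the target `renewed`, and the records are hypothetical, and the multi-way categorical split is a simplification of the binary splits used by CART-style trees.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, target):
    """Return (attribute, weighted_gini) for the categorical attribute whose
    partition of the rows yields the lowest weighted Gini impurity."""
    best = None
    attributes = [key for key in rows[0] if key != target]
    for attr in attributes:
        # Group the target labels by the value each row has for this attribute.
        groups = {}
        for row in rows:
            groups.setdefault(row[attr], []).append(row[target])
        weighted = sum(len(g) / len(rows) * gini(g) for g in groups.values())
        if best is None or weighted < best[1]:
            best = (attr, weighted)
    return best


# Hypothetical customer records derived from operation data (assumption).
customers = [
    {"plan": "trial", "used_reports": "no", "renewed": "no"},
    {"plan": "trial", "used_reports": "yes", "renewed": "yes"},
    {"plan": "paid", "used_reports": "yes", "renewed": "yes"},
    {"plan": "paid", "used_reports": "no", "renewed": "no"},
    {"plan": "paid", "used_reports": "yes", "renewed": "yes"},
    {"plan": "trial", "used_reports": "no", "renewed": "no"},
]

attribute, impurity = best_split(customers, "renewed")
print(attribute, impurity)
```

In this toy data the `used_reports` attribute separates the two outcomes perfectly, so it would become the root of a decision tree. A full analysis would recurse on each partition and guard against attributes with many distinct values, which a plain multi-way Gini split tends to favor.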
A prototype for usage mining
The Usage Mining Method presented in Section 4 is instantiated in a prototype, which we developed in R (R Development Core Team, 2008) and which implements the method’s activities. The prototype can be used to analyze the software usage of SaaS products with embedded logging procedures that record the operation data. The prototype has the format of an R script, which successively performs the activities of Data Preparation, Exploratory Analysis, Classification Analysis, Users Profiling and
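To give a feel for what the first two activities might involve, here is a minimal Python sketch of a data preparation step (grouping logged events into sessions) followed by a simple exploratory summary per user. It is an illustration only and not the authors' R script: the `(user, timestamp, action)` log layout, the 30-minute inactivity timeout, and all names in the sample log are assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed inactivity threshold that closes a session (not from the paper).
SESSION_TIMEOUT = timedelta(minutes=30)


def sessionize(events, timeout=SESSION_TIMEOUT):
    """Group (user, timestamp, action) events into per-user sessions.

    A new session starts whenever the gap between two consecutive events
    of the same user exceeds the timeout.
    """
    by_user = defaultdict(list)
    for user, ts, action in events:
        by_user[user].append((ts, action))

    sessions = []
    for user, items in by_user.items():
        items.sort(key=lambda item: item[0])
        current = [items[0]]
        for prev, item in zip(items, items[1:]):
            if item[0] - prev[0] > timeout:
                sessions.append((user, current))
                current = []
            current.append(item)
        sessions.append((user, current))
    return sessions


def summarize(sessions):
    """Count sessions and logged actions per user."""
    per_user = defaultdict(lambda: {"sessions": 0, "actions": 0})
    for user, items in sessions:
        per_user[user]["sessions"] += 1
        per_user[user]["actions"] += len(items)
    return dict(per_user)


def t(text):
    """Parse a 'YYYY-MM-DD HH:MM' timestamp."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M")


# Hypothetical operation log (assumption).
log = [
    ("alice", t("2024-01-01 09:00"), "login"),
    ("alice", t("2024-01-01 09:05"), "open_report"),
    ("bob", t("2024-01-01 09:10"), "login"),
    ("alice", t("2024-01-01 13:00"), "login"),
    ("alice", t("2024-01-01 13:02"), "export"),
    ("alice", t("2024-01-01 13:10"), "logout"),
]

sessions = sessionize(log)
summary = summarize(sessions)
print(summary)
```

The per-session action lists produced by `sessionize` could also serve as input to the later activities, since they preserve the order in which each user performed actions.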
Case study
Following the Design Science Research approach, we have presented the two design artifacts that we constructed in this research: the Usage Mining Method and the prototype developed in R. In order to evaluate the two artifacts, we performed a case study at a Dutch software company, in which we implemented the Usage Mining Method and ran the prototype in the context of a real software product. This section comprises the design of the case study, as well as the execution and interpretation of the results.
Discussion
In this paper we have investigated how we can inspect software operation data, in order to gain knowledge about how the software is used by the end-users. We reviewed related literature on software usage analysis. We constructed and presented a method that could be used to analyze how the end-users are using a software product. We explicated this knowledge by distinguishing four different categories (statistical summaries of sessions and users’ behavior, factors that influence the customers’
References (57)
- et al. (2008). Capturing essential intrinsic user behaviour values for the design of comprehensive web-based personalized environments. Computers in Human Behavior.
- (2013). Comparing actual and self-reported measures of Facebook use. Computers in Human Behavior.
- et al. (2011). Applying social bookmarking to collective information searching (CIS): An analysis of behavioral pattern and peer interaction for co-exploring quality online resources. Computers in Human Behavior.
- (2007). Lessons learned from i-mode: What makes consumers click wireless banner ads? Computers in Human Behavior.
- et al. (2013). An empirical investigation of end-users’ switching toward cloud computing: A two factor theory perspective. Computers in Human Behavior.
- et al. (2010). What are participants doing while filling in an online questionnaire: A paradata collection tool and an empirical study. Computers in Human Behavior.
- et al. (2004). Process mining: A research agenda. Computers in Industry.
- et al. (2002). Monitoring deployed software using software tomography. SIGSOFT Software Engineering Notes.
- et al. (1984). Classification and regression trees.
- et al. (2008). clValid: An R package for cluster validation. Journal of Statistical Software.
- Web mining: Information and pattern discovery on the world wide web.
- Data preparation for mining world wide web browsing patterns. Knowledge and Information Systems.
- Probabilistic networks and expert systems: Exact computational methods for Bayesian networks.
- Discovering web service workflows using web services interaction mining. International Journal of Business Process Integration and Management.
- Legacy systems interaction reengineering.
- Cluster analysis.
- On the handling of continuous-valued attributes in decision tree generation. Machine Learning.
- Discovering statistics using SPSS.
- Applied data mining: Statistical methods for business and industry.
- Introduction to probability.
- Data mining: Concepts and techniques.
- Hierarchical clustering.
- Neural networks: A comprehensive foundation.
- Design science in information systems research. MIS Quarterly.
- Data clustering: A review. ACM Computing Surveys.
- An analysis of usage of a digital library.