Abstract
Web reliability is becoming increasingly important owing to the exponential growth in popularity of social community networks, mailing systems and other online applications. To enhance the reliability of an existing web system, web administrators need to know which web errors are present in the system, how the various workload characteristics influence the manifestation of those errors, and how the workload characteristics relate to one another. In practice, however, it is often not possible to establish a general correspondence among the workload characteristics. Moreover, the prediction and estimation of the cumulative occurrences of source content failures, and of the corresponding time between failures of a web system, have received comparatively little attention from the reliability research community. In this work, we therefore present a well-defined procedure (a forecasting framework) that web administrators can use to analyze and enhance the reliability of the websites under their supervision. The framework first takes the HTTP access and error logs and extracts the necessary information on the workloads, the web errors and the corresponding time between failures. Next, principal component analysis, correlation analysis and change point analysis are performed to select the number of independent variables. Various time series based forecasting models are then developed to forecast the cumulative occurrences of source content failures and the corresponding time between failures. The multivariate models also include various uncorrelated workloads and the exogenous and endogenous noises when forecasting the web errors and the corresponding time between failures. The proposed methodology has been validated with usage statistics collected from the websites of two highly renowned Indian academic institutions.
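The first step of the framework, extracting failure data from server logs, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the access log follows the Common Log Format and that source content failures appear as HTTP 404 responses; both are assumptions made here for illustration, as are the names `parse_line` and `failure_series` and the sample log lines.

```python
import re
from datetime import datetime

# Assumed input format (Common Log Format):
# host ident user [time] "request" status bytes
CLF = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)

def parse_line(line):
    """Return (timestamp, status) for one log line, or None if it is malformed."""
    m = CLF.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return ts, int(m.group("status"))

def failure_series(lines, failure_statuses=(404,)):
    """Build the two series named in the abstract from raw log lines.

    Returns the sorted failure timestamps, the cumulative failure count
    at each failure, and the time between consecutive failures in seconds.
    Treating 404 as the failure status is an assumption of this sketch.
    """
    parsed = (parse_line(line) for line in lines)
    times = sorted(
        rec[0] for rec in parsed
        if rec is not None and rec[1] in failure_statuses
    )
    cumulative = list(range(1, len(times) + 1))
    tbf = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return times, cumulative, tbf

# Hypothetical log excerpt, made up for this example.
sample = [
    '10.0.0.1 - - [01/Jan/2017:10:00:00 +0530] "GET /index.html HTTP/1.1" 200 1043',
    '10.0.0.2 - - [01/Jan/2017:10:00:30 +0530] "GET /old.pdf HTTP/1.1" 404 209',
    '10.0.0.1 - - [01/Jan/2017:10:02:00 +0530] "GET /img/a.png HTTP/1.1" 404 209',
    'garbage line',
    '10.0.0.3 - - [01/Jan/2017:10:07:00 +0530] "GET /x HTTP/1.1" 404 209',
]
times, cumulative, tbf = failure_series(sample)
print(cumulative)  # [1, 2, 3]
print(tbf)         # [90.0, 300.0]
```

The two lists produced here correspond to the quantities the forecasting models target: the cumulative occurrences of failures and the time between failures. Workload measures such as hits or bytes transferred per interval could be aggregated from the same parsed records in a similar way.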
Acknowledgements
The authors thank the National University of Singapore, the Singapore University of Technology and Design (in collaboration with MIT, USA) and Rutgers, The State University of New Jersey, for providing an excellent environment in which to complete this work. The constructive comments of the associate editor and the three anonymous reviewers are also gratefully acknowledged. The authors express their heartfelt gratitude to Prof. Subhashis Chatterjee and Mr. Rajesh Mishra (Indian Institute of Technology Dhanbad) and to Prof. Amitava Dutta, Mr. Subhashis Kumar Pal and Mr. Ashish Biswas (Indian Statistical Institute) for providing the necessary data and valuable ideas.
Additional information
Communicated by: Hélène Waeselynck
About this article
Cite this article
Roy, A., Pham, H. Toward the development of a conventional time series based web error forecasting framework. Empir Software Eng 23, 570–644 (2018). https://doi.org/10.1007/s10664-017-9530-4
DOI: https://doi.org/10.1007/s10664-017-9530-4