Toward the development of a conventional time series based web error forecasting framework

Empirical Software Engineering

Abstract

Web reliability is becoming increasingly important because of the exponential growth in the popularity of social networking sites, mailing systems and other online applications. To enhance the reliability of an existing web system, web administrators therefore need to know which web errors are present in the system, how various workload characteristics influence the manifestation of those errors, and how the workload characteristics relate to one another. In practice, however, it is often not possible to establish a generalized relationship among the workload characteristics. Moreover, the prediction and estimation of the cumulative occurrences of source content failures (SCFs) and of the corresponding time between failures (TBFs) of a web system have received little attention from the reliability research community. In this work, the authors therefore present a well-defined procedure (a forecasting framework) that web administrators can use to analyze and enhance the reliability of the web sites under their supervision. The framework first takes the HTTP access and error logs and extracts the information related to workloads, web errors and the corresponding time between failures. Next, principal component analysis, correlation analysis and change point analysis are performed to determine the number of independent variables. Various time series based forecasting models are then developed to forecast the cumulative occurrences of source content failures and the corresponding time between failures. The multivariate models also incorporate various uncorrelated workloads together with exogenous and endogenous noise terms. The proposed methodology has been validated with usage statistics collected from the web sites of two highly renowned Indian academic institutions.
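The first stage of the pipeline described above (turning an error log into an SCF failure series) and a simple univariate forecast on the resulting TBFs can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes Apache 2.2-style error-log timestamps, treats every "File does not exist" entry as a source content failure via the hypothetical helper `is_scf`, and uses a least-squares AR(1) model as a stand-in for the richer time series models developed in the paper. All function names are illustrative.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed Apache 2.2-style error-log prefix, e.g. "[Wed Oct 11 14:32:52 2017]".
TS_RE = re.compile(r"^\[(\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4})\]")


def is_scf(line):
    """Hypothetical SCF test: count 'File does not exist' entries as source
    content failures. The paper's own classification is not reproduced here."""
    return "File does not exist" in line


def failure_times(lines):
    """Sorted timestamps of every log entry classified as an SCF."""
    times = []
    for line in lines:
        m = TS_RE.match(line)
        if m and is_scf(line):
            times.append(datetime.strptime(m.group(1), "%a %b %d %H:%M:%S %Y"))
    return sorted(times)


def times_between_failures(times):
    """TBF series in seconds: gaps between consecutive failures."""
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


def cumulative_daily_counts(times):
    """(date, cumulative number of SCFs up to and including that date)."""
    per_day = Counter(t.date() for t in times)
    total, out = 0, []
    for day in sorted(per_day):
        total += per_day[day]
        out.append((day, total))
    return out


def fit_ar1(x):
    """Least-squares estimates (c, phi) of x_t = c + phi * x_{t-1} + e_t."""
    if len(x) < 3:
        raise ValueError("need at least three observations")
    prev, nxt = x[:-1], x[1:]
    mp, mn = sum(prev) / len(prev), sum(nxt) / len(nxt)
    cov = sum((p - mp) * (n - mn) for p, n in zip(prev, nxt))
    var = sum((p - mp) ** 2 for p in prev)
    if var == 0:
        raise ValueError("lagged series is constant; AR(1) slope is undefined")
    phi = cov / var
    return mn - phi * mp, phi


def forecast_ar1(x, steps=1):
    """Iterated AR(1) forecasts starting from the last observation."""
    c, phi = fit_ar1(x)
    out, last = [], x[-1]
    for _ in range(steps):
        last = c + phi * last
        out.append(last)
    return out


if __name__ == "__main__":
    log = [
        "[Wed Oct 11 14:00:00 2017] [error] [client 10.0.0.1] File does not exist: /var/www/a.html",
        "[Wed Oct 11 14:00:30 2017] [error] [client 10.0.0.2] File does not exist: /var/www/b.html",
        "[Wed Oct 11 14:01:00 2017] [notice] caught SIGTERM, shutting down",
        "[Wed Oct 11 14:01:40 2017] [error] [client 10.0.0.3] File does not exist: /var/www/c.html",
        "[Wed Oct 11 14:03:00 2017] [error] [client 10.0.0.4] File does not exist: /var/www/d.html",
    ]
    tbf = times_between_failures(failure_times(log))
    print(tbf)                # [30.0, 70.0, 80.0]
    print(forecast_ar1(tbf))  # [82.5]
```

With only three TBFs the AR(1) estimates are purely illustrative; a real TBF series would be much longer, and the multivariate models in the paper add uncorrelated workload inputs on top of such a univariate baseline.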

Notes

  1. Arlitt and Williamson 1997, Kallepalli and Tian 2001, Offutt 2002, Offutt et al. 2014, Tian et al. 2004, Popstojanova et al. 2006, Catledge and Pitkow 1995, Ma and Tian 2007, Martínez et al. 2014, Keivanloo and Rilling 2014, Espinha et al. 2015, Chatterjee and Roy 2014

  2. Schneidewind 2012, Musa et al. 1987, Tian 2002, Walls and Bendell 1987, Lyu 1996, Xie 1991, Singpurwalla and Soyer 1985, Singpurwalla 1980, Chatterjee et al. 1997a, b, Pham 2006, Chatterjee et al. 2011, Chatterjee and Roy 2014.

  3. http://royal.pingdom.com/2009/05/06/the-5-most-common-http-errors-according-to-google/

  4. https://blog.hubspot.com/blog/tabid/6307/bid/33766/10-clever-website-error-messages-from-creative-companies.aspx#sm.0001i169ps1ulcypxuu1ubyl5bej2

  5. Box and Jenkins 1976, Shumway and Stoffer 2008, Jolliffee 1986, Lutkepohl 2005, Park 2013, Kini and Chandra Sekhar 2013, Zou and Yang 2004, Pena and Sanchez 2007.

  6. Box and Jenkins 1976, Shumway and Stoffer 2008, Jolliffee 1986, Lutkepohl 2005, Anselmo and Ubertini 1979, Lai 1979, Jo 2013, Chatterjee and Roy 2014.

  7. Tanenbaum 2011, Sosinsky 2009, Csermely 2009.

  8. Arlitt and Williamson 1997, Kallepalli and Tian 2001, Offutt 2002, Offutt et al. 2014, Tian et al. 2004, Popstojanova et al. 2006, Catledge and Pitkow 1995, Ma and Tian 2007.

References

  • Akaike H (1974) A new look at the statistical model identification. IEEE Trans Automat Control 19:716–723

  • Amin A, Grunske L, Colman A (2013) An approach to software reliability prediction based on time series modeling. Journal of Systems and Software 86(7):1923–1932

  • Anselmo V, Ubertini L (1979) Transfer function-noise model applied to flow forecasting. Hydrol Sci 24:353–359

  • Arlitt MF, Williamson CL (1997) Internet Web Servers: Workload Characterization and Performance Implications. IEEE/ACM Trans. Networking 5:631–645

  • Armstrong JS (1985) Long-Range Forecasting. Wiley, New York

  • Armstrong S, Collopy F (1992) Error Measures for Generalizing About Forecasting Methods: Empirical Comparisons. International Journal of Forecasting 8:69–80

  • Bai C, Hu Q, Xie M, Ng S (2005) Software failure prediction based on a Markov Bayesian network model. Journal of Systems and Software 74(3):275–282

  • Barghout M, Littlewood B, Abdel-Ghaly A (1998) A non-parametric order statistics software reliability model. Software Testing, Verification and Reliability 8(3):113–132

  • Bishop C (1991) Improving the generalization properties of radial basis function neural networks. Neural computation 3(4):579–588

  • Bontempi G, Taieb SB, Le Borgne YA (2013) Machine learning strategies for time series forecasting. Business Intelligence, Springer Berlin, Heidelberg, pp 62–77

  • Box GEP, Jenkins GM (1976) Time series analysis, forecasting, and control. Holden-Day, San Francisco

  • Breiman L (2001) Statistical modeling: the two cultures. Statistical Science 16:199–231

  • Broomhead DS, Lowe D (1988) Radial basis functions, multi-variable functional interpolation and adaptive networks (No. RSRE-MEMO-4148). Royal Signals and Radar Establishment Malvern, United Kingdom

  • Catledge LD, Pitkow JE (1995) Characterizing browsing strategies in the World-Wide Web. Computer Networks and ISDN Systems 27:1065–1073

  • Chatfield C (1988) Apples, oranges and mean square error. International Journal of Forecasting 4:515–518

  • Chatterjee S, Roy A (2014) Transfer Function Modeling in Web Software Fault Prediction Implementing Pre-Whitening Technique. International Journal of Reliability Quality and Safety Engineering 21:5

  • Chatterjee S, Roy A (2015) Novel algorithms for web software fault prediction. Quality and Reliability Engineering International 31(8):1517–1535

  • Chatterjee S, Misra RB, Alam SS (1997a) Joint effect of test effort and learning factor on software reliability and optimal release policy. International Journal of Systems Science 28:391–396

  • Chatterjee S, Misra RB, Alam SS (1997b) Prediction of software reliability using an auto regressive process. International Journal of Systems Science 28:205–211

  • Chatterjee S, Nigam S, Singh JB, Upadhayaya LN (2011a) Transfer function modeling in software reliability. Computing 92:33–48

  • Chatterjee S, Singh JB, Nigam S, Upadhayaya LN (2011b) Best subset selection of ARMA and ARIMA models for software reliability estimation. International journal of Modeling and Simulation 31(2):120–125

  • Chatterjee S, Singh JB, Roy A (2015) A structure-based software reliability allocation using fuzzy analytic hierarchy process. Int J Syst Sci 46(3):513–525

  • Chatterjee S, Nigam S, Roy A (2016) Software fault prediction using neuro-fuzzy network and evolutionary learning approach. Neural Computing and Applications. doi:10.1007/s00521-016-2437-y

  • Chen SM (1996) Forecasting enrollments based on fuzzy time series. Fuzzy Sets Syst 81(3):311–319

  • Chen SM, Chung NY (2006) Forecasting enrolments using high order fuzzy time series and genetic algorithms. Int J Intell Syst 21:485–501

  • Chen SM, Tanuwijaya K (2011) Multivariate fuzzy forecasting based on fuzzy time series and automatic clustering techniques. Expert Syst with Appl 38:10594–10605

  • Cheng B, Titterington DM (1994) Neural networks: a review from a statistical perspective. Statistical Science 9:2–54

  • Csermely P (2009) Weak links. Springer

  • Davari S, Zarandi MHF, Turksen IB (2009) An Improved fuzzy time series forecasting model based on particle swarm intervalization. The 28th North American Fuzzy Information Processing Society Annual Conferences (NAFIPS 2009), Cincinnati, June 14-17

  • Eğrioglu E, Aladag CH, Yolcu U, Uslu VR, Basaran MA (2010) Finding an optimal interval length in high order fuzzy time series. Expert Systems with Applications 37:5052–5055

  • Eğrioglu E, Aladag CH, Başaran MA, Uslu VR, Yolcu U (2011) A New Approach Based on the Optimization of the Length of Intervals in Fuzzy Time Series. Journal of Intelligent and Fuzzy Systems 22:15–19

  • Espinha T, Zaidman A, Gross HG (2015) Web API growing pains: Loosely coupled yet strongly tied. Journal of Systems and Software 100:27–43

  • Eubank RL (1988) Spline Smoothing and Nonparametric Regression. Statistics: Textbooks and Monographs, vol 90. Marcel Dekker

  • Falát L (2016) Time Series Forecasting with Hybrid Neural Networks and Advanced Statistical Methods. Information Sciences and Technologies 8(1):33

  • Fenton N, Neil M, Marquez D (2008) Using Bayesian networks to predict software defects and reliability. Proceedings of the Institution of Mechanical Engineers, Part O: Journal of Risk and Reliability 222(4):701–712

  • Goel A, Okumoto K (1979) A time-dependent error-detection rate model for software reliability and other performance measures. IEEE Transactions on Reliability 28(3):206–211

  • Hand DJ (2000) Data mining, new challenges for statisticians. Social Science Computer Review 18(4):442–449

  • Haykin S (1999) Neural Networks: A Comprehensive Foundation, 2nd edn. Prentice Hall, Upper Saddle River. ISBN 0-13-908385-5

  • Hsu LY, Horng SJ, Kao TW, Chen YH, Run RS, Chen RJ, Lai JL, Kuo IH (2010) Temperature prediction and TAIFEX forecasting based on fuzzy relationships and MTPSO techniques. Expert Systems with Applications 37:2756–2770

  • Huang YL, Horng SJ, Kao TW, Run RS, Lai JL, Chen RJ, Kuo IH (2011) An improved forecasting model based on the weighted fuzzy relationship matrix combined with a pso adaptation for enrollments. Journal of Innovative Computing, Information and Control 7:4027–4045

  • Huarng K (2001) Effective length of intervals to improve forecasting in fuzzy time series. Fuzzy Sets and Systems 123:387–394

  • Huynh T, Miller J (2009) Another viewpoint on evaluating web software reliability based on workload and failure data extracted from server logs. Empirical Software Engineering 14:371–396

  • Jo T (2013) VTG schemes for using back propagation for multivariate time series prediction. Applied Soft Computing 13:2692–2720

  • Jolliffee IT (1986) Principal component analysis. Springer, New York

  • Junhong G, Hongwei L, Xiaozong Y (2005) An autoregressive time series software reliability growth model with independent increment. Proceedings of the International Conference on Mathematical Methods and Computational Techniques In Electrical Engineering, WSEAS, pp 362–366

  • Kallepalli C, Tian J (2001) Measuring and Modeling Usage and Reliability for Statistical Web Testing. IEEE Trans. On Software Engineering 27:1023–1036

  • Karlaftis MG, Vlahogianni EI (2011) Statistical methods versus neural networks in transportation research: Differences, similarities and some insights. Transportation Research Part C: Emerging Technologies 19(3):387–399

  • Keivanloo I, Rilling J (2014) Software trustworthiness 2.0— A semantic web enabled global source code analysis approach. Journal of Systems and Software 89:33–50

  • Kini BV, Chandra Sekhar C (2013) Large margin mixture of AR models for time series classification. Applied Soft Computing 13:361–371

  • Kuo IH, Horng SJ, Kao TW, Lin TL, Lee CL, Pan Y (2009) An improved method for forecasting enrollments based on fuzzy time series and particle swarm optimization. Expert Systems with Applications 36:6108–6117

  • Kuo IH, Horng SJ, Chen YH, Run RS, Kao TW, Chen RJ, Lai JL, Lin TL (2010) Forecasting TAIFEX based on fuzzy time series and particle swarm optimization. Expert Systems with Applications 37:1494–1502

  • Lai PW (1979) Transfer function modeling relationship between time series variables. Mid Anglia Litho

  • Lapedes A, Farber R (1987) Nonlinear signal processing using neural networks: prediction and system modelling. Technical Report LA-UR-87-2662, Los Alamos National Laboratory, Los Alamos

  • Lee LW, Wang LH, Chen SM (2007) Temperature prediction and TAIFEX forecasting based on fuzzy logical relationships and genetic algorithms. Expert Systems with Applications 33:539–550

  • Lee LW, Wang LH, Chen SM (2008) Temperature prediction and TAIFEX forecasting based on high-order fuzzy logical relationships and genetic simulated annealing techniques. Expert Systems with Applications 34:328–336

  • Lund R, Wang XL, Lu QQ, Reeves J, Gallagher C, Feng Y (2007) Change point Detection in Periodic and Autocorrelated Time Series. J. Climate 20:5178–5190

  • Lutkepohl H (2005) New introduction to multiple time series analysis. Springer, Berlin

  • Lyu MR (1996) Handbook of Software Reliability Engineering. IEEE Computer Society Press, McGraw Hill, New York

  • Lyu M, Nikora A (1992) Applying reliability models more effectively. IEEE Software 9(4):43–52

  • Ma L, Tian J (2007) Web Error Classification and Analysis for Reliability Improvement. J Syst Softw 80(6):795–804

  • Martínez Y, Cachero C, Meliá S (2014) Empirical study on the maintainability of Web applications: Model-driven Engineering vs Code-centric. Empirical Software Engineering 19:1887–1920. doi:10.1007/s10664-013-9269-5

  • Maurer C, Peterka JR (2005) A new interpretation of spontaneous sway measures based on a simple model of human postural control. Journal of Neurophysiology 93(1):189–200

  • Moura M, Zio E, Didier Lins I, Droguett E (2011) Failure and reliability prediction by support vector machines regression of time series data. Reliability Engineering and System Safety 96(11):1527–1534

  • Musa JD, Iannino A, Okumoto K (1987) Software Reliability Measurement, Prediction, Application, Int. Ed. McGraw-Hill

  • Offutt J (2002) Quality Attributes of Web Software Applications. IEEE Software 19:25–32

  • Offutt J, Papadimitriou V, Praphamontripong U (2014) A case study on bypass testing of web applications. Empirical Software Engineering 19:69–104. doi:10.1007/s10664-012-9216-x

  • Park JH (2013) Multiple-index approach to multiple autoregressive time series model. Statistics and Computing 23:201–208

  • Park JI, Lee DJ, Song CK, Chun MG (2010) TAIFEX and KOSPI 200 forecasting based on two factors high order fuzzy time series and particle swarm optimization. Expert Systems with Applications 37:959–967

  • Pearson J, Pearson A, Green D (2007) Determining the Importance of Key Criteria in Web Usability. Management Research News 30(11):816–828

  • Pena D, Sanchez I (2007) Measuring the Advantages of Multivariate vs. Univariate Forecasts. Journal of Time Series Analysis 28:6

  • Pham H (1995) Software Reliability and testing. Wiley-IEEE Computer Society Press, ISBN:978-0-8186-6852-4

  • Pham H (2006) System Software Reliability. Springer-Verlag, London

  • Popstojanova KS, Singh AD, Mazimdar S, Li F (2006) Empirical Characterization of Session-Based Workload and Reliability for Web Servers. Empir Software Eng 11:71–117

  • Qu L, Chen Y, Liu Z (2006, June) Time series forecasting model with error correction by structure adaptive RBF neural network. In Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on, IEEE, vol. 2, pp 6831–6835

  • Robinson D, Dietrich D (1987) A new nonparametric growth model. IEEE Transactions on Reliability 36(4):411–418

  • Roy A (2016) A novel multivariate fuzzy time series based forecasting algorithm incorporating the effect of clustering on prediction. Soft Computing 20(5):1991–2019

  • Ruggieri E (2013) A Bayesian approach to detecting change points in climatic records. International Journal of Climatology 33(2):520–528

  • Schneidewind NF (2012) Computer, Network, Software, and Hardware Engineering with Application. Wiley

  • Shao J (1997) An asymptotic theory for linear model selection (with discussion). Statistica Sinica 7:221–242

  • Sharma K, Garg R, Nagpal C, Garg R (2010) Selection of optimal software reliability growth models using a distance based approach. IEEE Transactions on Reliability 59(2):266–276

  • Shumway RH, Stoffer DS (2008) Time series analysis and its applications. Springer

  • Singpurwalla ND (1980) Analyzing availability using transfer function models and cross spectral analysis. Naval Research Logist Quart 27:1–16

  • Singpurwalla ND, Soyer R (1985) Assessing (Software) Reliability Growth Using a Random Coefficient Autoregressive Process and Its Ramifications. IEEE Trans Softw Eng 11(12):1456–1464

  • Song Q, Chissom BS (1993) Forecasting enrollments with fuzzy time series—part I. Fuzzy Sets Syst 54(1):1–9. doi:10.1016/0165-0114(93)90355-L

  • Song Q, Chissom BS (1994) Forecasting enrollments with fuzzy time series—part II. Fuzzy Sets Syst 62:1–8. doi:10.1016/0165-0114(94)90067-1

  • Sosinsky B (2009) Networking Bible. Wiley

  • Suparta W, Alhasa KM (2013) A comparison of ANFIS and MLP models for the prediction of precipitable water vapor. 2013 IEEE International Conference on Space Science and Communication (IconSpace), pp 243–248

  • Tanenbaum AS (2011) Computer Networks. Pearson, India

  • Tarafdar M, Zhang J (2005) Analyzing the Influence of Website Design Parameters on Website Usability. Information Resources Management Journal 18(4):62–80

  • Tian J (2002) Better Reliability Assessment and Prediction through Data Clustering. IEEE Trans. On Software Engineering 28:997–1007

  • Tian J, Rudraraju S, Li Z (2004) Evaluating Web Software Reliability Based on Workload and Failure Data Extracted from Server Logs. IEEE Trans Softw Eng 30(11):754–769

  • Tu JV (1996) Advantages and disadvantages of using artificial neural networks versus logistic regression for predicting medical outcomes. Journal of clinical epidemiology 49(11):1225–1231

  • Walls LA, Bendell A (1987) Time series methods in reliability. Reliab. Eng. & Syst. Safety 18:239–265

  • Werbos PJ (1974) Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. PhD thesis, Harvard University, Cambridge

  • Willmott CJ, Matsuura K (2005) Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Clim Res 30(1):79–82

  • Wiper M, Palacios A, Marín J (2012) Bayesian software reliability prediction using software metrics information. Quality Technology and Quantitative Management 9(1):35–44

  • Xie M (1991) Software Reliability Modeling. World Scientific Press, London

  • Xie M, Ho S (1999) Analysis of repairable system failure data using time series models. Journal of Quality in Maintenance Engineering 5(1):50–61

  • Xie M, Hong G, Wohlin C (1997) A study of the exponential smoothing technique in software reliability growth prediction. Quality and Reliability Engineering International 13(6):347–353

  • Yamada S (2014) Software Reliability Modeling Fundamentals and Applications. Springer, ISBN: 978-4-431-54564-4

  • Yang Y (2005) Can the Strengths of AIC and BIC Be Shared? Biometrika 92(4):937–950

  • Yang B, Li X, Xie M, Tan F (2010) A generic data-driven software reliability model with model mining technique. Reliability Engineering and System Safety 95(6):671–678

  • Zaidi S, Danial S, Usmani B (2008) Modeling inter-failure time series using neural networks. IEEE International Multitopic Conference, pp 409–411

  • Zou H, Yang Y (2004) Combining time series models for forecasting. Int J Forecast 20(1):69–84

Acknowledgements

The authors are thankful to the National University of Singapore, the Singapore University of Technology and Design (in collaboration with MIT, USA) and Rutgers, The State University of New Jersey, for providing an excellent environment for completing this work. The constructive comments of the associate editor and the three anonymous reviewers are also gratefully acknowledged. The authors express their heartfelt gratitude to Prof. Subhashis Chatterjee, Mr. Rajesh Mishra (Indian Institute of Technology Dhanbad), Prof. Amitava Dutta, Mr. Subhashis Kumar Pal and Mr. Ashish Biswas (Indian Statistical Institute) for providing the necessary data and valuable ideas.

Author information

Correspondence to Arunava Roy.

Additional information

Communicated by: Hélène Waeselynck

Appendix

Table 29 Workloads and occurrences of various SCF web errors for www.isical.ac.in (ISI_SCF errors)
Table 30 Workloads and occurrences of various SCF web errors for www.ismdhanbad.ac.in (ISM_SCF errors)
Table 31 Original and forecasted TBFs for the ISI and ISM data sets using the proposed forecasting framework. The TF model gives better forecasting accuracy for ISI (multiple PCs) and the univariate models give better accuracy for ISM (single PC), in conformity with the proposed forecasting framework
Table 32 The predictive accuracy of the TF model is better than that of the FTS models for the ISI SCF error data set (Table 29), mainly because the data set contains multiple PCs. For the ISM SCF error data set, the univariate models show better predictive accuracy, in conformity with the proposed forecasting framework. The corresponding table for the ISM SCF error data set (Table 30) is omitted for lack of space; the authors will supply it on request for research purposes
Table 33 The neuro-fuzzy forecasting approaches show poorer predictive accuracy than the proposed framework; the detailed reasons are listed in Section 8.2. The neuro-fuzzy approaches also show poor predictive accuracy for forecasting the ISM SCF errors, the ISI TBFs and the ISM TBFs. The corresponding tables are omitted for lack of space; the authors will supply them on request for future research
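The table notes tie the choice between the TF model and the univariate models to the number of principal components (PCs) in a data set, and compare models by predictive accuracy. A minimal sketch of that decision and of two common accuracy measures follows. It assumes the Kaiser criterion (retain PCs whose eigenvalue of the workload correlation matrix exceeds 1) and uses MAE and RMSE as accuracy measures; the paper's own retention rule and error measures may differ. The eigenvalues are computed with a plain cyclic Jacobi iteration so the sketch needs only the standard library, and all function names are hypothetical.

```python
import math


def correlation_matrix(cols):
    """Pearson correlation matrix of equal-length columns (one per workload)."""
    n = len(cols[0])
    means = [sum(c) / n for c in cols]
    norms = [math.sqrt(sum((v - m) ** 2 for v in c)) for c, m in zip(cols, means)]
    return [
        [
            sum((a - means[i]) * (b - means[j]) for a, b in zip(cols[i], cols[j]))
            / (norms[i] * norms[j])
            for j in range(len(cols))
        ]
        for i in range(len(cols))
    ]


def jacobi_eigenvalues(a, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, descending."""
    a = [row[:] for row in a]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def count_components(cols):
    """Assumed Kaiser rule: retain PCs whose correlation eigenvalue exceeds 1."""
    return sum(1 for ev in jacobi_eigenvalues(correlation_matrix(cols)) if ev > 1.0)


def choose_model(cols):
    """Rule read from the table notes: several PCs -> TF, otherwise univariate."""
    return "transfer-function" if count_components(cols) > 1 else "univariate"


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean square error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))
```

For example, two perfectly correlated workload columns yield a single retained PC and hence a univariate model, while two independent pairs of correlated workloads yield two PCs and a TF model. Under the Kaiser rule a set of fully uncorrelated workloads retains no PC at all, which this sketch also maps to the univariate branch.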

Cite this article

Roy, A., Pham, H. Toward the development of a conventional time series based web error forecasting framework. Empir Software Eng 23, 570–644 (2018). https://doi.org/10.1007/s10664-017-9530-4

  • DOI: https://doi.org/10.1007/s10664-017-9530-4
