Analysis of Web Visit Histories, Part I: Distance-Based Visualization of Sequence Rules

Journal of Classification

An Erratum to this article was published on 01 July 2016

Abstract

This paper constitutes Part I of a contribution to the analysis of web visit histories through a new methodological framework. First, web usage mining and web structure mining are treated as a single mining process that detects the latent structure of navigation across the sections of a single portal. We extend association rule theory to web data by defining new concepts: web (pattern) association and preference matrices, and (indirect and direct) sequence rules. We identify the most significant rules according to a multiple testing procedure. In the literature, web usage patterns are visualized in non-distance-based graphs that describe navigation behavior across web pages with sequential arrows. Here, we introduce a geometrical visualization of sequence rules at any click of the web navigation. In particular, we provide two distance-based visualization methods: one for the static analysis of the data as a whole, and one for the dynamic analysis that discovers the most significant web paths click by click. A real-world case study is used throughout the methodological description.
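To make the notion of a direct sequence rule concrete, the following is a minimal sketch in Python. It assumes the textbook definitions of support and confidence, applied to pairs of pages visited consecutively within a session; the abstract does not give the paper's own definitions, so the function name `direct_sequence_rules`, the session data, and the exact formulas are illustrative assumptions rather than the authors' method.

```python
from collections import Counter


def direct_sequence_rules(sessions):
    """Support and confidence of direct rules A -> B.

    Assumption (not taken from the paper): a direct rule A -> B holds in a
    session when page B is visited immediately after page A.
    support    = share of sessions containing the consecutive pair (A, B)
    confidence = sessions containing (A, B) / sessions containing A
    """
    n_sessions = len(sessions)
    page_sessions = Counter()  # number of sessions in which each page occurs
    pair_sessions = Counter()  # number of sessions in which each pair occurs

    for session in sessions:
        # Use sets so that each session is counted at most once.
        page_sessions.update(set(session))
        pair_sessions.update(set(zip(session, session[1:])))

    return {
        (a, b): {
            "support": count / n_sessions,
            "confidence": count / page_sessions[a],
        }
        for (a, b), count in pair_sessions.items()
    }


# Hypothetical click sequences across the sections of one portal.
sessions = [
    ["home", "news", "sport"],
    ["home", "sport"],
    ["home", "news", "contact"],
    ["news", "sport"],
]

rules = direct_sequence_rules(sessions)
for (a, b), stats in sorted(rules.items()):
    print(f"{a} -> {b}: support={stats['support']:.2f}, "
          f"confidence={stats['confidence']:.2f}")
```

An indirect rule could be sketched the same way by pairing each page with every later page in the session instead of only the next one; again, this is an assumption about the general idea, not the paper's definition.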



Author information

Correspondence to Roberta Siciliano.

Additional information

An erratum to this article is available at http://dx.doi.org/10.1007/s00357-016-9210-x.

About this article

Cite this article

Siciliano, R., D’Ambrosio, A., Aria, M. et al. Analysis of Web Visit Histories, Part I: Distance-Based Visualization of Sequence Rules. J Classif 33, 298–324 (2016). https://doi.org/10.1007/s00357-016-9204-8

  • DOI: https://doi.org/10.1007/s00357-016-9204-8
