Mining preferred navigation patterns by consolidating both selection and time preferences

World Wide Web

Abstract

Preferred navigation patterns (PNPs) are contiguous sequential patterns whose elements users prefer to select as the next step among several alternatives, and on which users prefer to spend much time. Patterns that are preferred in both navigation path and time are more actionable than findings that consider only path or only time, in web applications such as web user navigation, targeted online advertising and recommendation. However, because of conceptual confusion and limitations in how existing work treats navigation preference, the corresponding algorithms cannot discover actionable preferred navigation patterns. In this paper, we study the problem of preferred navigation pattern mining by considering both navigation path and time length. Firstly, we carefully define the concepts of selection preference and time preference for time-related path sequences, which reflect user interest in terms of relative path selection and time consumption, respectively. Secondly, we propose an efficient PNP-forest algorithm for identifying PNPs: we first introduce the PNP-forest data structure, and then present its growth and maintenance mechanisms together with optimization strategies. We then introduce a more efficient mining algorithm, PrefixSpan_Forest, which combines the advantages of PrefixSpan and PNP-forest. The performance of both algorithms is evaluated, and the results show that they discover PNPs effectively.

Figures 1–15

Notes

  1. The first problem is solved by the preference measures for a whole pattern given in Definitions 8 and 11; the second problem, the contradiction between the definitions and the algorithms, is addressed by the correct mining algorithm given in Section 5; the third problem is solved simply by keeping selection preference and time preference separate, as we do in Section 4.

  2. This can be verified experimentally. For example, on the GovLog dataset with ξ = 0.04 %, δ = 1 and η = 1, the total execution time is 247 s, of which generating the frequent candidates together with their $n_C$, $S_{TP}$ and $n_E$ takes 244.4 s.

  3. The UAM and PNT algorithms proposed by Xing and Shen [43] cannot discover preferred navigation patterns correctly, as discussed in Section 1.

  4. PNP-forest is also adapted from the PNT algorithm and can be regarded as a corrected and optimized version of it; cPNT is the version of PNP-forest without the optimizations.
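The timing figures in Note 2 can be checked with simple arithmetic; they show that candidate generation accounts for almost all of the 247 s in that configuration:

```latex
\frac{244.4\ \mathrm{s}}{247\ \mathrm{s}} \approx 0.989,
\qquad 247\ \mathrm{s} - 244.4\ \mathrm{s} = 2.6\ \mathrm{s}
```

So roughly 98.9% of the execution time goes to generating frequent candidates with their $n_C$, $S_{TP}$ and $n_E$, leaving about 2.6 s for all remaining steps.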

References

  1. Agrawal, R., Srikant, R.: Mining sequential patterns. In: Proceedings of the Eleventh International Conference on Data Engineering, pp. 3–14. IEEE (1995)

  2. Ahmed, C., Tanbeer, S., Jeong, B., et al.: A framework for mining high utility web access sequences. IETE Techn. Rev. 28(1), 3 (2011)

  3. Arotaritei, D., Mitra, S.: Web mining: a survey in the fuzzy framework. Fuzzy Sets Syst. 148(1), 5–19 (2004)

  4. Ayres, J., Flannick, J., Gehrke, J., Yiu, T.: Sequential pattern mining using a bitmap representation. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 429–435. ACM (2002)

  5. Borges, J., Levene, M.: Data mining of user navigation patterns. In: Web Usage Analysis and User Profiling, pp. 92–112 (2000)

  6. Cao, L.: Domain driven data mining: challenges and prospects. IEEE Trans. Knowl. Data Eng. 22, 755–769 (2010)

  7. Cao, L.: Actionable knowledge discovery and delivery. WIREs Data Min. Knowl. Disc. 2(2), 149–163 (2012)

  8. Cao, L., Yu, P., Zhang, C., Zhao, Y.: Domain Driven Data Mining. Springer (2008)

  9. Chen, M., Park, J., Yu, P.: Efficient data mining for path traversal patterns. IEEE Trans. Knowl. Data Eng. 10(2), 209–221 (1998)

  10. Chen, T., Chou, Y., Chen, T.: Mining user movement behavior patterns in a mobile service environment. IEEE Trans. Syst., Man, Cybern., Part A: Syst. Hum. 42(1), 87–101 (2012)

  11. Chen, Y.L., Huang, T.K.: Discovering fuzzy time-interval sequential patterns in sequence databases. IEEE Trans. Syst., Man, Cybern., Part B: Cybern. 35(5), 959–972 (2005)

  12. Chong, C., Ramachandran, V., Eswaran, C.: Path optimization using fuzzy distance approach. In: Fuzzy Systems Conference Proceedings, 1999. FUZZ-IEEE’99. 1999 IEEE International, vol. 3, pp. 1771–1774. IEEE (1999)

  13. Dong, G., Pei, J.: Frequent and closed sequence patterns. In: Sequence Data Mining, pp. 15–46. Springer (2007)

  14. El-Ramly, M., Stroulia, E.: Analysis of web-usage behavior for focused web sites: a case study. J. Softw. Maint. Evol.: Res. Pract. 16(1–2), 129–150 (2004)

  15. Floratou, A., Tata, S., Patel, J.: Efficient and accurate discovery of patterns in sequence datasets. IEEE Trans. Knowl. Data Eng. 23(8), 1154–1168 (2011)

  16. Garofalakis, M., Rastogi, R., Shim, K.: SPIRIT: Sequential pattern mining with regular expression constraints. In: Proceedings of the International Conference on Very Large Data Bases, pp. 223–234 (1999)

  17. Han, J., Pei, J., Mortazavi-Asl, B., Chen, Q., Dayal, U., Hsu, M.: FreeSpan: frequent pattern-projected sequential pattern mining. In: Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 355–359. ACM (2000)

  18. Huang, G., Zhang, Y., Cao, J., Steyn, M., Taraporewalla, K.: Online mining abnormal period patterns from multiple medical sensor data streams. World Wide Web 17(4), 569–587 (2014)

  19. Kléma, J., Nováková, L., Karel, F., Stepankova, O., Zelezny, F.: Sequential data mining: A comparative case study in development of atherosclerosis risk factors. IEEE Trans. Syst., Man, Cybern., Part C: Appl. Rev. 38(1), 3–15 (2008)

  20. Lee, J., Shi, Y., Wang, F., Lee, H., Kim, H.K.: Advertisement clicking prediction by using multiple criteria mathematical programming. World Wide Web (2015). doi:10.1007/s11280-015-0353-1

  21. Lee, Y., Yen, S.: Incremental and interactive mining of web traversal patterns. Inf. Sci. 178(2), 287–306 (2008)

  22. Li, H., Lee, S., Shan, M.: DSM-PLW: Single-pass mining of path traversal patterns over streaming web click-sequences. Comput. Netw. 50(10), 1474–1487 (2006)

  23. Liu, C., White, R., Dumais, S.: Understanding web browsing behaviors through Weibull analysis of dwell time. In: Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 379–386. ACM (2010)

  24. Lorigo, L., Pan, B., Hembrooke, H., Joachims, T., Granka, L., Gay, G.: The influence of task and gender on search and evaluation behavior using Google. Inf. Process. Manag. 42(4), 1123–1131 (2006)

  25. Lu, E., Lee, W., Tseng, V.: A framework for personal mobile commerce pattern mining and prediction. IEEE Trans. Knowl. Data Eng. (2011). doi:10.1109/TKDE.2011.65

  26. Mannila, H., Toivonen, H., Inkeri Verkamo, A.: Discovery of frequent episodes in event sequences. Data Min. Knowl. Disc. 1(3), 259–289 (1997)

  27. Marsan, L., Sagot, M.: Algorithms for extracting structured motifs using a suffix tree with an application to promoter and regulatory site consensus identification. J. Comput. Biol. 7(3–4), 345–362 (2000)

  28. Naldi, M., D’Acquisto, G., Italiano, G.F.: The value of location in keyword auctions. Electron. Commer. Res. Appl. 9(2), 160–170 (2010)

  29. Pei, J., Han, J., Mortazavi-Asl, B., Wang, J., Pinto, H., Chen, Q., Dayal, U., Hsu, M.: Mining sequential patterns by pattern-growth: The PrefixSpan approach. IEEE Trans. Knowl. Data Eng. 16(11), 1424–1440 (2004)

  30. Pei, J., Han, J., Mortazavi-Asl, B., Zhu, H.: Mining access patterns efficiently from web logs. In: Knowledge Discovery and Data Mining. Current Issues and New Applications, pp. 396–407 (2000)

  31. Pei, J., Han, J., Wang, W.: Mining sequential patterns with constraints in large databases. In: Proceedings of the Eleventh International Conference on Information and Knowledge Management, pp. 18–25. ACM (2002)

  32. Pierrakos, D., Paliouras, G., Papatheodorou, C., Spyropoulos, C.: Web usage mining as a tool for personalization: A survey. User Model. User-Adap. Inter. 13(4), 311–372 (2003)

  33. Rao, W., Chen, L., Bartolini, I.: Ranked content advertising in online social networks. World Wide Web 18(3), 661–679 (2015)

  34. Sadeghian, P., Kantardzic, M., Lozitskiy, O., Sheta, W.: The frequent wayfinding-sequence (FWS) methodology: Finding preferred routes in complex virtual environments. Int. J. Human-Comput. Stud. 64(4), 356–374 (2006)

  35. Schafer, J., Konstan, J., Riedl, J.: E-commerce recommendation applications. Data Min. Knowl. Disc. 5(1), 115–153 (2001)

  36. Shahabi, C., Zarkesh, A., Adibi, J., Shah, V.: Knowledge discovery from users web-page navigation. In: Proceedings of the Seventh International Workshop on Research Issues in Data Engineering, pp. 20–29. IEEE (1997)

  37. Si, J., Li, Q., Qian, T., Deng, X.: Users interest grouping from online reviews based on topic frequency and order. World Wide Web 17(6), 1321–1342 (2014)

  38. Srikant, R., Agrawal, R.: Mining sequential patterns: Generalizations and performance improvements. In: Advances in Database Technology – EDBT’96, pp. 1–17 (1996)

  39. Wang, J., Han, J., Li, C.: Frequent closed sequence mining without candidate maintenance. IEEE Trans. Knowl. Data Eng. 19(8), 1042–1056 (2007)

  40. Wang, Y., Lee, A.: Mining web navigation patterns with a path traversal graph. Expert Syst. Appl. 38(6), 7112–7122 (2011)

  41. West, R., Leskovec, J.: Human wayfinding in information networks. In: Proceedings of the 21st International Conference on World Wide Web, pp. 619–628. ACM (2012)

  42. West, R., Pineau, J., Precup, D.: Wikispeedia: An online game for inferring semantic distances between concepts. In: IJCAI, pp. 1598–1603 (2009)

  43. Xing, D., Shen, J.: Efficient data mining for web navigation patterns. Inf. Softw. Technol. 46(1), 55–63 (2004)

  44. Yan, X., Han, J., Afshar, R.: CloSpan: Mining closed sequential patterns in large datasets. In: Proceedings of SIAM International Conference on Data Mining, pp. 166–177 (2003)

  45. Yin, J., Zheng, Z., Cao, L.: USpan: An efficient algorithm for mining high utility sequential patterns. In: KDD 2012, pp. 660–668 (2012)

  46. Yun, C., Chen, M.: Mining mobile sequential patterns in a mobile commerce environment. IEEE Trans. Syst., Man, Cybern., Part C: Appl. Rev. 37(2), 278–295 (2007)

  47. Zaki, M.: SPADE: An efficient algorithm for mining frequent sequences. Mach. Learn. 42(1), 31–60 (2001)

  48. Zheng, Z., Wei, W., Liu, C., Cao, W., Cao, L., Bhatia, M.: An effective contrast sequential pattern mining approach to taxpayer behavior analysis. World Wide Web (2015). doi:10.1007/s11280-015-0350-4

  49. Zhou, L., Liu, Y., Wang, J., Shi, Y.: Utility-based web path traversal pattern mining. In: Seventh IEEE International Conference on Data Mining Workshops, ICDM Workshops 2007, pp. 373–380. IEEE (2007)

Acknowledgments

This research was partially supported by the Zhejiang Provincial Philosophy and Social Science Foundation of China (No. 15NDJC145YB), the National Natural Science Foundation of China (No. 71271191), the National Science & Technology Pillar Program during the 12th Five-year Plan Period of China (No. 2012BAF12B11), the Zhejiang Provincial Natural Science Foundation of China (No. LY15F020036), the Scientific Research Foundation for the Returned Overseas Chinese Scholars, Australian Research Council Discovery Grants (DP1096218 and DP130102691) and an ARC Linkage Grant (LP100200774).

Author information

Correspondence to Bin Shen.

Appendix A: Proofs of the Lemmas

A.1 Proof of the TS-breaking Lemma

Given a TS $\alpha = \langle (e_1, t_1), (e_2, t_2), \ldots, (e_l, t_l), \ldots, (e_n, t_n) \rangle$, suppose without loss of generality that $e_l$ ($l \in \{1, 2, \ldots, n\}$) is the first event that is not frequent. Then $NP(\alpha)$ can be divided into three groups: the candidate navigation patterns starting with $e_i$ ($i \in \{1, 2, \ldots, l-1\}$), those starting with $e_l$, and those starting with $e_j$ ($j \in \{l+1, l+2, \ldots, n\}$).

For the first group, since $e_l$ is infrequent, by Property 1 no candidate navigation pattern containing $e_l$ can be frequent. These candidates can therefore be deleted from the first group; the remaining ones, which do not contain $e_l$, are exactly $NP(\beta)$ with $\beta = \langle (e_1, t_1), (e_2, t_2), \ldots, (e_{l-1}, t_{l-1}) \rangle$. Every candidate in the second group starts with $e_l$, so all of them are deleted. The third group is exactly $NP(\gamma)$ with $\gamma = \langle (e_{l+1}, t_{l+1}), (e_{l+2}, t_{l+2}), \ldots, (e_n, t_n) \rangle$.

In summary, inserting $NP(\alpha)$ into the PNP-forest has the same effect as inserting $NP(\beta)$ and $NP(\gamma)$, where $\beta$ and $\gamma$ are the parts that remain after removing the pair $(e_l, t_l)$. Applying the same argument recursively removes every pair that contains an infrequent event.

Thus we have Lemma 2.
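The breaking step can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: a TS is encoded as a list of (event, time) pairs, the set of frequent events is given in advance, and candidate navigation patterns are approximated by the event sequences of contiguous subsequences (times ignored). `break_ts` and `contiguous_events` are hypothetical names.

```python
from collections.abc import Hashable, Sequence

# Assumed encoding: a time-related sequence (TS) as (event, time) pairs.
Pair = tuple[Hashable, float]


def break_ts(ts: Sequence[Pair], frequent: set) -> list[list[Pair]]:
    """Split a TS at every pair whose event is infrequent (hypothetical helper).

    Removing (e_l, t_l) leaves beta (the part before it) and gamma (the part
    after it); repeating this for every infrequent event leaves maximal runs
    of pairs whose events are all frequent.
    """
    pieces: list[list[Pair]] = []
    current: list[Pair] = []
    for event, t in ts:
        if event in frequent:
            current.append((event, t))
        elif current:  # an infrequent event closes the current run
            pieces.append(current)
            current = []
    if current:
        pieces.append(current)
    return pieces


def contiguous_events(ts: Sequence[Pair]) -> set[tuple]:
    """Event sequences of all non-empty contiguous subsequences (a stand-in for NP)."""
    events = [e for e, _ in ts]
    n = len(events)
    return {tuple(events[i:j]) for i in range(n) for j in range(i + 1, n + 1)}


alpha = [("a", 3.0), ("b", 5.0), ("x", 1.0), ("c", 2.0), ("d", 4.0)]
pieces = break_ts(alpha, frequent={"a", "b", "c", "d"})
print(pieces)  # [[('a', 3.0), ('b', 5.0)], [('c', 2.0), ('d', 4.0)]]

# Candidates of alpha that avoid the infrequent event "x" are exactly the
# candidates of the pieces, mirroring the proof above.
kept = {p for p in contiguous_events(alpha) if "x" not in p}
assert kept == set().union(*(contiguous_events(p) for p in pieces))
```

Because the pieces never share a pair, each can be inserted into the forest independently, which is why the split loses no frequent candidate.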

A.2 Proof of the Suffix-Projection Lemma

Given an F-TS $\alpha = \langle (e_1, t_1), (e_2, t_2), \ldots, (e_n, t_n) \rangle$, $NP(\alpha)$ can be partitioned into $n$ groups: the prefixes of $e_1|_\alpha$, the prefixes of $e_2|_\alpha$, and so on up to the prefixes of $e_n|_\alpha$, where $e_j|_\alpha = \langle (e_j, t_j), (e_{j+1}, t_{j+1}), \ldots, (e_n, t_n) \rangle$ and $j \in \{1, 2, \ldots, n\}$. Because candidates with a common prefix share a path in the PNP-forest, inserting $e_j|_\alpha$ into the PNP-forest also represents every prefix of $e_j|_\alpha$. Hence, if we insert $S|_\alpha$ into the PNP-forest, all groups of candidates are represented.
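The suffix projection can be illustrated with a prefix tree. This is a sketch under simplifying assumptions: a hypothetical `Node` stores only the two counters named in this appendix ($n_C$ and $n_E$), counts occurrences rather than distinct supporting sequences, ignores times, and omits other counters (such as $S_{TP}$ from Note 2) and the paper's optimization strategies. `Node`, `insert_suffixes` and `patterns` are illustrative names.

```python
from collections.abc import Hashable, Iterator, Sequence
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical PNP-forest node holding only n_C and n_E."""
    n_c: int = 0  # occurrences of the pattern that ends at this node
    n_e: int = 0  # occurrences after which no event follows
    children: dict[Hashable, "Node"] = field(default_factory=dict)


def insert_suffixes(root: Node, events: Sequence[Hashable]) -> None:
    """Insert every suffix e_j|alpha; candidates with a common prefix share a path."""
    for j in range(len(events)):
        node = root
        for e in events[j:]:
            node = node.children.setdefault(e, Node())
            node.n_c += 1  # this prefix of the suffix occurs once more
        node.n_e += 1  # the suffix ends here, so nothing follows this occurrence


def patterns(node: Node, prefix: tuple = ()) -> Iterator[tuple[tuple, Node]]:
    """Yield (pattern, node) for every path that starts below the dummy root."""
    for e, child in node.children.items():
        path = prefix + (e,)
        yield path, child
        yield from patterns(child, path)


alpha = ["a", "b", "a", "c"]
root = Node()  # dummy root whose children are the roots of the forest's trees
insert_suffixes(root, alpha)
found = {p: node.n_c for p, node in patterns(root)}

n = len(alpha)
contiguous = {tuple(alpha[i:j]) for i in range(n) for j in range(i + 1, n + 1)}
assert set(found) == contiguous  # every candidate is represented by some path
print(found[("a",)])  # 2: "a" occurs at two positions of alpha

# Each occurrence either continues to a child or ends at the node, so
# n_C = (sum of the children's n_C) + n_E at every node (the identity used in A.3).
for _, node in patterns(root):
    assert node.n_c == sum(c.n_c for c in node.children.values()) + node.n_e
```

Only $n$ suffixes are inserted, yet all $n(n+1)/2$ contiguous candidates appear as root-to-node paths, which is the saving the lemma describes.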

A.3 Proof of the Selection-preference-computation Lemma

In a full-growth PNP-forest, let $NP_{pre}$ denote the navigation pattern of node $father$. Then $father.n_C$ maintains the support of $NP_{pre}$, which equals the sum of the supports of the different selections that follow $NP_{pre}$ plus the support of the case in which no event follows $NP_{pre}$. Since the latter is recorded in $father.n_E$, Lemma 4 follows from Definition 7.
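The computation this proof enables can be sketched as follows. The sketch rests on an assumption, since Definition 7 is not reproduced here: that the selection preference of an event following $NP_{pre}$ is its support divided by the total support of all different selections, so the denominator is $father.n_C - father.n_E$. `selection_preferences` is a hypothetical helper, not the paper's code.

```python
from collections.abc import Hashable, Mapping


def selection_preferences(father_n_c: int, father_n_e: int,
                          child_n_c: Mapping[Hashable, int]) -> dict[Hashable, float]:
    """Selection preference of each event that follows NP_pre (hypothetical helper).

    Assumes Definition 7 divides the support of one selection by the total
    support of all selections that follow NP_pre.
    """
    total = father_n_c - father_n_e  # support of "some event follows NP_pre"
    if total != sum(child_n_c.values()):
        raise ValueError("expected father.n_C == sum of children's n_C + father.n_E")
    if total == 0:
        return {}  # NP_pre is never followed by another event
    return {e: n / total for e, n in child_n_c.items()}


# NP_pre occurs 10 times: nothing follows twice, "b" follows 6 times, "c" twice.
print(selection_preferences(10, 2, {"b": 6, "c": 2}))  # {'b': 0.75, 'c': 0.25}
```

Subtracting $father.n_E$ matters: occurrences of $NP_{pre}$ that end the sequence are not selections, and dividing by $father.n_C$ alone would lower every preference.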

About this article

Cite this article

Shen, B., Cao, L., Yao, M. et al. Mining preferred navigation patterns by consolidating both selection and time preferences. World Wide Web 19, 979–1007 (2016). https://doi.org/10.1007/s11280-015-0371-z

  • DOI: https://doi.org/10.1007/s11280-015-0371-z

Keywords

Navigation