Abstract
Preferred navigation patterns (PNPs) are contiguous sequential patterns whose elements users prefer to select as the next step among several alternatives, and on which users prefer to spend more time. Patterns that capture both path and time preference are more actionable than findings that consider only path or only time in web applications such as web user navigation, targeted online advertising and recommendation. However, because of conceptual confusion about navigation preference and limitations in how existing work treats it, the corresponding algorithms cannot discover actionable preferred navigation patterns. In this paper, we study the problem of preferred navigation pattern mining by involving both navigation path and time length. First, we define the concepts of time preference and selection preference for time-related path sequences, which reflect user interest through relative path selection and time consumption, respectively. Second, we propose an efficient PNP-forest algorithm for identifying PNPs, by first introducing the PNP-forest data structure and then presenting the PNP-forest growth and maintenance mechanisms together with optimization strategies. We then introduce a more efficient mining algorithm called PrefixSpan_Forest, which integrates the advantages of PrefixSpan and PNP-forest. The performance of the two algorithms is evaluated, and the results show that they can discover PNPs effectively.
Notes
The first problem is solved by proposing preference measurements for a whole pattern, as given in Definitions 8 and 11; the second problem, the contradiction between the definitions and the algorithms, is addressed by proposing the correct mining algorithm, given in Section 5; the third problem is solved by keeping selection preference and time preference separate, as we do in Section 4.
This can be evaluated experimentally. For example, for the GovLog dataset with ξ = 0.04%, δ = 1 and η = 1, the total execution time is 247 s, while generating frequent candidates with their n_C, STP and n_E takes 244.4 s.
PNP-forest is also adapted from the PNT algorithm and can be regarded as a corrected and optimized version of it. cPNT is the version of PNP-forest without optimization.
Acknowledgments
This research was partially supported by the Zhejiang Provincial Philosophy and Social Science Foundation of China (No. 15NDJC145YB), the National Natural Science Foundation of China (No. 71271191), the National Science & Technology Pillar Program during the 12th Five-year Plan Period of China (No. 2012BAF12B11), the Zhejiang Provincial Natural Science Foundation of China (No. LY15F020036), the Scientific Research Foundation for the Returned Overseas Chinese Scholars, Australian Research Council Discovery Grants (DP1096218 and DP130102691), and an ARC Linkage Grant (LP100200774).
Appendix A: Proofs of the Lemmas
A.1 Proof of the TS-breaking Lemma
Given a TS α = <(e_1, t_1), (e_2, t_2), …, (e_l, t_l), …, (e_n, t_n)>, suppose without loss of generality that e_l (l ∈ {1, 2, …, n}) is the first event that is not frequent. Then NP(α) can be divided into three groups: the candidate navigation patterns starting with e_i (i ∈ {1, 2, …, l−1}), those starting with e_l, and those starting with e_j (j ∈ {l+1, l+2, …, n}).
For the first group, by Property 1, since e_l is infrequent, no candidate navigation pattern containing e_l is frequent. These candidates can therefore be deleted from the first group; the remaining candidates are those that do not contain e_l, and together they equal NP(β), where β = <(e_1, t_1), (e_2, t_2), …, (e_{l−1}, t_{l−1})>. For the second group, all candidates are deleted. The third group is NP(γ), where γ = <(e_{l+1}, t_{l+1}), (e_{l+2}, t_{l+2}), …, (e_n, t_n)>.
In summary, inserting NP(α) into the PNP-forest has the same effect as inserting NP(β) and NP(γ), where β and γ are the parts remaining after the pair (e_l, t_l) is removed. Applying this argument recursively removes all pairs containing infrequent events.
Thus we have Lemma 2.
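The breaking step in this proof can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `break_ts` and `contiguous_subsequences` are hypothetical, and NP(α) is assumed here to be the set of all contiguous subsequences of α, which matches the description of PNPs as contiguous patterns but is not taken from the formal definition.

```python
from typing import Hashable, List, Set, Tuple

Pair = Tuple[Hashable, float]  # an (event, time) pair


def break_ts(ts: List[Pair], frequent: Set[Hashable]) -> List[List[Pair]]:
    """Split a time-related sequence at every infrequent event.

    Each returned piece contains only frequent events; the pairs holding
    infrequent events are dropped, as in the recursive argument of A.1.
    """
    pieces: List[List[Pair]] = []
    current: List[Pair] = []
    for event, t in ts:
        if event in frequent:
            current.append((event, t))
        elif current:
            pieces.append(current)
            current = []
    if current:
        pieces.append(current)
    return pieces


def contiguous_subsequences(ts: List[Pair]) -> Set[Tuple[Pair, ...]]:
    """All non-empty contiguous subsequences (assumed stand-in for NP)."""
    n = len(ts)
    return {tuple(ts[i:j]) for i in range(n) for j in range(i + 1, n + 1)}


# Hypothetical example: event "x" is the only infrequent event.
alpha = [("a", 3), ("b", 5), ("x", 1), ("c", 2), ("d", 4)]
beta, gamma = break_ts(alpha, frequent={"a", "b", "c", "d"})
```

Under these assumptions, the candidates of `alpha` that avoid the infrequent event are exactly the union of the candidates of `beta` and `gamma`, which is the equality the proof relies on.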
A.2 Proof of the Suffix-Projection Lemma
Given an F-TS α = <(e_1, t_1), (e_2, t_2), …, (e_n, t_n)>, NP(α) can be partitioned into n groups: the prefixes of e_1|α, the prefixes of e_2|α, and so on up to the prefixes of e_n|α, where e_j|α = <(e_j, t_j), (e_{j+1}, t_{j+1}), …, (e_n, t_n)> and j ∈ {1, 2, …, n}. Because candidates with a common prefix share nodes in the PNP-forest, inserting e_j|α into the PNP-forest also represents every prefix of e_j|α. So if we insert S|α into the PNP-forest, all groups of candidates are represented.
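The argument can be checked on a toy prefix forest. The sketch below is only an illustration under simplifying assumptions: the forest is a nested dictionary keyed by event, times and counters are omitted, and the function names are hypothetical rather than taken from the paper.

```python
from typing import Dict, List

Forest = Dict[str, "Forest"]  # nested dict: event -> sub-forest of next events


def insert_path(forest: Forest, events: List[str]) -> None:
    """Insert one sequence as a root-to-node path, sharing existing prefixes."""
    node = forest
    for e in events:
        node = node.setdefault(e, {})


def insert_suffixes(forest: Forest, events: List[str]) -> None:
    """Insert every suffix e_j|alpha of the sequence (n insertions in total)."""
    for j in range(len(events)):
        insert_path(forest, events[j:])


def is_represented(forest: Forest, pattern: List[str]) -> bool:
    """True if the pattern is a path that starts at a root of the forest."""
    node = forest
    for e in pattern:
        if e not in node:
            return False
        node = node[e]
    return True


# Hypothetical example sequence of four events.
forest: Forest = {}
insert_suffixes(forest, ["a", "b", "c", "d"])
```

After the four suffix insertions, every contiguous subsequence such as `["b", "c"]` is a path from a root, while a non-contiguous one such as `["a", "c"]` is not.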
A.3 Proof of the Selection-preference-computation Lemma
In a full-growth PNP-forest, let NP_pre denote the navigation pattern of node father. Then father.n_C maintains the support of NP_pre, and equals the sum of the supports of the different selections for NP_pre plus the support of the case in which no event follows NP_pre. Since the support of the case in which no event follows NP_pre is recorded in father.n_E, Lemma 4 follows from Definition 7.
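The counting identity used in this proof can be verified on a small forest. This is a hedged sketch, not the paper's algorithm: it assumes that n_C is incremented once for every inserted suffix that passes through a node and that n_E is incremented when an inserted suffix ends at that node. The `Node` class and function names are hypothetical, and the selection preference formula of Definition 7 is deliberately not reproduced.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    n_C: int = 0  # assumed: number of inserted suffixes passing through this node
    n_E: int = 0  # assumed: number of inserted suffixes ending at this node
    children: Dict[str, "Node"] = field(default_factory=dict)


def insert_suffixes(roots: Dict[str, Node], events: List[str]) -> None:
    """Insert every suffix of the sequence, updating n_C and n_E on the way."""
    for j in range(len(events)):
        level = roots
        node = None
        for e in events[j:]:
            node = level.setdefault(e, Node())
            node.n_C += 1
            level = node.children
        node.n_E += 1  # the suffix is non-empty, so node is set


def identity_holds(roots: Dict[str, Node]) -> bool:
    """Check father.n_C == sum(child.n_C) + father.n_E for every node."""
    stack = list(roots.values())
    while stack:
        father = stack.pop()
        followed = sum(child.n_C for child in father.children.values())
        if father.n_C != followed + father.n_E:
            return False
        stack.extend(father.children.values())
    return True


# Hypothetical example: three short navigation sequences.
roots: Dict[str, Node] = {}
for seq in (["a", "b", "c"], ["a", "b", "d"], ["a", "b"]):
    insert_suffixes(roots, seq)
```

In this example the node for the pattern a, b has n_C = 3, split into one occurrence followed by c, one followed by d, and one (n_E = 1) followed by nothing, so the share of each selection can be read off from the counters stored at the father and its children.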
Cite this article
Shen, B., Cao, L., Yao, M. et al. Mining preferred navigation patterns by consolidating both selection and time preferences. World Wide Web 19, 979–1007 (2016). https://doi.org/10.1007/s11280-015-0371-z
DOI: https://doi.org/10.1007/s11280-015-0371-z