Abstract
This paper presents SSRDVis, a visual approach for summarizing event sequences and interactively detecting rare behaviors. SSRDVis consists of three components: (1) a sequence embedding module that learns feature vectors for sequences, (2) a sequence grouping and summarization module that finds representative clusters and patterns in the dataset, and (3) a rare detection module that discovers and explains rare cases. Sequences are embedded into a vector space via “mixed-ngram2vec,” which is adapted from “word2vec.” Unsupervised learning models are then applied in the vector space to group similar sequences and detect anomalies. In addition, sequential pattern graphs are built to provide a compact, semantic summary of the sequences. These components work together to present both overall sequential patterns and abnormal behaviors in one visual interface. We demonstrate the feasibility of the approach by applying it to the analysis of Web clickstreams. Experimental results show that the approach helps identify noticeable patterns, especially rare behaviors, in a large number of event sequences.
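To make the embed-then-detect idea concrete, the following is a minimal sketch and not the SSRDVis implementation. It assumes that "mixed" n-grams means unigrams and bigrams taken together, replaces the learned mixed-ngram2vec embedding with plain n-gram count vectors, and replaces the unsupervised anomaly detector with a simple nearest-neighbor distance score. The function names (`mixed_ngrams`, `embed`, `rarity_scores`) and the sample sessions are hypothetical.

```python
from collections import Counter
from math import sqrt


def mixed_ngrams(seq, max_n=2):
    """Return all n-grams of length 1..max_n as tuples (shorter n-grams first)."""
    return [tuple(seq[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(seq) - n + 1)]


def embed(seq, vocab, max_n=2):
    """Count vector over the n-gram vocabulary.

    Assumption: this is a stand-in for a learned embedding, not mixed-ngram2vec.
    """
    counts = Counter(mixed_ngrams(seq, max_n))
    return [counts[g] for g in vocab]


def cosine_distance(u, v):
    """1 - cosine similarity; defined as 1.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def rarity_scores(vectors, k=2):
    """Mean cosine distance to the k nearest other sequences; higher means rarer.

    Assumption: a simple substitute for an unsupervised anomaly detector.
    """
    scores = []
    for i, u in enumerate(vectors):
        dists = sorted(cosine_distance(u, v)
                       for j, v in enumerate(vectors) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


# Hypothetical clickstream sessions (page categories visited in order).
sessions = [
    ["home", "news", "sports"],
    ["home", "news", "sports", "news"],
    ["home", "news", "weather"],
    ["home", "sports", "news"],
    ["travel", "bbs", "travel", "bbs"],
]

vocab = sorted({g for s in sessions for g in mixed_ngrams(s)})
vectors = [embed(s, vocab) for s in sessions]
scores = rarity_scores(vectors)
rarest = max(range(len(sessions)), key=scores.__getitem__)
print(sessions[rarest])
```

In this toy data the last session shares no n-grams with any other session, so it receives the highest rarity score. A learned embedding would additionally place n-grams that occur in similar contexts close together, which count vectors cannot do.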
Acknowledgements
This work is supported by the National Key Research and Development Program of China (Grant No. 2017YFB0701900), the National Natural Science Foundation of China (Grant No. 61100053), and the Key Laboratory of Machine Perception at Peking University (K-2019-09).
Cite this article
Li, C., Dong, X., Liu, W. et al. SSRDVis: Interactive visualization for event sequences summarization and rare detection. J Vis 23, 171–184 (2020). https://doi.org/10.1007/s12650-019-00609-x
DOI: https://doi.org/10.1007/s12650-019-00609-x