Abstract
This paper studies the problem of discovering the longest streak in a multidimensional sequence dataset. Given such a dataset, the contextual longest streak is the longest run of consecutive tuples in a context subspace that satisfy a specific measure constraint. The problem has applications in social network analysis, computational journalism, and related areas. Its main challenges are (i) the huge search space and (ii) the non-monotonicity of streak lengths. We propose a novel computation framework with a suite of optimization techniques for this problem. Our solutions outperform the baseline solution by two orders of magnitude on both real and synthetic datasets. In addition, we validate the effectiveness of our proposal through a real-world case study.
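The definition above can be illustrated with a minimal brute-force sketch. This is not the framework proposed in the paper. It is only a naive enumeration, under the assumption that a context subspace fixes values for some subset of the dimension attributes and that the measure constraint is a predicate on a single measure attribute. The function name `contextual_longest_streaks`, the attribute names, and the toy data are all hypothetical.

```python
from itertools import combinations


def contextual_longest_streaks(rows, dims, measure, predicate):
    """Naive enumeration: one pass over the data per subset of dimensions.

    rows      -- list of dicts, already in sequence (e.g. time) order
    dims      -- names of the dimension attributes
    measure   -- name of the measure attribute
    predicate -- the measure constraint, e.g. lambda v: v >= 20
    Returns {context: streak length}. A context is a tuple of
    (dimension, value) pairs; the empty tuple is the whole dataset.
    """
    best = {}
    for k in range(len(dims) + 1):
        for subset in combinations(dims, k):  # 2^len(dims) subsets overall
            running = {}  # current streak length per context
            for row in rows:
                ctx = tuple((d, row[d]) for d in subset)
                # Extend the streak if the constraint holds, else reset it.
                running[ctx] = running.get(ctx, 0) + 1 if predicate(row[measure]) else 0
                best[ctx] = max(best.get(ctx, 0), running[ctx])
    return best


# Hypothetical toy data: one tuple per game, in chronological order.
rows = [
    {"player": "A", "venue": "home", "points": 25},
    {"player": "A", "venue": "away", "points": 10},
    {"player": "B", "venue": "home", "points": 30},
    {"player": "A", "venue": "home", "points": 22},
    {"player": "A", "venue": "home", "points": 21},
    {"player": "B", "venue": "away", "points": 5},
]

streaks = contextual_longest_streaks(
    rows, ["player", "venue"], "points", lambda v: v >= 20
)
for ctx, length in sorted(streaks.items(), key=lambda kv: -kv[1]):
    print(ctx, length)
```

In this toy data the context `player=A` has a longest streak of 2, while the narrower context `player=A, venue=home` has a longest streak of 3. A more specific context can therefore hold a longer streak than its parent, which appears to be the non-monotonicity the abstract mentions. The outer loop over all subsets of dimensions also shows why the search space grows exponentially with the number of dimensions.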
B. Tang is co-first author.
Acknowledgement
This work was supported by the Science and Technology Innovation Committee Foundation of Shenzhen (Grant No. ZDSYS201703031748284).
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Wang, W., Tang, B., Zhu, M. (2018). Efficient Longest Streak Discovery in Multidimensional Sequence Data. In: Cai, Y., Ishikawa, Y., Xu, J. (eds) Web and Big Data. APWeb-WAIM 2018. Lecture Notes in Computer Science, vol 10988. Springer, Cham. https://doi.org/10.1007/978-3-319-96893-3_13
DOI: https://doi.org/10.1007/978-3-319-96893-3_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-96892-6
Online ISBN: 978-3-319-96893-3
eBook Packages: Computer Science, Computer Science (R0)