Abstract
We study a novel problem of mining significant recurrent rules from a sequence database. Recurrent rules have the form “whenever a series of precedent events occurs, eventually a series of consequent events occurs”. Recurrent rules are intuitive and characterize behaviors in many domains. One example is the domain of software specifications, where such rules capture a family of program properties that benefit program verification and bug detection. Recurrent rules generalize existing work on sequential and episode rules by considering repeated occurrences of precedent and consequent events within a sequence and across multiple sequences, and by removing the “window” barrier. To bridge the gap between mined rules and program specifications, we formalize our rules in linear temporal logic. We introduce and apply a novel notion of rule redundancy to ensure efficient mining of a compact, representative set of rules. Performance studies on benchmark datasets and a case study on an industrial system show the scalability and utility of our approach.
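To make the rule form concrete, the sketch below checks a single given rule against a small trace database. It is a simplified illustration under our own assumptions, not the paper's mining algorithm or its exact significance measures: a "temporal point" is taken to be any position where the last precedent event occurs with the earlier precedent events before it, and the reported ratio is a simplified confidence (the share of temporal points eventually followed by the consequent). All function names (`is_subsequence`, `temporal_points`, `rule_stats`) and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical helper names; a simplified sketch, not the paper's mining algorithm.

def is_subsequence(pattern: Sequence[str], seq: Sequence[str]) -> bool:
    """True if the events of `pattern` appear in `seq` in order, gaps allowed."""
    it = iter(seq)
    # `ev in it` consumes the iterator up to the first match, preserving order.
    return all(ev in it for ev in pattern)


def temporal_points(premise: Sequence[str], seq: Sequence[str]) -> List[int]:
    """Positions j at which the premise has just completed: seq[j] is the last
    premise event and the earlier premise events occur, in order, before j."""
    return [
        j
        for j, ev in enumerate(seq)
        if ev == premise[-1] and is_subsequence(premise[:-1], seq[:j])
    ]


def rule_stats(
    premise: Sequence[str],
    consequent: Sequence[str],
    database: Sequence[Sequence[str]],
) -> Tuple[int, int, float]:
    """Count premise occurrences across all sequences, how many are eventually
    followed by the consequent, and the resulting ratio (a simplified confidence)."""
    total = satisfied = 0
    for seq in database:
        for j in temporal_points(premise, seq):
            total += 1
            # No window bound: the consequent may appear anywhere after j.
            if is_subsequence(consequent, seq[j + 1:]):
                satisfied += 1
    confidence = satisfied / total if total else 0.0
    return total, satisfied, confidence


# Toy trace database (made up for illustration).
db = [
    ["lock", "use", "unlock", "lock", "use", "unlock"],
    ["lock", "use", "use", "unlock"],
    ["lock", "use"],
]
print(rule_stats(["lock", "use"], ["unlock"], db))  # (5, 4, 0.8)
```

Every repeated occurrence of the precedent events counts, both within a trace and across traces, and the consequent is searched for in the whole remainder of the trace rather than inside a fixed window. The third trace, where `use` is never followed by `unlock`, is the one violating temporal point that lowers the ratio to 0.8.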
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lo, D., Khoo, SC., Liu, C. (2008). Efficient Mining of Recurrent Rules from a Sequence Database. In: Haritsa, J.R., Kotagiri, R., Pudi, V. (eds) Database Systems for Advanced Applications. DASFAA 2008. Lecture Notes in Computer Science, vol 4947. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78568-2_8
DOI: https://doi.org/10.1007/978-3-540-78568-2_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78567-5
Online ISBN: 978-3-540-78568-2
eBook Packages: Computer Science, Computer Science (R0)