Abstract
In this paper we present a novel method to detect interesting patterns in strings. A common way to refine the results of pattern mining algorithms is to use interestingness measures, but the appropriate set of measures differs from one domain and problem to another. The aim of our research is to obtain a model that classifies patterns by their interest. The method applies machine learning algorithms to a dataset generated from features of the factors of a string. Each row of the dataset is associated with one factor of a string and contains the values of several interestingness measures together with contextual information. We also propose a new interestingness measure based on an entropy principle, which improves the classification results obtained. The proposed method spares experts from having to configure parameters in order to obtain interesting patterns. We demonstrate the utility of the method with example results on real data. The datasets and scripts to reproduce the experiments are available online.
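The abstract does not specify which interestingness measures, contextual features or entropy formula the method uses. The following Python sketch is therefore only an illustration of the general idea of building one dataset row per factor (contiguous substring) of a string. It assumes a maximum factor length, and it uses hypothetical stand-in features: raw occurrence count, support, a lift-style ratio against character independence, and the Shannon entropy of the symbol that follows the factor. None of these is claimed to be the authors' proposed measure.

```python
from collections import Counter
from math import log2


def factors(s, max_len):
    """Yield every factor (contiguous substring) of s with at most max_len symbols,
    one per starting position, so overlapping occurrences are all produced."""
    for i in range(len(s)):
        for j in range(i + 1, min(i + max_len, len(s)) + 1):
            yield s[i:j]


def right_context_entropy(s, f):
    """Hypothetical contextual feature: Shannon entropy (in bits) of the symbol
    immediately following each (possibly overlapping) occurrence of f in s."""
    followers = Counter()
    start = s.find(f)
    while start != -1:
        end = start + len(f)
        if end < len(s):
            followers[s[end]] += 1
        start = s.find(f, start + 1)
    total = sum(followers.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * log2(c / total) for c in followers.values())


def build_rows(s, max_len=3):
    """Build one row per distinct factor:
    (factor, length, count, support, lift, right_context_entropy)."""
    counts = Counter(factors(s, max_len))
    char_counts = Counter(s)
    n = len(s)
    rows = []
    for f, c in sorted(counts.items()):
        # Fraction of the n - len(f) + 1 possible start positions where f occurs.
        support = c / (n - len(f) + 1)
        # Probability of f if its symbols were independent (hypothetical baseline).
        expected = 1.0
        for ch in f:
            expected *= char_counts[ch] / n
        lift = support / expected
        rows.append((f, len(f), c, support, lift, right_context_entropy(s, f)))
    return rows


if __name__ == "__main__":
    for row in build_rows("abracadabra", max_len=3):
        print(row)
```

In the described approach, rows like these would be labeled as interesting or not and then passed to an off-the-shelf classifier, so that no expert has to hand-tune thresholds on any single measure.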
This work has been partially supported by the SESAAME project, number TIN2008-06582-C03-03, of the MICINN, Spain.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Baena-García, M., Morales-Bueno, R. (2010). Mining Interestingness Measures for String Pattern Mining. In: García-Pedrajas, N., Herrera, F., Fyfe, C., Benítez, J.M., Ali, M. (eds) Trends in Applied Intelligent Systems. IEA/AIE 2010. Lecture Notes in Computer Science, vol. 6096. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13022-9_56
DOI: https://doi.org/10.1007/978-3-642-13022-9_56
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13021-2
Online ISBN: 978-3-642-13022-9
eBook Packages: Computer Science (R0)