
Mining Interestingness Measures for String Pattern Mining

  • Conference paper
Trends in Applied Intelligent Systems (IEA/AIE 2010)

Abstract

In this paper we present a novel method to detect interesting patterns in strings. A common way to refine the results of pattern mining algorithms is to use interestingness measures, but the set of appropriate measures differs across domains and problems. The aim of our research is to obtain a model that classifies patterns by interest. The method applies machine learning algorithms to a dataset generated from features of factors. Each dataset row is associated with a factor of a string and contains the values of different interestingness measures together with contextual information. We also propose a new interestingness measure, based on an entropy principle, which improves the classification results obtained. The proposed method spares experts from having to configure parameters in order to obtain interesting patterns. We demonstrate the utility of the method with example results on real data. The datasets and scripts needed to reproduce the experiments are available online.
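The dataset construction described in the abstract (one row per factor of a string, holding measure values and contextual information) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration: the function names `factors`, `successor_entropy`, and `build_rows`, the `max_len` cutoff, and the choice of columns are hypothetical. In particular, the successor-entropy column is a generic Shannon entropy used as a stand-in; the abstract does not define the paper's entropy-based measure, and this is not claimed to be it.

```python
from collections import Counter
from math import log2


def factors(text, max_len):
    """Yield every factor (contiguous substring) of text up to max_len characters."""
    for start in range(len(text)):
        for end in range(start + 1, min(start + max_len, len(text)) + 1):
            yield text[start:end]


def successor_entropy(text, factor):
    """Shannon entropy (bits) of the character following each occurrence of factor.

    Illustrative stand-in only; not the measure proposed in the paper.
    """
    followers = Counter()
    pos = text.find(factor)
    while pos != -1:
        nxt = pos + len(factor)
        if nxt < len(text):
            followers[text[nxt]] += 1
        # Advance by one position so overlapping occurrences are counted.
        pos = text.find(factor, pos + 1)
    total = sum(followers.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in followers.values())


def build_rows(text, max_len=3):
    """Build one dataset row per distinct factor (hypothetical feature set)."""
    counts = Counter(factors(text, max_len))
    rows = []
    for factor, count in sorted(counts.items()):
        # Number of start positions where a factor of this length could occur.
        positions = len(text) - len(factor) + 1
        rows.append({
            "factor": factor,
            "length": len(factor),
            "count": count,
            "frequency": count / positions,
            "successor_entropy": successor_entropy(text, factor),
        })
    return rows


if __name__ == "__main__":
    for row in build_rows("abracadabra", max_len=2):
        print(row)
```

Rows of this shape, once paired with an interest label for each factor, could be passed to any standard classifier. The labelling procedure and the learning algorithms actually used are not specified in the abstract, so they are left out here.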

This work has been partially supported by the SESAAME project, number TIN2008-06582-C03-03, of the MICINN, Spain.




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Baena-García, M., Morales-Bueno, R. (2010). Mining Interestingness Measures for String Pattern Mining. In: García-Pedrajas, N., Herrera, F., Fyfe, C., Benítez, J.M., Ali, M. (eds) Trends in Applied Intelligent Systems. IEA/AIE 2010. Lecture Notes in Computer Science, vol 6096. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13022-9_56


  • DOI: https://doi.org/10.1007/978-3-642-13022-9_56

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13021-2

  • Online ISBN: 978-3-642-13022-9

  • eBook Packages: Computer Science, Computer Science (R0)
