Mining Interestingness Measures for String Pattern Mining

Baena-García, Manuel; Morales-Bueno, Rafael

doi:10.1007/978-3-642-13022-9_56

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6096)

Included in the following conference series:

International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems

Abstract

In this paper we present a novel method to detect interesting patterns in strings. A common way to refine the results of pattern mining algorithms is to apply interestingness measures. However, the appropriate set of measures differs from one domain and problem to another. The aim of our research is to obtain a model that classifies patterns by their interest. The method applies machine learning algorithms to a dataset generated from features of factors (substrings): each row of the dataset is associated with a factor of a string and contains the values of several interestingness measures together with contextual information. We also propose a new interestingness measure, based on an entropy principle, which improves the classification results obtained. The proposed method spares experts from having to configure parameters in order to obtain interesting patterns. We demonstrate the utility of the method by presenting example results on real data. The datasets and scripts needed to reproduce the experiments are available online.
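The dataset construction described in the abstract can be illustrated with a minimal Python sketch. It assumes that factors are contiguous substrings up to a hypothetical length limit `max_len`, counts overlapping occurrences, and computes a few illustrative features per factor: length, count, support, a lift-style ratio of observed support to the support expected if symbols occurred independently, and the Shannon entropy of the symbol that follows each occurrence. The function names and feature set are hypothetical. In particular, the right-extension entropy is a generic stand-in chosen for illustration; it is not the entropy-based measure proposed in the paper, whose definition the abstract does not give.

```python
import math
from collections import Counter


def factor_counts(text, max_len):
    """Count (overlapping) occurrences of every factor of length 1..max_len."""
    counts = Counter()
    n = len(text)
    for i in range(n):
        for length in range(1, min(max_len, n - i) + 1):
            counts[text[i:i + length]] += 1
    return counts


def right_extension_entropy(text, factor):
    """Shannon entropy (bits) of the symbol that immediately follows each
    occurrence of `factor`; an occurrence at the very end of `text` is ignored.
    Illustrative stand-in only, NOT the measure proposed in the paper."""
    followers = Counter()
    start = text.find(factor)
    while start != -1:
        end = start + len(factor)
        if end < len(text):
            followers[text[end]] += 1
        start = text.find(factor, start + 1)  # advance by 1 so overlaps are found
    total = sum(followers.values())
    if total == 0:
        return 0.0
    # H = sum p * log2(1/p), written without a leading minus sign.
    return sum((c / total) * math.log2(total / c) for c in followers.values())


def factor_features(text, max_len=3):
    """Build one feature row (a dict) per factor of `text` (hypothetical feature set)."""
    counts = factor_counts(text, max_len)
    n = len(text)
    # Relative frequency of each single symbol, used for the independence baseline.
    symbol_freq = {s: c / n for s, c in counts.items() if len(s) == 1}
    rows = []
    for factor, count in sorted(counts.items()):
        windows = n - len(factor) + 1  # start positions available for this length
        support = count / windows
        # Support expected if symbols were independent and identically distributed (assumption).
        expected = math.prod(symbol_freq[s] for s in factor)
        rows.append({
            "factor": factor,
            "length": len(factor),
            "count": count,
            "support": support,
            "lift": support / expected,
            "right_entropy": right_extension_entropy(text, factor),
        })
    return rows


if __name__ == "__main__":
    for row in factor_features("abracadabra", max_len=3):
        print(row)
```

In the workflow the abstract describes, rows like these would be labeled as interesting or not and passed to a machine learning algorithm that learns the classification model; the labeling step and the choice of classifier are left open in this sketch.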

This work has been partially supported by the SESAAME project, number TIN2008-06582-C03-03, of the MICINN, Spain.

Author information

Authors and Affiliations

Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga, 29071, Málaga, Spain
Manuel Baena-García & Rafael Morales-Bueno

Editor information

Editors and Affiliations

Dept. of Computing and Numerical Analysis, University of Cordoba, Campus Universitario de Rabanales, Einstein Building, 3rd floor, 14071, Cordoba, Spain
Nicolás García-Pedrajas
Dept. of Computer Science and Artificial Intelligence, ETS de Ingenierías Informática y de Telecomunicación, University of Granada, 18071, Granada, Spain
Francisco Herrera
School of Computing, University of the West of Scotland, PA1 2BE, Paisley, UK
Colin Fyfe
Dept. of Computer Science and Artificial Intelligence, ETS de Ingenierías Informática y de Telecomunicación, University of Granada, 18071, Granada, Spain
José Manuel Benítez
Department of Computer Science, Texas State University-San Marcos, 601 University Drive, TX 78666-4616, San Marcos, USA
Moonis Ali

About this paper

Cite this paper

Baena-García, M., Morales-Bueno, R. (2010). Mining Interestingness Measures for String Pattern Mining. In: García-Pedrajas, N., Herrera, F., Fyfe, C., Benítez, J.M., Ali, M. (eds) Trends in Applied Intelligent Systems. IEA/AIE 2010. Lecture Notes in Computer Science, vol 6096. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13022-9_56

DOI: https://doi.org/10.1007/978-3-642-13022-9_56
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13021-2
Online ISBN: 978-3-642-13022-9
eBook Packages: Computer Science, Computer Science (R0)
