RP-Miner: a relaxed prune algorithm for frequent similar pattern mining

  • Regular Paper
Knowledge and Information Systems

Abstract

Most current algorithms for mining frequent patterns assume that two object subdescriptions are similar only when they are equal, but many real-world problems evaluate similarity in other ways. Recently, three algorithms (ObjectMiner, STreeDC-Miner and STreeNDC-Miner) have been proposed for mining frequent patterns with similarity functions other than equality. To search for frequent patterns, ObjectMiner and STreeDC-Miner use a pruning property called the Downward Closure property, which the similarity function must satisfy. For similarity functions that do not satisfy this property, the STreeNDC-Miner algorithm was proposed; however, it explores all subsets of features when searching for frequent patterns, which can be very expensive. In this work, we propose a frequent similar pattern mining algorithm for similarity functions that do not satisfy the Downward Closure property. It is faster than STreeNDC-Miner and loses fewer frequent similar patterns than ObjectMiner and STreeDC-Miner. We also compare, in a supervised classification context, the quality of the set of frequent similar patterns computed by our algorithm with that of the sets computed by the other algorithms.
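The abstract contrasts two strategies: pruning that relies on the Downward Closure property, and exhaustive search over all feature subsets. The Python sketch below illustrates why the first can lose patterns when the similarity function is not downward closed. It is not RP-Miner, ObjectMiner, STreeDC-Miner or STreeNDC-Miner, and it does not reproduce the paper's relaxed prune. Everything in it is an assumption made for illustration: objects as feature-to-value dicts, a Boolean similarity `sim(o1, o2, feats)`, Downward Closure read as "similar on a feature set implies similar on every subset of it", `frequency` as the number of objects similar to a given object on a feature set, the hypothetical `half_agreement` function, and the toy data.

```python
from itertools import combinations

# Assumption: an object is a dict mapping feature name -> value, and a
# similarity function is Boolean: sim(o1, o2, feats) compares the two
# subdescriptions restricted to the features in `feats`.

def equality(o1, o2, feats):
    """Classic notion: similar iff the subdescriptions are equal.
    Equality satisfies the Downward Closure property."""
    return all(o1[f] == o2[f] for f in feats)

def half_agreement(o1, o2, feats):
    """Hypothetical similarity: similar iff at least half of the features
    in `feats` take the same value. It does NOT satisfy Downward Closure."""
    agree = sum(1 for f in feats if o1[f] == o2[f])
    return 2 * agree >= len(feats)

def frequency(obj, feats, data, sim):
    """Assumed definition: how many objects in `data` are similar to `obj`
    on the features in `feats` (obj itself included)."""
    return sum(1 for other in data if sim(obj, other, feats))

def exhaustive(data, features, sim, min_freq):
    """Test every non-empty feature subset for every object (no pruning)."""
    found = set()
    for k in range(1, len(features) + 1):
        for feats in combinations(features, k):
            for i, obj in enumerate(data):
                if frequency(obj, feats, data, sim) >= min_freq:
                    found.add((i, feats))
    return found

def downward_closure_prune(data, features, sim, min_freq):
    """Generic Apriori-style level-wise search: a k-feature set is tested for
    an object only if all its (k-1)-feature subsets were frequent for that
    object. The result is exact only if `sim` satisfies Downward Closure.
    (Candidates are enumerated naively to keep the sketch short.)"""
    level = {(i, (f,)) for i, obj in enumerate(data) for f in features
             if frequency(obj, (f,), data, sim) >= min_freq}
    found = set(level)
    for k in range(2, len(features) + 1):
        next_level = set()
        for i, obj in enumerate(data):
            for feats in combinations(features, k):
                if (all((i, sub) in level for sub in combinations(feats, k - 1))
                        and frequency(obj, feats, data, sim) >= min_freq):
                    next_level.add((i, feats))
        found |= next_level
        level = next_level
    return found

# Hypothetical toy data set with four categorical features.
FEATURES = ("a", "b", "c", "d")
DATA = [
    {"a": 1, "b": 1, "c": 1, "d": 1},
    {"a": 1, "b": 1, "c": 2, "d": 2},
    {"a": 2, "b": 2, "c": 1, "d": 1},
]

if __name__ == "__main__":
    # Equality is downward closed, so pruning loses nothing.
    print(exhaustive(DATA, FEATURES, equality, 2)
          == downward_closure_prune(DATA, FEATURES, equality, 2))  # True
    # half_agreement is not: object 0 is frequent on all four features
    # (and on ("a", "c")), yet no single feature is frequent for it, so the
    # pruned search never reaches those patterns.
    print((0, FEATURES) in exhaustive(DATA, FEATURES, half_agreement, 3))  # True
    print(downward_closure_prune(DATA, FEATURES, half_agreement, 3))  # set()
```

With equality the two searches return the same patterns, but with `half_agreement` the pruned search returns nothing at a minimum frequency of 3 while exhaustive search still finds patterns for object 0, at the cost of testing every feature subset. This is the trade-off between pattern loss and cost that the abstract describes.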



Author information


Correspondence to Ansel Yoan Rodríguez-González.


About this article

Cite this article

Rodríguez-González, A.Y., Martínez-Trinidad, J.F., Carrasco-Ochoa, J.A. et al. RP-Miner: a relaxed prune algorithm for frequent similar pattern mining. Knowl Inf Syst 27, 451–471 (2011). https://doi.org/10.1007/s10115-010-0309-9


  • DOI: https://doi.org/10.1007/s10115-010-0309-9
