Abstract
In the era of big data analysis, mining results play an important role, so data scientists need accurate tools, methods and procedures when performing rule mining. The major issues they face are incremental mining and the large amount of time required to finish the mining task. In this context, we propose a new rule mining algorithm that takes a probabilistic approach to mining the database for interesting relations. This paper compares the new technique with the traditional Apriori, FP Growth and Eclat algorithms, and also tests it against various modified versions of these algorithms. The proposed algorithm finishes the task in O(n) time in the best case and in O(n log n) time in the worst case. It also considers less frequent, high-priority attributes during rule creation, which helps ensure that the rules created are valid. The major problems with the traditional algorithms are the generation of invalid rules, long running times and high memory utilization. These could be remedied by the new proposal. The algorithm was tested on various datasets, and the results were evaluated and compared with those of the traditional algorithms. The results showed a marked performance improvement.
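For readers unfamiliar with rule mining, the following minimal Python sketch shows the two standard measures, support and confidence, that association rule miners such as Apriori, FP Growth and Eclat typically rely on. It is not the probabilistic algorithm proposed in the paper, which the abstract does not specify; the function names and the toy transactions are assumptions made purely for illustration.

```python
# Illustrative only: standard support/confidence definitions,
# NOT the probabilistic algorithm proposed in the paper.

def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    ante = frozenset(antecedent)
    both = ante | frozenset(consequent)
    s_ante = support(transactions, ante)
    # A rule whose antecedent never occurs has no meaningful confidence.
    return support(transactions, both) / s_ante if s_ante else 0.0

# Hypothetical toy database of four market-basket transactions.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

Each call to `support` makes a single pass over the transactions, so one measure costs time linear in the number of transactions. The expense in traditional miners comes from the number of candidate itemsets that have to be evaluated this way.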
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Anand, H.S., Vinod Chandra, S.S. (2016). Probabilistic Mining in Large Transaction Databases. In: Tan, Y., Shi, Y. (eds) Data Mining and Big Data. DMBD 2016. Lecture Notes in Computer Science, vol 9714. Springer, Cham. https://doi.org/10.1007/978-3-319-40973-3_49
DOI: https://doi.org/10.1007/978-3-319-40973-3_49
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-40972-6
Online ISBN: 978-3-319-40973-3
eBook Packages: Computer Science, Computer Science (R0)