Abstract
Associative classification algorithms have been used successfully to construct classification systems. Their major strength is that they can select the most accurate rules from an exhaustive list of class-association rules. This explains their generally good performance, but it comes at a high computational cost inherited from association rule discovery algorithms. We address this issue by proposing a distributed methodology based on the FP-growth algorithm. In a shared-nothing architecture, subsets of classification rules are generated in parallel from several data partitions. Inter-processor communication is established in order to make global decisions. This exchange takes place only at the first level of recursion, allowing each machine to subsequently process all its assigned tasks independently. The final classifier is built by a majority vote. The approach is illustrated by a detailed example and an analysis of the communication cost.
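The two coordination points mentioned in the abstract, a single exchange to reach global decisions and a final majority vote, can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that the global decision is which items are frequent across all partitions, that support is an absolute transaction count, and that the vote is taken over labels predicted from the rule subsets. The function names, the `min_support` parameter, and the toy data are hypothetical, and the workers are simulated in a single process.

```python
from collections import Counter

def local_item_counts(partition):
    """On one worker: count how many local transactions contain each item."""
    counts = Counter()
    for items, _label in partition:
        counts.update(set(items))  # set() so an item counts once per transaction
    return counts

def global_frequent_items(partitions, min_support):
    """Simulate the single exchange: sum the local counts from every worker
    and keep the items whose global count reaches min_support (assumed absolute)."""
    total = Counter()
    for part in partitions:
        total.update(local_item_counts(part))
    return {item for item, n in total.items() if n >= min_support}

def majority_vote(predictions):
    """Return the most common label; ties go to the label encountered first."""
    return Counter(predictions).most_common(1)[0][0]

# Toy data (hypothetical): two partitions of (itemset, class label) pairs.
partitions = [
    [({"a", "b"}, "yes"), ({"a", "c"}, "no")],
    [({"a", "b"}, "yes"), ({"b", "d"}, "yes")],
]

frequent = global_frequent_items(partitions, min_support=2)
label = majority_vote(["yes", "no", "yes"])
```

Once every worker knows the globally frequent items, it no longer needs the other workers' counts, which is consistent with the abstract's claim that later recursion levels run independently.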
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mokeddem, D., Belbachir, H. (2010). A Distributed Associative Classification Algorithm. In: Essaaidi, M., Malgeri, M., Badica, C. (eds) Intelligent Distributed Computing IV. Studies in Computational Intelligence, vol 315. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15211-5_12
DOI: https://doi.org/10.1007/978-3-642-15211-5_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15210-8
Online ISBN: 978-3-642-15211-5
eBook Packages: Engineering, Engineering (R0)