Abstract
Particle classification is one of the major analysis tasks in high-energy particle physics experiments. We design a framework that combines clustering and classification for data from particle physics experiments. Classification is performed by a set of artificial neural networks (ANNs), each trained on a distinct subset of samples selected from the full training set. We use clustering based on frequent variable sets to partition the training samples into several natural subsets, and then train a standard back-propagation ANN on each subset. The final decision for each test case is a two-step process: first, the nearest cluster is found for the case; then the decision is made by the ANN classifier trained on that cluster. Comparisons with other classification and clustering methods show that our method is promising.
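The two-step decision described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The abstract does not specify the frequent-variable-set clustering, so the sketch swaps in plain k-means with farthest-first initialisation as a stand-in. The names `cluster`, `TinyANN`, `fit`, and `classify` are hypothetical, as are the toy data, the network size, the learning rate, and the choice to centre each sample on its cluster centroid before it reaches the network.

```python
import math
import random

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))

def cluster(points, k, iters=10):
    """Stand-in for the paper's clustering: k-means, farthest-first init."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Recompute each centroid as the mean of its group (keep it if the group is empty).
        centroids = [[sum(col) / len(g) for col in zip(*g)] if g else c
                     for g, c in zip(groups, centroids)]
    return centroids

def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

class TinyANN:
    """One-hidden-layer network trained by plain back-propagation (SGD)."""
    def __init__(self, n_in, n_hidden=3, seed=0):
        rng = random.Random(seed)
        # The last weight in each row is the bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in self.w1]
        out = sigmoid(sum(w * v for w, v in zip(self.w2, h + [1.0])))
        return h, out

    def train(self, X, y, epochs=300, lr=0.5):
        for _ in range(epochs):
            for x, t in zip(X, y):
                h, out = self.forward(x)
                d_out = out - t  # cross-entropy gradient at the output unit
                d_hid = [d_out * self.w2[j] * h[j] * (1 - h[j]) for j in range(len(h))]
                for j, v in enumerate(h + [1.0]):
                    self.w2[j] -= lr * d_out * v
                for j, row in enumerate(self.w1):
                    for i, v in enumerate(x + [1.0]):
                        row[i] -= lr * d_hid[j] * v

    def predict(self, x):
        return 1 if self.forward(x)[1] >= 0.5 else 0

def fit(X, y, k):
    """Partition the training set, then train one network per cluster."""
    centroids = cluster(X, k)
    models = []
    for j, c in enumerate(centroids):
        idx = [i for i, x in enumerate(X) if nearest(x, centroids) == j]
        net = TinyANN(len(c))
        # Assumption: inputs are centred on the cluster centroid.
        net.train([[a - b for a, b in zip(X[i], c)] for i in idx], [y[i] for i in idx])
        models.append(net)
    return centroids, models

def classify(x, centroids, models):
    j = nearest(x, centroids)  # step 1: find the nearest cluster
    # Step 2: decide with the network trained on that cluster.
    return models[j].predict([a - b for a, b in zip(x, centroids[j])])

# Toy data: two well-separated groups whose labelling rules are opposite.
offsets = (-1.0, -0.5, 0.5, 1.0)
A = [[a, b] for a in offsets for b in offsets]
B = [[10 + a, 10 + b] for a in offsets for b in offsets]
X = A + B
y = [int(p[1] > 0) for p in A] + [int(p[1] < 10) for p in B]

centroids, models = fit(X, y, k=2)
print(classify([0.2, 0.7], centroids, models), classify([10.3, 9.1], centroids, models))
```

In this toy setup the two groups follow opposite labelling rules, which is the kind of situation where routing each test case to a cluster-specific network might help more than a single global network. Whether that carries over to real particle data depends on the clustering actually used.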
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Jin, X., Bie, R. (2007). Frequent Variable Sets Based Clustering for Artificial Neural Networks Particle Classification. In: Dong, G., Lin, X., Wang, W., Yang, Y., Yu, J.X. (eds) Advances in Data and Web Management. APWeb WAIM 2007. Lecture Notes in Computer Science, vol 4505. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72524-4_88
DOI: https://doi.org/10.1007/978-3-540-72524-4_88
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72483-4
Online ISBN: 978-3-540-72524-4
eBook Packages: Computer Science, Computer Science (R0)