Cross Project Validation for Refined Clusters Using Machine Learning Techniques

Dixit, Veer Sain; Bhatia, Shveta Kundra

doi:10.1007/978-3-642-39643-4_36


Conference paper


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7972)

Abstract

Clustering discovers groups and identifies interesting distributions and patterns in the underlying data, whereas classification predicts the membership of data instances within a cluster. Correctly classifying similar users in a cluster leads to better prediction of web pages. Much previous work has been done on original web log data; in this paper we instead apply classification to clusters refined with the Modified Knockout Refinement Algorithm (MKRA). This approach improves cluster quality and prediction accuracy. After refining the clusters using MKRA, we apply different learning techniques to the refined clusters, then evaluate and compare their performance measures. The machine learning community is currently seeking to improve classification accuracy through ensemble classification. We further intend to apply ensembling to the classifiers used in our model to observe the improvement in classification accuracy.
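
The abstract does not say which ensembling scheme is used, so the following is only a minimal sketch of one common scheme, majority voting, assuming three hypothetical classifiers that each predict a cluster label for the same six web sessions. The function names, labels, and predictions are illustrative assumptions and are not taken from the paper.

```python
from collections import Counter


def majority_vote(predictions_per_classifier):
    """Combine one label list per classifier into a single label list.

    For each instance, the label predicted by the most classifiers wins.
    Ties go to the label that appears first among that instance's votes.
    """
    combined = []
    for votes in zip(*predictions_per_classifier):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def accuracy(predicted, actual):
    """Fraction of instances whose predicted label equals the actual label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical data (an assumption for illustration): the cluster each
# session belongs to after refinement, and three classifiers' predictions.
true_clusters = ["c1", "c1", "c2", "c2", "c3", "c3"]
classifier_outputs = [
    ["c1", "c1", "c2", "c3", "c3", "c3"],  # wrong on the 4th session
    ["c1", "c2", "c2", "c2", "c3", "c3"],  # wrong on the 2nd session
    ["c1", "c1", "c2", "c2", "c3", "c1"],  # wrong on the 6th session
]

ensemble = majority_vote(classifier_outputs)
individual_accuracies = [accuracy(out, true_clusters) for out in classifier_outputs]
ensemble_accuracy = accuracy(ensemble, true_clusters)
```

In this constructed example each classifier errs on a different session, so the vote corrects every individual mistake. When classifiers make correlated errors, voting offers less benefit, which is why measuring the improvement empirically matters.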



Author information

Authors and Affiliations

Computer Science Department, Atma Ram Sanatan Dharma College, University of Delhi, New Delhi, India
Veer Sain Dixit
Computer Science Department, Swami Shraddhanand College, University of Delhi, New Delhi, India
Shveta Kundra Bhatia


Editor information

Editors and Affiliations

L-I.S.U.T. - D.A.P.I.t. Facoltà Ingegneria, Università degli Studi della Basilicata, Viale dell’Ateneo Lucano, 10, 85100, Potenza, Italy
Beniamino Murgante
Covenant University, Canaanland, OTA, Nigeria
Sanjay Misra
Dipartimento di Scienze e Tecnologie per l’Agricoltura, le Foreste, la Natura e l’Energia, Università degli Studi della Tuscia, Via S. Camillo de Lellis, snc, 01100, Viterbo, Italy
Maurizio Carlini
Dipartimento di Scienze dell’Ingegneria Civile e dell’Architettura, Politecnico di Bari, Via Orabona, 4, 70125, Bari, Italy
Carmelo M. Torre
International University VNU-HCM, Quarter 6, Linh Trung, Thu Duc, Ho Chi Minh City, Vietnam
Hong-Quang Nguyen
School of Business Systems, Monash University, 3800, Clayton, VIC, Australia
David Taniar
Department of Intelligent Informatics, Kyushu Sangyo University, 2-3-1 Matsukadai, Higashi-ku, 813-8503, Fukuoka, Japan
Bernady O. Apduhan
Department of Mathematics and Computer Science, University of Perugia, Via Vanvitelli, 1, 06123, Perugia, Italy
Osvaldo Gervasi

About this paper

Cite this paper

Dixit, V.S., Bhatia, S.K. (2013). Cross Project Validation for Refined Clusters Using Machine Learning Techniques. In: Murgante, B., et al. Computational Science and Its Applications – ICCSA 2013. ICCSA 2013. Lecture Notes in Computer Science, vol 7972. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39643-4_36

DOI: https://doi.org/10.1007/978-3-642-39643-4_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39642-7
Online ISBN: 978-3-642-39643-4
eBook Packages: Computer Science, Computer Science (R0)
