Abstract
Clustering is used to discover groups and identify interesting distributions and patterns in the underlying data, whereas classification is a technique used to predict cluster membership for data instances. Correctly classifying similar users into a cluster helps to predict web pages more accurately. In the past, a lot of work has been done on original web log data; in this paper, we instead apply classification to clusters refined with the Modified Knockout Refinement Algorithm (MKRA). This approach improves both cluster quality and prediction accuracy. After refining the clusters using MKRA, we apply different learning techniques to the refined clusters, and we evaluate and compare various performance measures of these techniques. The machine learning community is currently seeking better ways to improve classification accuracy through ensemble classification. We therefore also apply ensembling to the classifiers used in our model to observe any improvement in classification accuracy.
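The abstract does not say which ensembling scheme is applied to the classifiers. As an illustration only, the sketch below assumes hard majority voting, one common way to combine classifiers, and measures accuracy as the fraction of sessions whose predicted cluster label matches the refined-cluster label. The function names `majority_vote` and `accuracy` and the toy labels are hypothetical and are not results from the paper.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def majority_vote(predictions: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Combine per-classifier predictions by hard majority voting (assumed scheme).

    predictions[c][i] is the cluster label that classifier c assigns to instance i.
    Ties go to the tied label predicted by the earliest classifier in the list.
    """
    if not predictions:
        raise ValueError("need at least one classifier")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all classifiers must label the same instances")
    combined = []
    for i in range(n):
        votes = [p[i] for p in predictions]
        counts = Counter(votes)
        best = max(counts.values())
        # First label (in classifier order) that reaches the highest vote count.
        combined.append(next(v for v in votes if counts[v] == best))
    return combined


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of instances whose predicted label equals the true label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical refined-cluster labels for six web sessions.
    y_true = [0, 0, 1, 1, 2, 2]
    # Hypothetical predictions from three base classifiers.
    preds = {
        "A": [0, 0, 1, 2, 2, 2],
        "B": [0, 1, 1, 1, 2, 2],
        "C": [1, 0, 1, 1, 2, 0],
    }
    for name, p in preds.items():
        print(name, round(accuracy(y_true, p), 3))
    print("vote", accuracy(y_true, majority_vote(list(preds.values()))))
```

On these toy labels the script prints 0.833, 0.833 and 0.667 for the three base classifiers and 1.0 for the vote, because each classifier's errors fall on different sessions. Voting helps only when base errors are at least partly uncorrelated; with highly correlated classifiers the ensemble need not improve on its best member.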
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dixit, V.S., Bhatia, S.K. (2013). Cross Project Validation for Refined Clusters Using Machine Learning Techniques. In: Murgante, B., et al. Computational Science and Its Applications – ICCSA 2013. ICCSA 2013. Lecture Notes in Computer Science, vol 7972. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39643-4_36
DOI: https://doi.org/10.1007/978-3-642-39643-4_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39642-7
Online ISBN: 978-3-642-39643-4
eBook Packages: Computer Science, Computer Science (R0)