Cross Project Validation for Refined Clusters Using Machine Learning Techniques

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7972)

Abstract

Clustering is used to discover groups and to identify interesting distributions and patterns in the underlying data, whereas classification is a technique used to predict the cluster membership of data instances. Correctly classifying similar users into a cluster helps to predict web pages more accurately. In the past, a lot of work has been done on original web log data; in this paper we instead apply classification to refined clusters obtained with the Modified Knockout Refinement Algorithm (MKRA). This approach improves cluster quality and prediction accuracy. After refining the clusters using MKRA, we apply different learning techniques to the refined clusters, and we evaluate and compare various performance measures of these techniques. The machine learning community is currently seeking better classification accuracy through ensembled classification. We further intend to apply ensembling to the classifiers used in our model and to observe whether classification accuracy improves.
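The ensembling idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the data points, the cluster labels "A" and "B", the three base classifiers (nearest centroid, 1-NN, 3-NN) and the majority-vote rule are all assumptions chosen for illustration, and the MKRA refinement step is not reproduced because the abstract does not describe it.

```python
from collections import Counter
from math import dist

# Hypothetical 2-D points with cluster labels, standing in for sessions
# drawn from already-refined clusters (MKRA itself is not implemented here).
train = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.3), "A"),
    ((4.0, 4.1), "B"), ((4.2, 3.9), "B"), ((3.8, 4.3), "B"),
]
test = [((1.1, 1.1), "A"), ((4.1, 4.0), "B"), ((2.0, 1.5), "A"), ((3.5, 3.6), "B")]


def nearest_centroid(train, x):
    """Predict the label whose mean point is closest to x."""
    groups = {}
    for point, label in train:
        groups.setdefault(label, []).append(point)
    centroids = {
        label: tuple(sum(coord) / len(pts) for coord in zip(*pts))
        for label, pts in groups.items()
    }
    return min(centroids, key=lambda label: dist(centroids[label], x))


def knn(train, x, k):
    """Predict the majority label among the k nearest training points."""
    nearest = sorted(train, key=lambda item: dist(item[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def ensemble_predict(train, x):
    """Combine the three base classifiers by majority vote."""
    votes = [nearest_centroid(train, x), knn(train, x, 1), knn(train, x, 3)]
    return Counter(votes).most_common(1)[0][0]


def accuracy(predict, train, test):
    """Fraction of held-out points whose predicted label matches the true one."""
    hits = sum(predict(train, x) == y for x, y in test)
    return hits / len(test)


ensemble_accuracy = accuracy(ensemble_predict, train, test)
print(f"ensemble accuracy: {ensemble_accuracy:.2f}")
```

With an odd number of voters and two labels, the vote cannot tie. A comparison in the spirit of the abstract would compute `accuracy` for each base classifier separately and set it against the ensemble's score.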

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dixit, V.S., Bhatia, S.K. (2013). Cross Project Validation for Refined Clusters Using Machine Learning Techniques. In: Murgante, B., et al. Computational Science and Its Applications – ICCSA 2013. ICCSA 2013. Lecture Notes in Computer Science, vol 7972. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39643-4_36

  • DOI: https://doi.org/10.1007/978-3-642-39643-4_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-39642-7

  • Online ISBN: 978-3-642-39643-4

  • eBook Packages: Computer Science (R0)
