
Research on Attribute Dimension Partition Based on SVM Classifying and MapReduce

Wireless Personal Communications

Abstract

Data analysis is closely tied to the attribute dimensions of the data. Traditional extraction and partition of attribute dimensions is largely manual and too inefficient to meet the needs of big data analysis. This paper proposes an attribute dimension partition scheme based on SVM classification and MapReduce for big data analysis. The scheme improves the traditional SVM classification method by combining it with Euclidean distance theory to overcome its disadvantages, and adopts a penalty coefficient to reduce the imbalance of the data distribution. With the improved SVM classification method, the implementation of attribute dimension partition takes the MapReduce model of Hadoop as its processing engine, uses TF–IDF vectors to store the extracted attribute dimensions, and uses the k-means clustering algorithm to perform the clustering partition. The experimental results show that the execution efficiency of the proposed method is improved and that, while the rationality of the partition is guaranteed, increasing the number of data attributes does not significantly increase the execution time.
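The TF–IDF and k-means stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it runs in a single process rather than on Hadoop, it omits the improved SVM step because the abstract does not specify it, and the function names (`map_phase`, `reduce_phase`, `tfidf_vectors`, `kmeans`), the sample records, and the deterministic centroid initialization are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def map_phase(doc_id, text):
    """Map step (hypothetical): emit ((doc_id, term), 1) for every token."""
    for term in text.lower().split():
        yield (doc_id, term), 1

def reduce_phase(pairs):
    """Reduce step (hypothetical): sum the counts for each (doc_id, term) key."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return counts

def tfidf_vectors(docs):
    """Build one TF-IDF vector per record over a shared, sorted vocabulary."""
    pairs = [p for doc_id, text in docs.items() for p in map_phase(doc_id, text)]
    counts = reduce_phase(pairs)
    vocab = sorted({term for _, term in counts})
    # Each (doc_id, term) key is unique, so this counts records containing the term.
    doc_freq = Counter(term for _, term in counts)
    n_docs = len(docs)
    vectors = {}
    for doc_id, text in docs.items():
        total = len(text.split())
        vectors[doc_id] = [
            (counts.get((doc_id, t), 0) / total) * math.log(n_docs / doc_freq[t])
            for t in vocab
        ]
    return vocab, vectors

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(vectors, k, iterations=10):
    """Plain k-means; centroids start at the first k records (an assumption)."""
    ids = sorted(vectors)
    centroids = [vectors[i] for i in ids[:k]]
    assignment = {}
    for _ in range(iterations):
        # Assign every record to its nearest centroid.
        assignment = {
            i: min(range(k), key=lambda c: euclidean(vectors[i], centroids[c]))
            for i in ids
        }
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [vectors[i] for i in ids if assignment[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

# Made-up attribute descriptions, used only to exercise the sketch.
docs = {
    "d1": "price cost price amount",
    "d2": "city region city country",
    "d3": "cost amount price",
    "d4": "region country city",
}
vocab, vectors = tfidf_vectors(docs)
assignment = kmeans(vectors, k=2)
print(assignment)
```

In this toy input the records about monetary attributes and the records about geographic attributes share no terms, so the two groups fall into separate clusters. On a real cluster the map and reduce functions would be distributed by the framework rather than called in sequence.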



Acknowledgements

The authors acknowledge the National Natural Science Foundation of China (Grant No. 61373160), the Standardization Processing and Application System Development of Science and Technology’s Big Data (Grant No. 17210113D), and Science and Technology Resource Survey, Statistical Analysis and System Development (Grant No. 179676334D).

Author information


Corresponding author

Correspondence to Tongrang Fan.


About this article


Cite this article

Zhao, W., Fan, T., Nie, Y. et al. Research on Attribute Dimension Partition Based on SVM Classifying and MapReduce. Wireless Pers Commun 102, 2759–2774 (2018). https://doi.org/10.1007/s11277-018-5301-9


  • DOI: https://doi.org/10.1007/s11277-018-5301-9
