Research on Attribute Dimension Partition Based on SVM Classifying and MapReduce

Zhao, Wenbin; Fan, Tongrang; Nie, Yongchuan; Wu, Feng; Wen, Hou

doi:10.1007/s11277-018-5301-9

Published: 22 February 2018

Wireless Personal Communications, Volume 102, pages 2759–2774 (2018)

Wenbin Zhao¹,
Tongrang Fan¹,
Yongchuan Nie²,
Feng Wu² &
…
Hou Wen¹

Abstract

Data analysis depends closely on the attribute dimensions of the data. Traditional extraction and partition of attribute dimensions is largely manual and too inefficient to meet the needs of big data analysis. This paper proposes an attribute dimension partition scheme based on SVM classification and MapReduce for analysing big data. The scheme improves the traditional SVM classification method by combining it with Euclidean distance theory to overcome its disadvantages, and adopts a penalty coefficient to reduce the imbalance of the data distribution. Using the improved SVM classification method, the implementation of attribute dimension partition takes the MapReduce model of Hadoop as its processing engine, uses TF–IDF vectors to store the extracted attribute dimensions, and uses the k-means clustering algorithm to partition them into clusters. The experimental results show that the execution efficiency of the proposed method is improved and that, while the rationality of the partition is guaranteed, increasing the number of data attributes does not significantly increase the execution time.
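
To make the pipeline named in the abstract more concrete, here is a minimal single-process Python sketch. It is not the authors' implementation: the function names (map_phase, reduce_phase, tfidf_vectors, kmeans), the toy records, and the choice of the plain log(N/df) variant of IDF are all assumptions made for illustration. The mapper and reducer are ordinary functions that imitate the MapReduce pattern in memory rather than running on Hadoop, and the improved SVM step is left out because the abstract does not give its details.

```python
import math
from collections import Counter, defaultdict

def map_phase(doc_id, text):
    """Mapper (hypothetical): emit ((term, doc_id), 1) for every token of one record."""
    return [((term, doc_id), 1) for term in text.lower().split()]

def reduce_phase(pairs):
    """Reducer (hypothetical): sum the counts that share a (term, doc_id) key."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return counts

def tfidf_vectors(docs):
    """Build one TF-IDF vector per record over a sorted, shared vocabulary."""
    pairs = [p for doc_id, text in enumerate(docs) for p in map_phase(doc_id, text)]
    counts = reduce_phase(pairs)
    vocab = sorted({term for term, _ in counts})
    # Each (term, doc_id) key is unique, so counting terms gives document frequency.
    doc_freq = Counter(term for term, _ in counts)
    doc_len = Counter()
    for (_, doc_id), n in counts.items():
        doc_len[doc_id] += n
    vectors = []
    for doc_id in range(len(docs)):
        length = doc_len[doc_id] or 1  # avoid dividing by zero for an empty record
        vec = []
        for term in vocab:
            tf = counts.get((term, doc_id), 0) / length
            idf = math.log(len(docs) / doc_freq[term])  # assumed IDF variant
            vec.append(tf * idf)
        vectors.append(vec)
    return vocab, vectors

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(vectors, k, iterations=20):
    """Plain k-means; the first k vectors seed the centroids so runs are repeatable."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: euclidean(v, centroids[c]))
                  for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy attribute descriptions (invented for this sketch, not data from the paper).
docs = [
    "author name author affiliation",
    "publication year publication venue",
    "author name author email",
    "publication year publication volume",
]
vocab, vectors = tfidf_vectors(docs)
labels = kmeans(vectors, k=2)
print(labels)
```

On these four toy records the author-related and publication-related descriptions fall into separate clusters. In a real deployment the map and reduce functions would run as distributed Hadoop tasks, and the centroid seeding would need a more careful strategy than taking the first k vectors.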

Acknowledgements

The authors acknowledge the National Natural Science Foundation of China (Grant No. 61373160), the Standardization Processing and Application System Development of Science and Technology’s Big Data (Grant No. 17210113D), and Science and Technology Resource Survey, Statistical Analysis and System Development (Grant No. 179676334D).

Author information

Authors and Affiliations

School of Information Science and Technology, Shijiazhuang Tiedao University, Shijiazhuang, Hebei, China
Wenbin Zhao, Tongrang Fan & Hou Wen
Institute of Scientific and Technical Information of Hebei Province, Shijiazhuang, Hebei, China
Yongchuan Nie & Feng Wu

Corresponding author

Correspondence to Tongrang Fan.

About this article

Cite this article

Zhao, W., Fan, T., Nie, Y. et al. Research on Attribute Dimension Partition Based on SVM Classifying and MapReduce. Wireless Pers Commun 102, 2759–2774 (2018). https://doi.org/10.1007/s11277-018-5301-9

Published: 22 February 2018
Issue Date: October 2018
DOI: https://doi.org/10.1007/s11277-018-5301-9
