Abstract
Energy-efficient computing has become a key challenge not only for data-center operations but also for many other energy-driven systems, where the focus is on reducing energy-related costs and operational expenses as well as the corresponding environmental impact. However, current intelligent data models are typically performance driven. For instance, most data-driven machine-learning approaches are known to incur a high computational cost in order to find the global optimum. Designing more accurate intelligent data models to satisfy market needs is hence likely to increase energy waste because of the added computational cost. This paper thus introduces an energy-efficient framework for large-scale data modeling and classification/prediction. It achieves a predictive accuracy comparable to or better than that of state-of-the-art machine-learning models while maintaining a low computational cost when dealing with large-scale data. The effectiveness of the proposed approaches has been demonstrated by our experiments with two large-scale KDD data sets: Mtv-1 and Mtv-2.
Acknowledgements
We are grateful to the Lincoln Laboratory at the Massachusetts Institute of Technology (MIT) in the U.S. for providing us with the Mtv-2 data set and for their invaluable discussions; special thanks also go to British Telecom (BT) and the Etisalat BT Innovation Center (EBTIC) for their constructive criticism of this work.
Cite this article
Yoo, P.D., Zomaya, A.Y. Combining analytic kernel models for energy-efficient data modeling and classification. J Supercomput 63, 790–799 (2013). https://doi.org/10.1007/s11227-012-0776-8
DOI: https://doi.org/10.1007/s11227-012-0776-8