Abstract
We present implementations of two data-mining algorithms on a CELL processor and on a low-cost CBEA (CELL Broadband Engine Architecture) cluster built from multiple PlayStation3 consoles. Typical batch-processing environments are often unsuitable for interactive data-mining processes that require repeated adjustments to parameters, pre-processing steps, and data, while contemporary desktops do not offer sufficient resources for the large datasets available today. Our implementations of the k-Nearest Neighbour algorithm and the Decision Tree scale linearly with the number of samples in the training data and the number of processors, and demonstrate runtimes of under a minute for up to 500 000 samples.
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
McConnell, S., Patton, D., Hurley, R., Blight, W., Young, G. (2010). Interactive Data Mining on a CBEA Cluster. In: Mewhort, D.J.K., Cann, N.M., Slater, G.W., Naughton, T.J. (eds) High Performance Computing Systems and Applications. HPCS 2009. Lecture Notes in Computer Science, vol 5976. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12659-8_20
DOI: https://doi.org/10.1007/978-3-642-12659-8_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12658-1
Online ISBN: 978-3-642-12659-8
eBook Packages: Computer Science (R0)