Interactive Data Mining on a CBEA Cluster

McConnell, Sabine; Patton, David; Hurley, Richard; Blight, Wilfred; Young, Graeme

doi:10.1007/978-3-642-12659-8_20


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5976)

Included in the following conference series:

International Symposium on High Performance Computing Systems and Applications

Abstract

We present implementations of two data-mining algorithms on a single CELL processor and on a low-cost CBEA (CELL Broadband Engine Architecture) cluster built from multiple PlayStation3 consoles. Typical batch-processing environments are often unsuitable for interactive data mining, which requires repeated adjustments to parameters, pre-processing steps, and data, while contemporary desktops do not offer sufficient resources for the large datasets available today. Our implementations of the k Nearest Neighbour and Decision Tree algorithms scale linearly with the number of samples in the training data and with the number of processors, and demonstrate runtimes of under a minute for up to 500 000 samples.
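As a rough illustration of why k Nearest Neighbour lends itself to this kind of scaling, the sketch below partitions the training data, finds the k closest candidates in each partition, and merges them before voting. It is not the chapter's implementation: the function names (`local_top_k`, `partitioned_knn`), the round-robin partitioning, and the sequential loop standing in for parallel workers are all assumptions made for illustration, and nothing here is CELL-specific.

```python
import heapq
from collections import Counter


def local_top_k(chunk, query, k):
    """Return the k nearest (squared distance, label) pairs within one chunk."""
    scored = (
        (sum((a - b) ** 2 for a, b in zip(features, query)), label)
        for features, label in chunk
    )
    return heapq.nsmallest(k, scored)


def partitioned_knn(train, query, k, n_workers):
    """Classify `query` by merging per-partition candidates.

    Hypothetical sketch: the loop over chunks runs sequentially here,
    but each iteration is independent and could run on its own worker.
    """
    # Round-robin split so every worker scans about len(train) / n_workers samples.
    chunks = [train[i::n_workers] for i in range(n_workers)]
    candidates = []
    for chunk in chunks:
        candidates.extend(local_top_k(chunk, query, k))
    # The global k nearest neighbours are guaranteed to be among the local ones.
    nearest = heapq.nsmallest(k, candidates)
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


train = [
    ((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
    ((5.0, 5.0), "b"), ((5.1, 4.9), "b"), ((4.8, 5.2), "b"),
]
label = partitioned_knn(train, (0.1, 0.1), k=3, n_workers=2)
```

In this sketch each partition is scanned once per query, so the work per worker grows in proportion to the number of training samples and shrinks in proportion to the number of workers, while the merge step only handles `k * n_workers` candidates.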

Author information

Authors and Affiliations

Department of Computing and Information Systems, Trent University, 1600 West Bank Drive, Peterborough, ON, Canada, K9J7B8
Sabine McConnell, Richard Hurley, Wilfred Blight & Graeme Young
Department of Physics and Astronomy, Trent University, 1600 West Bank Drive, Peterborough, ON, Canada, K9J7B8
David Patton

Editor information

Editors and Affiliations

Dept. of Psychology, Queen’s University, 62 Arch St, K7L 3N6, Kingston, Ontario, Canada
Douglas J. K. Mewhort
Dept. of Chemistry, Queen’s University, Chernoff Hall, K7L 3N6, Kingston, Ontario, Canada
Natalie M. Cann
University of Ottawa, Hagen Hall, 115 Séraphin-Marion, K1N 6N5, Ottawa, Ontario, Canada
Gary W. Slater
Oak Ridge National Laboratory, 1 Bethel Valley Road, Bldg. 5100, MS-6173, Oak Ridge, 37831-6173, TN, USA
Thomas J. Naughton

About this paper

Cite this paper

McConnell, S., Patton, D., Hurley, R., Blight, W., Young, G. (2010). Interactive Data Mining on a CBEA Cluster. In: Mewhort, D.J.K., Cann, N.M., Slater, G.W., Naughton, T.J. (eds) High Performance Computing Systems and Applications. HPCS 2009. Lecture Notes in Computer Science, vol 5976. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12659-8_20

DOI: https://doi.org/10.1007/978-3-642-12659-8_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12658-1
Online ISBN: 978-3-642-12659-8
eBook Packages: Computer Science, Computer Science (R0)