Interactive Data Mining on a CBEA Cluster

  • Conference paper
High Performance Computing Systems and Applications (HPCS 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5976)

Abstract

We present implementations of two data-mining algorithms on a CELL processor and on a low-cost CBEA (CELL Broadband Engine Architecture) cluster built from multiple PlayStation3 consoles. Typical batch-processing environments are often unsuitable for interactive data-mining processes that require repeated adjustments to parameters, pre-processing steps, and data, while contemporary desktops do not offer sufficient resources for the large datasets available today. Our implementations of the k Nearest Neighbour algorithm and the Decision Tree scale linearly with the number of samples in the training data and the number of processors, and demonstrate runtimes of under a minute for up to 500 000 samples.
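
To make the scaling claim concrete, the following is a minimal sketch of one common way to parallelise k Nearest Neighbour classification: the training samples are split into chunks, each chunk yields its own k closest candidates, and the candidates are merged before a majority vote. It is an illustration only, not the authors' CELL implementation; the function names `local_knn` and `partitioned_knn`, the squared Euclidean distance, and the round-robin split are all assumptions made for this example, and the chunks are processed sequentially here rather than on separate processors.

```python
import heapq
from collections import Counter


def local_knn(chunk, query, k):
    """Return up to k (squared_distance, label) pairs nearest to query in one chunk."""
    scored = (
        (sum((a - b) ** 2 for a, b in zip(features, query)), label)
        for features, label in chunk
    )
    return heapq.nsmallest(k, scored)


def partitioned_knn(train, query, k, n_workers):
    """Classify query by majority vote over the k nearest training samples.

    The training data is split round-robin into n_workers chunks (an assumed
    partitioning scheme). Each chunk's work is proportional to its size, so
    the per-worker cost shrinks as workers are added.
    """
    chunks = [train[i::n_workers] for i in range(n_workers)]
    # In a real cluster each chunk would go to its own processor;
    # this sketch simply visits the chunks one after another.
    candidates = [pair for chunk in chunks for pair in local_knn(chunk, query, k)]
    # The global k nearest are always among the per-chunk k nearest.
    nearest = heapq.nsmallest(k, candidates)
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated classes in two dimensions.
train = [
    ((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((1.0, 1.0), "b"),
    ((0.9, 1.1), "b"), ((0.2, 0.1), "a"), ((1.2, 0.8), "b"),
]
print(partitioned_knn(train, (0.15, 0.15), k=3, n_workers=2))
```

Because each chunk is scanned independently and only k candidates per chunk are merged, the distance computations dominate the cost, which is consistent with runtimes that grow linearly in the number of training samples and shrink with the number of processors.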

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

McConnell, S., Patton, D., Hurley, R., Blight, W., Young, G. (2010). Interactive Data Mining on a CBEA Cluster. In: Mewhort, D.J.K., Cann, N.M., Slater, G.W., Naughton, T.J. (eds) High Performance Computing Systems and Applications. HPCS 2009. Lecture Notes in Computer Science, vol 5976. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12659-8_20

  • DOI: https://doi.org/10.1007/978-3-642-12659-8_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12658-1

  • Online ISBN: 978-3-642-12659-8

  • eBook Packages: Computer Science, Computer Science (R0)