DOI: 10.1145/1375527.1375534

Data mining on the Cell Broadband Engine

Published: 07 June 2008

Abstract

The STI Cell Broadband Engine architecture represents an interesting design point along the spectrum of chipsets with multiple processing elements. In this article we investigate key mining tasks such as clustering, classification, anomaly detection and PageRank on the Cell along the axes of performance, programming complexity and algorithm design. As part of our comparative analysis we juxtapose these algorithms with similar ones implemented on state-of-the-art uniprocessor and multicore architectures. For workloads that are more floating-point intensive, and where data is accessed in a streaming fashion, the Cell processor is up to seven times faster than competing technologies when the underlying algorithm uses the hardware efficiently.
However, when required to write in a non-streaming fashion, as with PageRank, the processor is up to twenty times slower than competing processors. An outcome of our benchmark study, beyond the results on these particular algorithms, is that we answer several higher-level questions, which are designed to give application designers a fast and reliable estimate of how well other workloads will scale on the Cell.
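
The contrast between streaming access and non-streaming writes can be illustrated with a minimal Python sketch. This is not the authors' Cell implementation; the function names `kmeans_assign` and `pagerank_push`, the damping factor and the toy inputs are all assumptions chosen for illustration. The first function reads each point once, in order, which is the kind of access pattern the abstract calls streaming. The second accumulates rank into link targets that may be anywhere in the array, which is the kind of scattered write the abstract associates with PageRank.

```python
def kmeans_assign(points, centroids):
    """Streaming pass: every point is read exactly once, in input order."""
    labels = []
    for p in points:
        best, best_d = 0, float("inf")
        for j, c in enumerate(centroids):
            # Squared Euclidean distance between point p and centroid c.
            d = sum((a - b) ** 2 for a, b in zip(p, c))
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels


def pagerank_push(out_links, damping=0.85, iters=20):
    """Push-style PageRank: each source writes to arbitrary target slots."""
    n = len(out_links)
    rank = [1.0 / n] * n
    for _ in range(iters):
        nxt = [(1.0 - damping) / n] * n
        for src, targets in enumerate(out_links):
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    nxt[t] += share  # scattered (non-streaming) write
            else:
                # Dangling node (assumed policy): spread its rank evenly.
                share = damping * rank[src] / n
                for t in range(n):
                    nxt[t] += share
        rank = nxt
    return rank


if __name__ == "__main__":
    print(kmeans_assign([(0, 0), (10, 10), (1, 1)], [(0, 0), (10, 10)]))
    print(pagerank_push([[1], [2], [0]]))
```

In the first loop the write position advances sequentially with the input, so data could be moved in fixed-size blocks. In the second, the index `t` depends on the graph structure, so consecutive writes may land far apart in memory.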

    Published In

    ICS '08: Proceedings of the 22nd annual international conference on Supercomputing
    June 2008
    390 pages
    ISBN:9781605581583
    DOI:10.1145/1375527
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. CMP
    2. cell bdea
    3. multicore

    Qualifiers

    • Research-article

    Conference

    ICS08
    ICS08: International Conference on Supercomputing
    June 7 - 12, 2008
    Island of Kos, Greece

    Acceptance Rates

    Overall Acceptance Rate 629 of 2,180 submissions, 29%

    Cited By

    • (2024) DuMato: An Efficient Warp-Centric Subgraph Enumeration System for GPU. Journal of Parallel and Distributed Computing. DOI: 10.1016/j.jpdc.2024.104903 (104903). Online publication date: Apr-2024
    • (2023) Graph Pattern Mining Paradigms: Consolidation and Renewed Bearing. 2023 IEEE 30th International Conference on High Performance Computing, Data, and Analytics (HiPC). DOI: 10.1109/HiPC58850.2023.00040 (224-233). Online publication date: 18-Dec-2023
    • (2017) Programming and Managing Resources on Accelerator‐Enabled Clusters. Programming multi‐core and many‐core computing systems. DOI: 10.1002/9781119332015.ch20 (405-429). Online publication date: 27-Jan-2017
    • (2012) Parallelization of pagerank on multicore processors. Proceedings of the 8th international conference on Distributed Computing and Internet Technology. DOI: 10.1007/978-3-642-28073-3_12 (129-140). Online publication date: 2-Feb-2012
    • (2011) Efficient Nonserial Polyadic Dynamic Programming on the Cell Processor. Proceedings of the 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum. DOI: 10.1109/IPDPS.2011.186 (460-471). Online publication date: 16-May-2011
    • (2011) P-means, a parallel clustering algorithm for a heterogeneous multi-processor environment. 2011 International Conference on High Performance Computing & Simulation. DOI: 10.1109/HPCSim.2011.5999830 (239-248). Online publication date: Jul-2011
    • (2011) Community Discovery in Social Networks: Applications, Methods and Emerging Trends. Social Network Data Analytics. DOI: 10.1007/978-1-4419-8462-3_4 (79-113). Online publication date: 17-Mar-2011
    • (2010) G-means improved for cell BE environment. Facing the multicore-challenge. DOI: 10.5555/1986583.1986594 (54-65). Online publication date: 1-Jan-2010
    • (2010) G-means improved for cell BE environment. Facing the multicore-challenge. DOI: 10.5555/1980597.1980608 (54-65). Online publication date: 1-Jan-2010
    • (2010) Cell BE and Bluetooth applied to Digital TV. 2010 IEEE Network Operations and Management Symposium - NOMS 2010. DOI: 10.1109/NOMS.2010.5488364 (825-828). Online publication date: Apr-2010
