DOI: 10.1145/2523616.2525952
research-article

High performance clustering of social images in a map-collective programming model

Published: 01 October 2013

Abstract

Large-scale iterative computations are common in many important data mining and machine learning algorithms. Most of these applications can be specified as iterations of MapReduce computations, which led to the Iterative MapReduce programming model [1] for executing data-intensive iterative computations efficiently and interoperably across HPC and cloud environments. We observe that a systematic approach to collective communication is essential but notably missing from the current model. We therefore generalize the iterative MapReduce concept to Map-Collective, on the premise that large collectives are a distinctive feature of data-intensive and data mining applications. To show the necessity of the Map-Collective model, this paper studies the implications of large-scale social image clustering problems, in which 10--100 million images, represented as points in a high-dimensional (up to 2048 dimensions) vector space, must be divided into 1--10 million clusters.
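To make the Map-Collective idea concrete, the following is a minimal, single-process sketch of k-means written as a map step followed by a collective (an allreduce-style sum). It is not the authors' implementation: the function names `map_partition`, `allreduce`, and `kmeans` are hypothetical, the "workers" are plain list partitions, and the collective is simulated with an in-process loop rather than real network communication.

```python
from typing import List, Tuple

Point = List[float]
Partial = Tuple[List[List[float]], List[int]]  # (per-centroid sums, per-centroid counts)


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_partition(points: List[Point], centroids: List[Point]) -> Partial:
    """Map step (hypothetical name): one worker assigns its local points
    to the nearest centroid and returns partial sums and counts."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p in points:
        k = min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += p[d]
    return sums, counts


def allreduce(partials: List[Partial]) -> Partial:
    """Collective step, simulated in-process (assumption): elementwise sum
    of every worker's partial result. In a real runtime each worker would
    receive this same combined result."""
    sums = [row[:] for row in partials[0][0]]
    counts = partials[0][1][:]
    for s, c in partials[1:]:
        for k in range(len(counts)):
            counts[k] += c[k]
            for d in range(len(sums[k])):
                sums[k][d] += s[k][d]
    return sums, counts


def kmeans(partitions: List[List[Point]], centroids: List[Point],
           iterations: int = 10) -> List[Point]:
    """Iterate map + collective; empty clusters keep their old centroid."""
    for _ in range(iterations):
        partials = [map_partition(part, centroids) for part in partitions]
        sums, counts = allreduce(partials)
        centroids = [
            [v / counts[k] for v in sums[k]] if counts[k] else centroids[k]
            for k in range(len(centroids))
        ]
    return centroids


# Two simulated workers, each holding two 2-D points.
partitions = [
    [[0.0, 0.0], [0.0, 2.0]],
    [[10.0, 10.0], [10.0, 12.0]],
]
final = kmeans(partitions, centroids=[[0.0, 0.0], [10.0, 10.0]], iterations=5)
print(final)
```

The sketch shows why the collective matters at the scale the abstract describes: the data exchanged per iteration is proportional to the number of centroids times the dimension, so with millions of clusters in a 2048-dimensional space the combine step, not the map step, is likely to dominate communication cost.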

References

[1] J. Ekanayake, H. Li, B. Zhang, T. Gunarathne, S. Bae, J. Qiu, G. Fox. "Twister: A Runtime for iterative MapReduce." The First International Workshop on MapReduce and its Applications, ACM HPDC 2010, June 20--25, 2010. ACM: Chicago, Illinois.
[2] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, I. Stoica. "Spark: Cluster Computing with Working Sets." HotCloud, 2010.
[3] J. Qiu, B. Zhang. "Mammoth Data in the Cloud: Clustering Social Images." Clouds, Grids and Big Data, IOS Press, 2013.
[4] C. Elkan. "Using the triangle inequality to accelerate k-means." Intl. Conf. on Machine Learning, 2003.
[5] T. Gunarathne, B. Zhang, T.-L. Wu, J. Qiu. "Scalable Parallel Computing on Clouds Using Twister4Azure Iterative MapReduce." Future Generation Computer Systems (29), pp. 1035--1048, 2013.
[6] B. Zhang, J. Qiu. "High Performance Clustering of Social Images in a Map-Collective Programming Model." Technical Report, July 2, 2013.

Published In

SOCC '13: Proceedings of the 4th annual Symposium on Cloud Computing
October 2013
427 pages
ISBN: 9781450324281
DOI: 10.1145/2523616
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. collective communication
  2. data intensive
  3. high dimension
  4. iterative MapReduce
  5. social images

Conference

SOCC '13: ACM Symposium on Cloud Computing
October 1-3, 2013
Santa Clara, California

Acceptance Rates

SOCC '13 Paper Acceptance Rate: 23 of 114 submissions, 20%
Overall Acceptance Rate: 169 of 722 submissions, 23%

Cited By

  • (2016) Model-centric computation abstractions in machine learning applications. Proceedings of the 3rd ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond, pp. 1-4. DOI: 10.1145/2926534.2926539. Online publication date: 26-Jun-2016
  • (2016) A Collective Communication Layer for the Software Stack of Big Data Analytics. 2016 IEEE International Conference on Cloud Engineering Workshop (IC2EW), pp. 204-206. DOI: 10.1109/IC2EW.2016.35. Online publication date: Apr-2016
  • (2015) Harp. Proceedings of the 2015 IEEE International Conference on Cloud Engineering, pp. 228-233. DOI: 10.1109/IC2E.2015.35. Online publication date: 9-Mar-2015
  • (2014) A Tale of Two Data-Intensive Paradigms. Proceedings of the 2014 IEEE International Congress on Big Data, pp. 645-652. DOI: 10.1109/BigData.Congress.2014.137. Online publication date: 27-Jun-2014
