DOI: 10.1145/1980022.1980131
research-article

Clustering with Apache Hadoop

Published: 25 February 2011

Abstract

The self-organizing map (SOM) is an unsupervised neural network that projects high-dimensional data onto a low-dimensional grid and visually reveals the topological order of the original data. This makes the SOM well suited to the exploratory phase of data mining, and self-organizing maps have been applied successfully in many fields, including engineering and business. Experimental results on a census database illustrate the resulting clusters.
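How a SOM learns this ordering can be illustrated with a minimal sketch of Kohonen's online training rule in plain Python. It assumes a rectangular grid, squared Euclidean distance for choosing the best-matching unit, a Gaussian neighborhood measured on the grid, and a learning rate and radius that shrink linearly over training. The grid size, rates, and the names `train_som` and `best_matching_unit` are illustrative choices, not parameters taken from the paper.

```python
import math
import random
from typing import List, Sequence, Tuple

Grid = List[List[List[float]]]  # grid[row][col] is one unit's weight vector


def best_matching_unit(weights: Grid, x: Sequence[float]) -> Tuple[int, int]:
    """Return (row, col) of the unit whose weight vector is nearest to x."""
    best, best_dist = (0, 0), math.inf
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))  # squared Euclidean
            if d < best_dist:
                best, best_dist = (r, c), d
    return best


def train_som(data: Sequence[Sequence[float]], rows: int = 3, cols: int = 3,
              epochs: int = 50, lr0: float = 0.5, sigma0: float = 1.5,
              seed: int = 0) -> Grid:
    """Online (sample-by-sample) SOM training on a rows x cols grid."""
    rng = random.Random(seed)
    # Initialize every unit with a copy of a randomly chosen sample.
    weights = [[list(rng.choice(data)) for _ in range(cols)] for _ in range(rows)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)  # present samples in a new random order each epoch
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # radius shrinks, floored at 0.5
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    # Gaussian neighborhood measured on the grid, not in data space.
                    g2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-g2 / (2.0 * sigma ** 2))
                    w = weights[r][c]
                    for i in range(len(w)):
                        w[i] += lr * h * (x[i] - w[i])  # pull the unit toward x
            step += 1
    return weights
```

After training, mapping each record to its best-matching unit, e.g. `[best_matching_unit(weights, x) for x in data]`, places it on a node of the low-dimensional grid. Neighboring nodes tend to hold similar records, which is the topological ordering described above, and the nodes can then be grouped into clusters.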
The paper proposes to improve the performance of clustering by applying cloud computing.
The approach builds on Apache Hadoop, which provides a Java-based, open-source software framework for distributing processing across a cluster of machines through its implementation of MapReduce, a programming model designed for analyzing and transforming very large data sets.
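One way the MapReduce model can be applied to SOM clustering is to distribute the batch SOM update, in which each unit j receives the new weight w_j = Σ_i h_{j,c(i)} x_i / Σ_i h_{j,c(i)}, where c(i) is the best-matching unit of record x_i and h is the neighborhood function. The sketch below simulates one such iteration as a single map, shuffle, and reduce pass in plain Python. It is an assumption that the paper parallelizes the batch variant this way, and the names `som_mapper`, `som_reducer`, and `batch_som_epoch` are hypothetical; a real Hadoop job would implement the two steps as Java `Mapper` and `Reducer` classes, with the current weights made available to every mapper.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

Unit = Tuple[int, int]  # (row, col) position of a unit on the SOM grid
Vector = List[float]


def nearest_unit(weights: Dict[Unit, Vector], x: Sequence[float]) -> Unit:
    """Best-matching unit for x under squared Euclidean distance."""
    return min(weights, key=lambda u: sum((w - xi) ** 2 for w, xi in zip(weights[u], x)))


def som_mapper(x: Sequence[float], weights: Dict[Unit, Vector],
               sigma: float) -> Iterator[Tuple[Unit, Tuple[Vector, float]]]:
    """Map step: for one record, emit (unit, (h * x, h)) for every unit."""
    br, bc = nearest_unit(weights, x)
    for r, c in weights:
        h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma ** 2))
        yield (r, c), ([h * xi for xi in x], h)


def som_reducer(unit: Unit, values: Iterable[Tuple[Vector, float]]) -> Tuple[Unit, Vector]:
    """Reduce step: new weight = sum(h * x) / sum(h) over all records."""
    pairs = list(values)
    den = sum(h for _, h in pairs)
    dim = len(pairs[0][0])
    num = [sum(vec[i] for vec, _ in pairs) for i in range(dim)]
    return unit, [n / den for n in num]


def batch_som_epoch(records: Iterable[Sequence[float]], weights: Dict[Unit, Vector],
                    sigma: float) -> Dict[Unit, Vector]:
    """One in-process MapReduce pass: map, shuffle (group by key), reduce."""
    groups: Dict[Unit, List[Tuple[Vector, float]]] = defaultdict(list)
    for x in records:                                   # map phase
        for key, value in som_mapper(x, weights, sigma):
            groups[key].append(value)                   # shuffle: group by unit
    return dict(som_reducer(k, vs) for k, vs in groups.items())  # reduce phase
```

Each mapper needs only the current weights and its own input split, and each reducer only the values for one unit, so the work can spread across the cluster. Because the reducer's sums are associative, partial sums could also be pre-aggregated on each node, as a Hadoop combiner does, before the final division; a driver would rerun the job with a shrinking `sigma` until the weights stabilize.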

Cited By

  • (2018) Uncertainty-Based Clustering Algorithms for Large Data Sets. Modern Technologies for Big Data Classification and Clustering, pp. 1-33. DOI: 10.4018/978-1-5225-2805-0.ch001. Online publication date: 2018.
  • (2016) Big Data Research Analysis Based on Bibliometrics. Hans Journal of Data Mining, 06(03), pp. 125-137. DOI: 10.12677/HJDM.2016.63015. Online publication date: 2016.
  • (2015) MapReduce-based fuzzy c-means clustering algorithm: implementation and scalability. International Journal of Machine Learning and Cybernetics, 6(6), pp. 923-934. DOI: 10.1007/s13042-015-0367-0. Online publication date: 29-Apr-2015.

    Information

    Published In

    ACM Other conferences
    ICWET '11: Proceedings of the International Conference & Workshop on Emerging Trends in Technology
    February 2011
    1385 pages
    ISBN: 9781450304498
    DOI: 10.1145/1980022
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Sponsors

    • Thakur College Of Engg. & Tech: Thakur College Of Engineering & Technology

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. cloud computing
    2. cluster analysis
    3. data mining
    4. hadoop
    5. map reduce
    6. self-organizing maps
    7. virtualization

    Conference

    ICWET '11

    Article Metrics

    • Downloads (last 12 months): 3
    • Downloads (last 6 weeks): 1
    Reflects downloads up to 07 Mar 2025

    Citations

    • (2014) Parallel glowworm swarm optimization clustering algorithm based on MapReduce. 2014 IEEE Symposium on Swarm Intelligence, pp. 1-8. DOI: 10.1109/SIS.2014.7011794. Online publication date: Dec-2014.
    • (2013) Data Management for Internet of Things. Proceedings of the 2013 IEEE International Conference on Green Computing and Communications and IEEE Internet of Things and IEEE Cyber, Physical and Social Computing, pp. 1144-1151. DOI: 10.1109/GreenCom-iThings-CPSCom.2013.199. Online publication date: 20-Aug-2013.
