A new distributed data mining model based on similarity

Authors:
Tao Li, University of Rochester, Rochester, NY
Shenghuo Zhu, University of Rochester, Rochester, NY
Mitsunori Ogihara, University of Rochester, Rochester, NY

SAC '03: Proceedings of the 2003 ACM symposium on Applied computing, March 2003, Pages 432–436, https://doi.org/10.1145/952532.952618

Published: 09 March 2003

ABSTRACT

Distributed Data Mining (DDM) has been very active and has enjoyed growing attention since its inception. Current DDM techniques regard the distributed data sets as a single virtual table and assume that there exists a global model which could be generated if the data were combined or centralized. This paper proposes a similarity-based distributed data mining (SBDDM) framework which explicitly takes the differences among distributed sources into consideration. A new similarity measure is introduced, and its effectiveness is evaluated and validated. This paper also illustrates the limitations of current DDM techniques through three concrete case studies. Finally, distributed clustering within the SBDDM framework is discussed.

Published in
SAC '03: Proceedings of the 2003 ACM symposium on Applied computing
March 2003
1268 pages
ISBN: 1581136242
DOI: 10.1145/952532
Conference Chair: Gary B. Lamont, Air Force Institute of Technology
Program Chairs: Hisham Haddad, Kennesaw State University; George A. Papadopoulos, University of Cyprus, Cyprus
Publications Chair: Brajendra Panda, University of Arkansas
Copyright © 2003 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Distributed Data Mining (DDM)
SBDDM
similarity
Qualifiers
- Article
Acceptance Rates
Overall Acceptance Rate: 1,650 of 6,669 submissions, 25%