Evaluating Cluster Preservation in Frequent Itemset Integration for Distributed Databases

Dua, Sumeet; Dessauer, Michael P.; Sethi, Prerna

doi:10.1007/s10916-010-9512-1


Original Paper
Published: 09 May 2010

Journal of Medical Systems, Volume 35, pages 845–853 (2011)

Sumeet Dua¹,
Michael P. Dessauer¹ &
Prerna Sethi²


Abstract

Medical sciences are rapidly emerging as a data-rich discipline in which the number of databases and their dimensionality increase exponentially with time. Data integration algorithms often rely upon discovering embedded, useful, and novel relationships between the feature attributes that describe the data. Such algorithms require data integration prior to knowledge discovery, which can compromise the timeliness, scalability, robustness, and reliability of the discovered knowledge. Knowledge integration algorithms instead perform pattern discovery on segmented and distributed databases, but they require sophisticated methods for merging patterns and for evaluating integration quality. We propose a novel computational framework for discovering and integrating frequent sets of features from distributed databases and then exploiting them for unsupervised learning in the integrated space. Several indices of cluster quality are used to assess the accuracy of knowledge merging. The approach preserves cluster quality to a significant degree under various cluster distributions and noise conditions. Exhaustive experimentation is performed to further evaluate the scalability and robustness of the proposed methodology.
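The abstract names two computational steps: merging frequent itemsets that were mined separately at each site, and checking with cluster-quality indices whether the integrated result preserves cluster structure. The Python sketch below illustrates both under stated assumptions; it is not the authors' framework. For merging it swaps in the two-phase idea of the classic Partition algorithm: an itemset that is frequent globally must be frequent, at the same relative threshold, in at least one partition, so the union of locally frequent itemsets is a complete candidate set whose exact global support is then recounted. Local mining here is brute-force enumeration up to a hypothetical `max_size`, standing in for Apriori or FP-growth. For evaluation it computes the Rand index and the adjusted Rand index of Hubert and Arabie, two standard indices of agreement between clusterings. The function names `local_frequent`, `integrate`, and `rand_indices` and the toy site data are illustrative only, and the clustering step itself is not shown.

```python
from collections import Counter
from itertools import combinations
from math import comb


def local_frequent(transactions, min_support, max_size=3):
    """Brute-force local mining (illustrative stand-in for Apriori/FP-growth):
    every itemset of up to max_size items whose relative support in this
    partition is at least min_support. Itemsets are sorted tuples."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_size + 1):
            counts.update(combinations(items, k))
    n = len(transactions)
    return {s for s, c in counts.items() if c / n >= min_support}


def integrate(partitions, min_support, max_size=3):
    """Partition-style two-phase merge (assumed, not the paper's method):
    union the locally frequent itemsets as candidates, then recount each
    candidate's exact support over all partitions."""
    candidates = set()
    for p in partitions:
        candidates |= local_frequent(p, min_support, max_size)
    total = sum(len(p) for p in partitions)
    merged = {}
    for cand in candidates:
        needed = set(cand)
        count = sum(1 for p in partitions for t in p if needed <= set(t))
        if count / total >= min_support:
            merged[cand] = count / total
    return merged


def rand_indices(labels_a, labels_b):
    """Return (Rand index, adjusted Rand index) for two labelings of the
    same items. Assumes at least two items."""
    pairs = comb(len(labels_a), 2)
    same_both = sum(comb(v, 2) for v in Counter(zip(labels_a, labels_b)).values())
    same_a = sum(comb(v, 2) for v in Counter(labels_a).values())
    same_b = sum(comb(v, 2) for v in Counter(labels_b).values())
    # Agreements = pairs grouped together in both + pairs separated in both.
    rand = (pairs + 2 * same_both - same_a - same_b) / pairs
    expected = same_a * same_b / pairs
    maximum = (same_a + same_b) / 2
    if maximum == expected:  # degenerate case, e.g. both labelings trivial
        return rand, 1.0
    return rand, (same_both - expected) / (maximum - expected)


# Hypothetical toy data: two sites holding disjoint sets of transactions.
site_1 = [["a", "b"], ["a", "b", "c"], ["a"]]
site_2 = [["b", "c"], ["a", "b"], ["c"]]
print(integrate([site_1, site_2], min_support=0.5, max_size=2))
print(rand_indices([0, 0, 1, 1], [0, 0, 1, 2]))
```

In the toy run, ("a", "b") is frequent only at the first site, yet it survives integration because its recounted global support is exactly 0.5. The second call returns a Rand index of 5/6 and an adjusted Rand index of 4/7 (about 0.571); the adjusted index corrects for the agreement expected by chance, which is why it is lower.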




Author information

Authors and Affiliations

Data Mining Research Laboratory, Department of Computer Science, Louisiana Tech University, Ruston, LA, 71272, USA
Sumeet Dua & Michael P. Dessauer
Department of Health Informatics and Information Management, Louisiana Tech University, Ruston, LA, 71272, USA
Prerna Sethi


Corresponding author

Correspondence to Sumeet Dua.


About this article

Cite this article

Dua, S., Dessauer, M.P. & Sethi, P. Evaluating Cluster Preservation in Frequent Itemset Integration for Distributed Databases. J Med Syst 35, 845–853 (2011). https://doi.org/10.1007/s10916-010-9512-1


Received: 03 March 2010
Accepted: 16 March 2010
Published: 09 May 2010
Issue Date: October 2011
DOI: https://doi.org/10.1007/s10916-010-9512-1
