Abstract
Many large organizations operate multiple large databases because they transact from multiple branches. Many important decisions are based on a set of specific items, called the select items, so analyzing select items across multiple databases is an important issue. Such a study may require the true global patterns of the select items. We therefore propose a model for mining global patterns of select items from multiple databases. We propose a measure of overall association between two items in a database and extend it to databases whose transactions record items along with the quantities purchased. Based on the proposed measure, we design an algorithm for grouping the frequent items in multiple databases. In addition, we study the properties of the measures proposed in this paper. Experimental results are presented for both real and synthetic databases.
Cite this article
Adhikari, A., Ramachandrarao, P. & Pedrycz, W. Study of select items in different data sources by grouping. Knowl Inf Syst 27, 23–43 (2011). https://doi.org/10.1007/s10115-010-0290-3
DOI: https://doi.org/10.1007/s10115-010-0290-3