Abstract
Zero-one data is frequently encountered in data mining. A banded pattern in zero-one data is one where the attributes (columns) and records (rows) are ordered so that the “ones” are arranged along the leading diagonal. Rearranging zero-one data so that it features bandedness matters because it enhances the operation of some data mining algorithms that work with such data. The fact that a dataset features banding may also be of interest in its own right in various application domains. This paper presents an effective banding algorithm designed to reveal banding in 2D data by rearranging the ordering of columns and rows. The challenge is the large number of potential row and column permutations. To address this, a column and row scoring mechanism is proposed that allows columns and rows to be ordered so as to reveal bandedness without the need to consider large numbers of permutations. This mechanism is incorporated into the Banded Pattern Mining (BPM) algorithm proposed in this paper. The operation of BPM is discussed in full. A complete evaluation of the BPM algorithm is also presented, indicating the advantages offered by BPM over a number of competitor algorithms on a collection of UCI datasets.
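The general idea of ordering rows and columns by a score, rather than searching over permutations, can be sketched as follows. The abstract does not define the BPM banding score, so this sketch substitutes an assumed rule (the mean position of the ones in each row or column, in the spirit of a barycenter heuristic). The function names `mean_position_scores`, `reorder_rows`, and `band` are hypothetical, and the code should not be read as the BPM algorithm itself.

```python
# Illustrative sketch only. The scoring rule (mean position of the ones) is an
# assumed stand-in, NOT the banding score defined for BPM in the paper.


def mean_position_scores(matrix):
    """Score each row by the mean column index of its ones (assumed rule)."""
    scores = []
    for row in matrix:
        ones = [j for j, value in enumerate(row) if value == 1]
        # Rows without any ones get a neutral, mid-range score.
        scores.append(sum(ones) / len(ones) if ones else (len(row) - 1) / 2)
    return scores


def transpose(matrix):
    """Swap rows and columns so the row routine can also reorder columns."""
    return [list(column) for column in zip(*matrix)]


def reorder_rows(matrix):
    """Sort rows by ascending score; ties keep their original relative order."""
    scores = mean_position_scores(matrix)
    order = sorted(range(len(matrix)), key=lambda i: scores[i])
    return [matrix[i] for i in order]


def band(matrix, max_iterations=10):
    """Alternate row and column reordering until the matrix stops changing."""
    current = [list(row) for row in matrix]  # work on a copy
    for _ in range(max_iterations):
        rows_sorted = reorder_rows(current)
        updated = transpose(reorder_rows(transpose(rows_sorted)))
        if updated == current:
            break
        current = updated
    return current


if __name__ == "__main__":
    shuffled = [
        [0, 0, 1],
        [1, 0, 0],
        [0, 1, 0],
    ]
    for row in band(shuffled):
        print(row)
    # prints:
    # [1, 0, 0]
    # [0, 1, 0]
    # [0, 0, 1]
```

Each pass costs one sort over the rows and one over the columns, so no permutation is ever enumerated. Whether such a simple score reveals banding as well as the score proposed in the paper is not something this sketch can show.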
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Abdullahi, F.B., Coenen, F., Martin, R. (2014). A Novel Approach for Identifying Banded Patterns in Zero-One Data Using Column and Row Banding Scores. In: Perner, P. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2014. Lecture Notes in Computer Science, vol 8556. Springer, Cham. https://doi.org/10.1007/978-3-319-08979-9_5
DOI: https://doi.org/10.1007/978-3-319-08979-9_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08978-2
Online ISBN: 978-3-319-08979-9
eBook Packages: Computer Science, Computer Science (R0)