Abstract
A high-dimensional zero-one data set is said to be banded if the indexes in every dimension can be reordered so that the non-zero entries are arranged along the leading diagonal across the dimensions. Our goal is to develop effective algorithms that identify banded patterns in multi-dimensional zero-one data by automatically rearranging the ordering within all the dimensions. Rearranging zero-one data so that it features "bandedness" allows hidden information to be identified and enhances the operation of many data mining algorithms (and other algorithms) that work with zero-one data. In this paper two N-Dimensional Banded Pattern Mining (NDBPM) algorithms are presented. The first is an approximate algorithm (NDBPM\(_{APPROX}\)) and the second an exact algorithm (NDBPM\(_{EXACT}\)). Two variations of NDBPM\(_{EXACT}\) are presented (Euclidean and Manhattan). Both algorithms are described in full, together with evaluations of their operation.
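To make the notion of bandedness concrete, the following is a minimal two-dimensional sketch in Python. It is not the NDBPM\(_{APPROX}\) or NDBPM\(_{EXACT}\) algorithm, whose details are not given in the abstract. It assumes a simple barycenter-style reordering (in the spirit of the heuristic cited in the reference list) and a hypothetical score, `diagonal_offset`, that averages the distance |i - j| of each non-zero cell from the leading diagonal. All function names are illustrative assumptions.

```python
from typing import List

Matrix = List[List[int]]


def diagonal_offset(matrix: Matrix) -> float:
    """Mean |row - column| over non-zero cells; lower means more banded.

    Hypothetical illustrative score; assumes at least one non-zero cell.
    """
    offsets = [abs(i - j)
               for i, row in enumerate(matrix)
               for j, value in enumerate(row) if value]
    return sum(offsets) / len(offsets)


def barycenter_order(matrix: Matrix) -> List[int]:
    """Return row indexes sorted by the mean column index of their non-zeros."""
    def barycenter(row: List[int]) -> float:
        columns = [j for j, value in enumerate(row) if value]
        # Empty rows carry no information, so they are placed last.
        return sum(columns) / len(columns) if columns else float(len(row))
    return sorted(range(len(matrix)), key=lambda i: barycenter(matrix[i]))


def transpose(matrix: Matrix) -> Matrix:
    return [list(column) for column in zip(*matrix)]


def band(matrix: Matrix, sweeps: int = 10) -> Matrix:
    """Alternately reorder rows and columns by barycenter (assumed heuristic)."""
    current = [list(row) for row in matrix]  # work on a copy
    for _ in range(sweeps):
        current = [current[i] for i in barycenter_order(current)]
        columns = transpose(current)
        columns = [columns[j] for j in barycenter_order(columns)]
        current = transpose(columns)
    return current


data = [
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 1, 0, 0],
]
banded = band(data)
print(diagonal_offset(data))    # 1.5
print(diagonal_offset(banded))  # 0.0
```

In this toy case the rows are a shuffled identity matrix, so a single row sweep already places every non-zero entry on the leading diagonal. Real data would generally need several sweeps and would not reach a score of zero.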
Notes
- 2. The Total From Partial (TFP) algorithm [8] was used for this purpose, but any alternative frequent itemset mining (FIM) algorithm could equally well have been used.
References
Abdullahi, F.B., Coenen, F., Martin, R.: A scalable algorithm for banded pattern mining in multi-dimensional zero-one data. In: Bellatreche, L., Mohania, M.K. (eds.) DaWaK 2014. LNCS, vol. 8646, pp. 345–356. Springer, Heidelberg (2014)
Agrawal, R., Imielinski, T., Swami, A.: Mining association rules between sets of items in large databases. In: SIGMOD 1993, pp. 207–216 (1993)
Agrawal, R., Srikant, R.: Fast algorithms for mining association rules in large databases. In: Proceedings 20th International Conference on Very Large Data Bases (VLDB 1994), pp. 487–499 (1994)
Alizadeh, F., Karp, R.M., Newberg, L.A., Weisser, D.K.: Physical mapping of chromosomes: a combinatorial problem in molecular biology. Algorithmica 13, 52–76 (1995)
Baeza-Yates, R., Ribeiro-Neto, B.: Modern Information Retrieval. Addison-Wesley, Wokingham (1999)
Blake, C.L., Merz, C.J.: UCI repository of machine learning databases (1998). http://www.ics.uci.edu/mlearn/MLRepository.htm
Cheng, K.Y.: Minimising the bandwidth of sparse symmetric matrices. Computing 11, 103–110 (1973)
Coenen, F., Goulbourne, G., Leng, P.: Computing association rules using partial totals. In: Siebes, A., De Raedt, L. (eds.) PKDD 2001. LNCS (LNAI), vol. 2168, pp. 54–66. Springer, Heidelberg (2001)
Cuthill, E., McKee, J.: Reducing the bandwidth of sparse symmetric matrices. In: Proceedings of the 1969 24th ACM National Conference, pp. 157–172 (1969)
Fortelius, M., Puolamäki, K., Mannila, H.: Seriation in paleontological data using Markov chain Monte Carlo methods. PLoS Comput. Biol. 2, e6 (2006)
Garriga, G.C., Junttila, E., Mannila, H.: Banded structure in binary matrices. Knowl. Inf. Syst. 28, 197–226 (2011)
Green, D.M., Kao, R.R.: Data quality of the Cattle Tracing System in Great Britain. Vet. Rec. 161, 439–443 (2007)
Junttila, E.: Patterns in Permuted Binary Matrices. PhD thesis (2011)
von Luxburg, U.: A tutorial on spectral clustering. Stat. Comput. 17, 395–416 (2007)
Makinen, E., Siirtola, H.: The barycenter heuristic and the reorderable matrix. Informatica 29, 357–363 (2005)
Mannila, H., Terzi, E.: Nestedness and segmented nestedness. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2007, New York, NY, USA, pp. 480–489 (2007)
Mueller, C.: Sparse matrix reordering algorithms for cluster identification. Mach. Learn. Bioinform. 1532 (2004)
Papadimitriou, C.H.: The NP-completeness of the bandwidth minimisation problem. Computing 16, 263–270 (1976)
Nohuddin, P.N.E., Christley, R., Coenen, F., Setzkorn, C.: Trend mining in social networks: a study using a large cattle movement database. In: Perner, P. (ed.) ICDM 2010. LNCS, vol. 6171, pp. 464–475. Springer, Heidelberg (2010)
Robinson, S., Christley, R.M.: Identifying temporal variation in reported birth, death and movements of cattle in Britain. BMC Vet. Res. 2, 11 (2006)
Rosen, R.: Matrix bandwidth minimisation. In: ACM National Conference Proceedings, pp. 585–595 (1968)
Copyright information
© 2016 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Abdullahi, F.B., Coenen, F., Martin, R. (2016). Banded Pattern Mining Algorithms in Multi-dimensional Zero-One Data. In: Hameurlain, A., Küng, J., Wagner, R., Bellatreche, L., Mohania, M. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XXVI. Lecture Notes in Computer Science, vol 9670. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-49784-5_1
DOI: https://doi.org/10.1007/978-3-662-49784-5_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-49783-8
Online ISBN: 978-3-662-49784-5
eBook Packages: Computer Science, Computer Science (R0)