Explaining Mixture Models through Semantic Pattern Mining and Banded Matrix Visualization

Adhikari, Prem Raj; Vavpetič, Anže; Kralj, Jan; Lavrač, Nada; Hollmén, Jaakko

doi:10.1007/978-3-319-11812-3_1

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8777)

Abstract

Semi-automated data analysis is feasible for end users if the data analysis process is supported by easily accessible tools and methodologies for pattern and model construction, explanation, and exploration. The proposed three-part methodology for multiresolution 0-1 data analysis consists of (1) data clustering with mixture models, (2) extraction of rules from the clusters, and (3) visualization of the data, clusters, and rules using banded matrices. The results of the three parts (the clusters, the rules extracted from them, and the banded structure of the data matrix) are finally merged into a single unified banded matrix display. Multiresolution data are incorporated through a supporting ontology that describes the relationships between the different resolutions and is used as background knowledge in the semantic pattern mining process of descriptive rule induction. An experimental use case on complex DNA copy number amplification data, studied in previous research, highlights the usefulness of the proposed methodology and provides new insights in the form of induced semantic patterns and cluster/pattern visualization.
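The first stage of the methodology, clustering 0-1 data with mixture models, can be illustrated with a minimal sketch. It assumes a mixture of multivariate Bernoulli distributions fitted with the EM algorithm, a standard choice for binary data; the abstract does not state the component distribution, initialization, or model-selection procedure, so the function `fit_bernoulli_mixture` and its parameters (`k`, `n_iter`, `seed`, `eps`) are hypothetical and this is not the authors' implementation.

```python
import math
import random


# Sketch only: assumes Bernoulli mixture components fitted by EM;
# not the authors' code. The function name and parameters are hypothetical.
def fit_bernoulli_mixture(X, k, n_iter=100, seed=0, eps=1e-6):
    """Fit a k-component mixture of multivariate Bernoulli distributions
    to 0-1 data X (a list of equal-length 0/1 lists) with EM.

    Returns (pi, theta, labels): mixing proportions, per-component
    Bernoulli parameters, and the most probable component of each row.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    pi = [1.0 / k] * k
    # Random initial parameters break the symmetry between components.
    theta = [[rng.uniform(0.25, 0.75) for _ in range(d)] for _ in range(k)]
    R = []
    for _ in range(n_iter):
        # E-step: responsibilities P(component j | row x), computed in log space.
        R = []
        for x in X:
            logs = []
            for j in range(k):
                lp = math.log(pi[j])
                for v, t in zip(x, theta[j]):
                    lp += math.log(t) if v else math.log(1.0 - t)
                logs.append(lp)
            m = max(logs)
            w = [math.exp(lp - m) for lp in logs]
            s = sum(w)
            R.append([wj / s for wj in w])
        # M-step: re-estimate mixing proportions and Bernoulli parameters.
        for j in range(k):
            nj = sum(r[j] for r in R)
            pi[j] = (nj + eps) / (n + k * eps)  # smoothed so log(pi) stays finite
            for c in range(d):
                t = sum(r[j] * x[c] for r, x in zip(R, X)) / (nj + eps)
                theta[j][c] = min(max(t, eps), 1.0 - eps)  # clip away from 0 and 1
    # Hard assignment: most probable component under the last E-step.
    labels = [max(range(k), key=lambda j: r[j]) for r in R]
    return pi, theta, labels


if __name__ == "__main__":
    # Toy 0-1 matrix with two obvious row patterns.
    X = [[1, 1, 1, 0, 0, 0]] * 5 + [[0, 0, 0, 1, 1, 1]] * 5
    pi, theta, labels = fit_bernoulli_mixture(X, k=2)
    print(labels)
```

Grouping the rows of the data matrix by the returned `labels` is one simple way to place the clusters side by side in a matrix display; the rule induction and banded reordering stages of the methodology are not reproduced in this sketch.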

Author information

Authors and Affiliations

Helsinki Institute for Information Technology HIIT and Department of Information and Computer Science, Aalto University School of Science, PO Box 15400, FI-00076, Aalto, Espoo, Finland
Prem Raj Adhikari & Jaakko Hollmén
Jožef Stefan Institute and Jožef Stefan International Postgraduate School, Jamova 39, 1000, Ljubljana, Slovenia
Anže Vavpetič, Jan Kralj & Nada Lavrač

Editor information

Editors and Affiliations

Department of Knowledge Technologies, Jožef Stefan Institute, Jamova cesta 39, 1000, Ljubljana, Slovenia
Sašo Džeroski, Panče Panov & Dragi Kocev
Faculty of Administration, University of Ljubljana, Gosarjeva 5, 1000, Ljubljana, Slovenia
Ljupčo Todorovski

About this paper

Cite this paper

Adhikari, P.R., Vavpetič, A., Kralj, J., Lavrač, N., Hollmén, J. (2014). Explaining Mixture Models through Semantic Pattern Mining and Banded Matrix Visualization. In: Džeroski, S., Panov, P., Kocev, D., Todorovski, L. (eds) Discovery Science. DS 2014. Lecture Notes in Computer Science, vol. 8777. Springer, Cham. https://doi.org/10.1007/978-3-319-11812-3_1

DOI: https://doi.org/10.1007/978-3-319-11812-3_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11811-6
Online ISBN: 978-3-319-11812-3
eBook Packages: Computer Science, Computer Science (R0)