Matching samples of multiple views

Tripathi, Abhishek; Klami, Arto; Orešič, Matej; Kaski, Samuel

doi:10.1007/s10618-010-0205-7

Published: 28 November 2010

Data Mining and Knowledge Discovery, Volume 23, pages 300–321 (2011)

Abstract

Multi-view learning studies how several views, different feature representations, of the same objects could be best utilized in learning. In other words, multi-view learning is analysis of co-occurrence data, where the observations are co-occurrences of samples in the views. Standard multi-view learning such as joint density modeling cannot be done in the absence of co-occurrence, when the views are observed separately and the identities of objects are not known. As a practical example, joint analysis of mRNA and protein concentrations requires mapping between genes and proteins. We introduce a data-driven approach for learning the correspondence of the observations in the different views, in order to enable joint analysis also in the absence of known co-occurrence. The method finds a matching that maximizes statistical dependency between the views, which is particularly suitable for multi-view methods such as canonical correlation analysis which has the same objective. We apply the method to translational metabolomics, to identify differences and commonalities in metabolic processes in different species or tissues. The metabolite identities and roles in the different species are not generally known, and it is necessary to search for a matching. In this paper we show, using different metabolomics measurement batches as the views so that the ground truth is known, that the metabolite identities can be reliably matched by a consensus of several matching solutions.
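
The matching idea in the abstract can be illustrated with a toy sketch. It is not the authors' algorithm: it assumes each view has been reduced to a single numeric feature, uses Pearson correlation as the dependency measure, and searches all permutations by brute force, which is only feasible for a handful of samples. The consensus step is likewise an assumption (a simple majority vote over several candidate matchings), since the abstract does not say how the consensus is formed. The names `pearson`, `best_matching`, and `consensus` are hypothetical.

```python
from collections import Counter
from itertools import permutations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def best_matching(x, y):
    """Find the permutation p that maximizes corr(x[i], y[p[i]]).

    Brute force over all permutations (assumption: tiny sample counts).
    Returns (p, correlation), where sample i of view x is matched to
    sample p[i] of view y.
    """
    best_perm, best_r = None, -2.0  # correlation is always >= -1
    for perm in permutations(range(len(y))):
        r = pearson(x, [y[j] for j in perm])
        if r > best_r:
            best_perm, best_r = perm, r
    return list(best_perm), best_r


def consensus(matchings, min_share=0.5):
    """Keep only the pairs that more than `min_share` of the matchings agree on.

    Majority voting is an illustrative choice, not the paper's stated rule.
    Returns a dict {index in view x: index in view y}.
    """
    n_runs = len(matchings)
    agreed = {}
    for i in range(len(matchings[0])):
        j, count = Counter(m[i] for m in matchings).most_common(1)[0]
        if count / n_runs > min_share:
            agreed[i] = j
    return agreed


if __name__ == "__main__":
    # View y holds roughly 2 * x, observed in a shuffled, unknown order.
    x = [1.0, 2.0, 3.0, 4.0]
    y = [8.1, 2.0, 6.2, 3.9]
    perm, r = best_matching(x, y)
    print("matching:", perm, "correlation:", round(r, 4))
    print("consensus:", consensus([perm, [1, 3, 0, 2], perm]))
```

For realistic sample counts the exhaustive search would have to be replaced by an assignment solver, and the one-dimensional correlation by a multivariate dependency measure such as the canonical correlation mentioned in the abstract.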

Author information

Authors and Affiliations

Abhishek Tripathi: Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland
Arto Klami: Helsinki Institute for Information Technology HIIT, Department of Information and Computer Science, Aalto University, Helsinki, Finland
Matej Orešič: Quantitative Biology and Bioinformatics, VTT Technical Research Centre of Finland, Espoo, Finland
Samuel Kaski: Helsinki Institute for Information Technology HIIT, Aalto University and University of Helsinki, Helsinki, Finland

Corresponding author

Correspondence to Abhishek Tripathi.

Additional information

Responsible editor: Johannes Gehrke.

About this article

Cite this article

Tripathi, A., Klami, A., Orešič, M. et al. Matching samples of multiple views. Data Min Knowl Disc 23, 300–321 (2011). https://doi.org/10.1007/s10618-010-0205-7

Received: 23 July 2009
Accepted: 28 October 2010
Published: 28 November 2010
Issue Date: September 2011
DOI: https://doi.org/10.1007/s10618-010-0205-7
