Matching samples of multiple views

Data Mining and Knowledge Discovery

Abstract

Multi-view learning studies how several views of the same objects, that is, different feature representations, can best be used in learning. Equivalently, multi-view learning is the analysis of co-occurrence data, where each observation is a co-occurrence of samples across the views. Standard multi-view methods such as joint density modeling cannot be applied without co-occurrence, that is, when the views are observed separately and the identities of the objects are not known. As a practical example, joint analysis of mRNA and protein concentrations requires a mapping between genes and proteins. We introduce a data-driven approach for learning the correspondence between the observations in the different views, which enables joint analysis even when the co-occurrence is not known. The method finds a matching that maximizes the statistical dependency between the views, which makes it particularly suitable for multi-view methods such as canonical correlation analysis, which has the same objective. We apply the method to translational metabolomics, where the aim is to identify differences and commonalities in metabolic processes across species or tissues. The identities and roles of metabolites in different species are generally not known, so a matching must be searched for. Using different metabolomics measurement batches as the views, so that the ground truth is known, we show that metabolite identities can be reliably matched by a consensus of several matching solutions.
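The core idea of the abstract, choosing the matching that maximizes dependency between the views and then combining several matchings into a consensus, can be illustrated with a toy sketch. This is not the authors' algorithm. It assumes each view has a single feature, uses Pearson correlation as the dependency measure, and searches all permutations by brute force, which is only feasible for a handful of samples. The function names `pearson`, `best_matching`, and `consensus` are hypothetical, and the per-sample majority vote is one simple way to form a consensus, assumed here for illustration.

```python
from collections import Counter
from itertools import permutations
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def best_matching(view_x, view_y):
    """Brute-force search for the permutation p that maximizes the
    correlation between view_x[i] and view_y[p[i]].

    Assumption for illustration: one feature per view and very few
    samples, since the search visits all n! permutations.
    """
    n = len(view_x)
    return max(
        permutations(range(n)),
        key=lambda p: pearson(view_x, [view_y[j] for j in p]),
    )


def consensus(matchings):
    """Per-sample majority vote over several candidate matchings.

    Note: the vote is taken independently for each sample, so the
    result is not guaranteed to be a valid one-to-one matching.
    """
    n = len(matchings[0])
    return [
        Counter(m[i] for m in matchings).most_common(1)[0][0]
        for i in range(n)
    ]


# Toy data: view_y holds noisy copies of 2 * view_x in shuffled order.
view_x = [0.1, 0.9, 0.4, 0.7]
view_y = [1.75, 0.85, 0.25, 1.45]

print(best_matching(view_x, view_y))  # (2, 0, 1, 3)

# Three hypothetical matching solutions, one of which disagrees.
print(consensus([(2, 0, 1, 3), (2, 0, 1, 3), (2, 1, 0, 3)]))  # [2, 0, 1, 3]
```

For realistic sample sizes the exhaustive search would have to be replaced by an assignment-problem solver, and multi-dimensional views would need a multivariate dependency measure such as the canonical correlation mentioned in the abstract.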

Author information

Corresponding author

Correspondence to Abhishek Tripathi.

Additional information

Responsible editor: Johannes Gehrke.

About this article

Cite this article

Tripathi, A., Klami, A., Orešič, M. et al. Matching samples of multiple views. Data Min Knowl Disc 23, 300–321 (2011). https://doi.org/10.1007/s10618-010-0205-7


  • DOI: https://doi.org/10.1007/s10618-010-0205-7
