Abstract
Multi-view learning studies how several views, that is, different feature representations of the same objects, can best be utilized in learning. In other words, multi-view learning is the analysis of co-occurrence data, where the observations are co-occurrences of samples in the views. Standard multi-view learning, such as joint density modeling, cannot be done in the absence of co-occurrence, that is, when the views are observed separately and the identities of the objects are not known. As a practical example, joint analysis of mRNA and protein concentrations requires a mapping between genes and proteins. We introduce a data-driven approach for learning the correspondence of the observations in the different views, in order to enable joint analysis even in the absence of known co-occurrence. The method finds a matching that maximizes statistical dependency between the views, which makes it particularly suitable for multi-view methods such as canonical correlation analysis, which have the same objective. We apply the method to translational metabolomics, to identify differences and commonalities in metabolic processes in different species or tissues. The metabolite identities and roles in the different species are not generally known, so it is necessary to search for a matching. Using different metabolomics measurement batches as the views, so that the ground truth is known, we show that the metabolite identities can be reliably matched by a consensus of several matching solutions.
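The idea of choosing the matching that maximizes dependency between the views can be illustrated with a toy sketch. This is not the algorithm of the paper: it assumes one-dimensional views, uses Pearson correlation as the dependency measure, and searches all permutations exhaustively, which is feasible only for a handful of samples. The names `pearson`, `best_matching`, `view_x` and `view_y`, and the toy data, are hypothetical and introduced here for illustration only.

```python
from itertools import permutations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def best_matching(x, y):
    """Exhaustively search for the pairing with the highest correlation.

    Returns (perm, score), where sample x[i] is paired with y[perm[i]].
    The cost grows factorially with the number of samples, so this is
    usable only as a small illustration.
    """
    best_perm, best_score = None, float("-inf")
    for perm in permutations(range(len(y))):
        score = pearson(x, [y[j] for j in perm])
        if score > best_score:
            best_perm, best_score = perm, score
    return best_perm, best_score


# Hypothetical toy data: the second view is a noisy linear function of the
# first one, stored in shuffled order so that the pairing is unknown.
view_x = [0.1, 0.9, 0.4, 0.7, 0.2]
view_y = [2.78, 2.41, 1.38, 1.22, 1.83]
true_perm = (3, 0, 4, 1, 2)  # x[i] truly corresponds to view_y[true_perm[i]]

perm, score = best_matching(view_x, view_y)
print("recovered pairing:", perm)
print("matches ground truth:", perm == true_perm)
```

For realistic sample sizes the exhaustive search would have to be replaced by an assignment-problem solver, and the one-dimensional correlation by a multivariate dependency measure; both substitutions are outside the scope of this sketch.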
Additional information
Responsible editor: Johannes Gehrke.
About this article
Cite this article
Tripathi, A., Klami, A., Orešič, M. et al. Matching samples of multiple views. Data Min Knowl Disc 23, 300–321 (2011). https://doi.org/10.1007/s10618-010-0205-7
DOI: https://doi.org/10.1007/s10618-010-0205-7