Unsupervised group matching with application to cross-lingual topic matching without alignment information

Iwata, Tomoharu; Kanagawa, Motonobu; Hirao, Tsutomu; Fukumizu, Kenji

doi:10.1007/s10618-016-0470-1

Published: 24 June 2016

Volume 31, pages 350–370, (2017)

Data Mining and Knowledge Discovery

Tomoharu Iwata¹, Motonobu Kanagawa², Tsutomu Hirao¹ & Kenji Fukumizu²

Abstract

We propose a method for unsupervised group matching, the task of finding correspondences between groups across different domains without cross-domain similarity measurements or paired data. For example, the proposed method can match topic categories in different languages without alignment information. The method interprets each group as a probability distribution, which enables us to handle the uncertainty that arises from a limited amount of data and to incorporate higher-order information on groups. Groups are matched by maximizing the dependence between distributions, which we measure with the Hilbert-Schmidt independence criterion. By using kernel embeddings, which map distributions into a reproducing kernel Hilbert space, we can calculate the dependence between distributions without density estimation. In experiments, we demonstrate the effectiveness of the proposed method on synthetic and real data sets, including an application to cross-lingual topic matching.
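
The matching criterion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the group-level kernel (the inner product of empirical Gaussian-kernel mean embeddings), the bandwidth, the brute-force search over permutations, and every function name below are assumptions chosen for illustration. The second domain is a rotated, reordered copy of the first in a different feature space, so no cross-domain similarity between individual points is used.

```python
import itertools
import numpy as np

def gaussian_gram(A, B, bandwidth):
    """Gaussian kernel values between the rows of A and the rows of B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def group_gram(groups, bandwidth=1.0):
    """Group-level Gram matrix: entry (i, j) is the inner product of the
    empirical kernel mean embeddings of group i and group j (an assumed,
    simple choice of kernel between distributions)."""
    n = len(groups)
    G = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            G[i, j] = gaussian_gram(groups[i], groups[j], bandwidth).mean()
    return G

def hsic(K, L):
    """Biased empirical HSIC, trace(K H L H) / (n - 1)^2."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def match_groups(K, L):
    """Exhaustive search for the permutation of the second domain's groups
    that maximizes HSIC. Only feasible for a handful of groups."""
    n = K.shape[0]
    best_perm, best_score = None, -np.inf
    for perm in itertools.permutations(range(n)):
        p = list(perm)
        score = hsic(K, L[np.ix_(p, p)])
        if score > best_score:
            best_perm, best_score = perm, score
    return best_perm, best_score

# Toy data: four groups in 2-D (domain 1).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 1.0], [0.0, 4.0]])
X_groups = [c + 0.1 * rng.standard_normal((20, 2)) for c in centers]

# Domain 2: the same groups mapped isometrically into 3-D and reordered,
# so Y_groups[j] corresponds to X_groups[order[j]].
Q, _ = np.linalg.qr(rng.standard_normal((3, 2)))  # orthonormal columns
order = [2, 0, 3, 1]
Y_groups = [X_groups[i] @ Q.T for i in order]

K = group_gram(X_groups, bandwidth=2.0)
L = group_gram(Y_groups, bandwidth=2.0)
perm, score = match_groups(K, L)
print("X group i is matched to Y group perm[i]:", perm)
```

Because the search enumerates all permutations, its cost grows factorially with the number of groups; a practical method would need a relaxation or an assignment-style solver instead.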

Notes

More precisely, a kernel \(k\) is called characteristic if the map \(\mathcal {P}\rightarrow {\mathcal {H}}_k,\ \mathbb {P}\mapsto \mu _\mathbb {P}:= \int k(\cdot ,x)\, d\mathbb {P}(x)\) is injective. Thus, if we use a characteristic kernel, the embedding \(\mu _\mathbb {P}\) uniquely identifies the underlying distribution \(\mathbb {P}\).
A kernel is called universal if its associated RKHS is dense in the space of bounded continuous functions (Steinwart 2001).
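
As a small illustration of the characteristic property, the sketch below compares the squared RKHS distance between two empirical mean embeddings under a linear kernel (not characteristic) and a Gaussian kernel (characteristic). The sample values, the bandwidth, and the function names are assumptions made for this example only.

```python
import numpy as np

def linear_gram(A, B):
    """Linear kernel k(x, y) = <x, y>; its mean embedding only encodes the mean."""
    return A @ B.T

def gaussian_gram(A, B, bandwidth=1.0):
    """Gaussian kernel, a standard example of a characteristic kernel."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def embedding_distance_sq(X, Y, gram):
    """Squared RKHS distance between the empirical mean embeddings of X and Y:
    mean k(x, x') + mean k(y, y') - 2 mean k(x, y)."""
    return gram(X, X).mean() + gram(Y, Y).mean() - 2.0 * gram(X, Y).mean()

# Two samples with the same mean (zero) but different spread.
X = np.array([[-1.0], [1.0]])
Y = np.array([[-2.0], [2.0]])

d_linear = embedding_distance_sq(X, Y, linear_gram)
d_gauss = embedding_distance_sq(X, Y, gaussian_gram)
print("linear kernel:  ", d_linear)
print("Gaussian kernel:", d_gauss)
```

With the linear kernel the two embeddings coincide because only the means are compared, whereas the Gaussian-kernel embeddings are separated by a positive distance.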

References

Barnard K, Duygulu P, Forsyth D, De Freitas N, Blei DM, Jordan MI (2003) Matching words and pictures. J Mach Learning Res 3:1107–1135
Chang C, Lin C (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2(3):27
Christmann A, Steinwart I (2010) Universal kernels on non-standard input spaces. In: Advances in neural information processing systems, pp 406–414
Coleman TF, Li Y (1996) An interior trust region approach for nonlinear minimization subject to bounds. SIAM J Optim 6(2):418–445
Djuric N, Grbovic M, Vucetic S (2012) Convex kernelized sorting. In: AAAI conference on artificial intelligence
Doan A, Madhavan J, Domingos P, Halevy A (2004) Ontology matching: a machine learning approach. In: Staab S, Studer R (eds) Handbook on ontologies. Springer, Berlin, pp 385–403
Dudley RM (2002) Real analysis and probability. Cambridge University Press, Cambridge
Fukumizu K, Bach FR, Jordan MI (2004) Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces. J Mach Learning Res 5:73–99
Fukumizu K, Gretton A, Sun X, Schölkopf B (2008) Kernel measures of conditional dependence. In: Advances in neural information processing systems, pp 489–496
Gretton A, Bousquet O, Smola A, Schölkopf B (2005) Measuring statistical dependence with Hilbert-Schmidt norms. Algorithmic Learning Theory 3734:63–77
Gretton A, Borgwardt K, Rasch M, Schölkopf B, Smola A (2012a) A kernel two-sample test. J Mach Learning Res 13:723–773
Gretton A, Borgwardt KM, Rasch MJ, Schölkopf B, Smola A (2012b) A kernel two-sample test. J Mach Learning Res 13(1):723–773
Haghighi A, Liang P, Berg-Kirkpatrick T, Klein D (2008) Learning bilingual lexicons from monolingual corpora. In: Annual meeting of the association for computational linguistics: human language technologies, pp 771–779
Iwata T, Hirao T, Ueda N (2013) Unsupervised cluster matching via probabilistic latent variable models. In: AAAI conference on artificial intelligence, pp 445–451
Jagarlamudi J, Juarez S, Daumé III H (2010) Kernelized sorting for natural language processing. In: AAAI conference on artificial intelligence, pp 1020–1025
Kamahara J, Asakawa T, Shimojo S, Miyahara H (2005) A community-based recommendation system to reveal unexpected interests. In: International multimedia modelling conference, pp 433–438
Klami A (2012) Variational Bayesian matching. In: Asian conference on machine learning, pp 205–220
Kuhn HW (1955) The Hungarian method for the assignment problem. Naval Res Logist Q 2(1–2):83–97
Li B, Yang Q, Xue X (2009) Transfer learning for collaborative filtering via a rating-matrix generative model. In: International conference on machine learning, pp 617–624
Muandet K, Schölkopf B (2013) One-class support measure machines for group anomaly detection. In: Conference on uncertainty in artificial intelligence, pp 449–458
Muandet K, Fukumizu K, Dinuzzo F, Schölkopf B (2012) Learning from distributions via support measure machines. In: Advances in neural information processing systems, pp 10–18
Parthasarathy KR (1967) Probability measures on metric spaces. Academic Press, New York
Quadrianto N, Smola AJ, Song L, Tuytelaars T (2010) Kernelized sorting. IEEE Trans Pattern Anal Mach Intell 32(10):1809–1821
Shvaiko P, Euzenat J (2013) Ontology matching: state of the art and future challenges. IEEE Trans Knowl Data Eng 25(1):158–176
Smola A, Gretton A, Song L, Schölkopf B (2007) A Hilbert space embedding for distributions. In: Algorithmic learning theory, pp 13–31
Socher R, Fei-Fei L (2010) Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora. In: IEEE conference on computer vision and pattern recognition, pp 966–973
Song L, Smola A, Gretton A, Bedo J, Borgwardt K (2012) Feature selection via dependence maximization. J Mach Learning Res 13(1):1393–1434
Sriperumbudur BK, Fukumizu K, Gretton A, Lanckriet GR, Schölkopf B (2009) Kernel choice and classifiability for RKHS embeddings of probability distributions. In: Advances in neural information processing systems, pp 1750–1758
Sriperumbudur BK, Gretton A, Fukumizu K, Schölkopf B, Lanckriet GR (2010) Hilbert space embeddings and metrics on probability measures. J Mach Learning Res 11:1517–1561
Steinwart I (2001) On the influence of the kernel on the consistency of support vector machines. J Mach Learning Res 2:67–93
Taira H, Haruno M (1999) Feature selection in SVM text categorization. In: National conference on artificial intelligence, pp 480–486
Terada A, Sese J (2012) Global alignment of protein-protein interaction networks for analyzing evolutionary changes of network frameworks. In: Proceedings of 4th international conference on bioinformatics and computational biology, pp 196–201
Tripathi A, Klami A, Virpioja S (2010) Bilingual sentence matching using kernel CCA. In: IEEE international workshop on machine learning for signal processing, pp 130–135
Yamada M, Sugiyama M (2011) Cross-domain object matching with model selection. In: International conference on artificial intelligence and statistics, pp 807–815

Author information

Authors and Affiliations

NTT Communication Science Laboratories, 2-4 Hikaridai, Seikacho, Sorakugun, Kyoto, 619-0237, Japan
Tomoharu Iwata & Tsutomu Hirao
The Institute of Statistical Mathematics, Tokyo, Japan
Motonobu Kanagawa & Kenji Fukumizu

Corresponding author

Correspondence to Tomoharu Iwata.

Additional information

Responsible editor: Jieping Ye.

About this article

Cite this article

Iwata, T., Kanagawa, M., Hirao, T. et al. Unsupervised group matching with application to cross-lingual topic matching without alignment information. Data Min Knowl Disc 31, 350–370 (2017). https://doi.org/10.1007/s10618-016-0470-1

Received: 08 April 2015
Accepted: 09 June 2016
Published: 24 June 2016
Issue Date: March 2017
DOI: https://doi.org/10.1007/s10618-016-0470-1
