Using KCCA for Japanese–English cross-language information retrieval and document classification

Journal of Intelligent Information Systems

Abstract

Kernel Canonical Correlation Analysis (KCCA) is a method for finding correlated linear relationships between two variables in a kernel-defined feature space. We study a machine learning algorithm based on KCCA for cross-language information retrieval and apply it to Japanese–English cross-language information retrieval. The results are quite encouraging and are significantly better than those obtained by other state-of-the-art methods. Computational complexity is an important issue when applying KCCA to large datasets, as in information retrieval, and we experimentally evaluate several methods for alleviating this problem. We also investigate cross-language document classification using KCCA as well as other methods. Our results show that it is feasible to use a classifier learned in one language to classify documents in other languages.
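To make the idea concrete, the following is a minimal sketch of regularized KCCA used for cross-language "mate" retrieval on synthetic data. It is not the implementation evaluated in the article. Everything in it is an assumption made for illustration: the linear kernel, the regularization in which each variance constraint is replaced by one on (K + κI)², the value of `kappa`, the toy "English" and "Japanese" term vectors generated from shared latent topics, and all function names. Kernel centering is omitted for brevity.

```python
import numpy as np

def linear_kernel(A, B):
    """Gram matrix of inner products between the rows of A and the rows of B."""
    return A @ B.T

def fit_kcca(Kx, Ky, kappa=1.0, n_components=3):
    """Regularized KCCA (illustrative variant, assumed here).

    Maximizes alpha' Kx Ky beta subject to
    alpha' (Kx + kappa I)^2 alpha = 1 and beta' (Ky + kappa I)^2 beta = 1.
    Substituting a = (Kx + kappa I) alpha and b = (Ky + kappa I) beta turns
    this into an SVD of Sx @ Sy, where S = (K + kappa I)^-1 K.
    """
    n = Kx.shape[0]
    I = np.eye(n)
    Ax = Kx + kappa * I
    Ay = Ky + kappa * I
    Sx = np.linalg.solve(Ax, Kx)  # (Kx + kappa I)^-1 Kx
    Sy = np.linalg.solve(Ay, Ky)  # (Ky + kappa I)^-1 Ky
    U, rho, Vt = np.linalg.svd(Sx @ Sy)
    alpha = np.linalg.solve(Ax, U[:, :n_components])
    beta = np.linalg.solve(Ay, Vt[:n_components].T)
    return alpha, beta, rho[:n_components]

def cosine_rows(A, B):
    """Cosine similarity between every row of A and every row of B."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A @ B.T

# Toy paired corpus: each document pair shares latent topic weights Z but is
# expressed in two different (hypothetical) vocabularies.
rng = np.random.default_rng(0)
n_train, n_test, n_topics = 60, 20, 3
Wx = rng.normal(size=(n_topics, 20))  # stand-in for 20 English terms
Wy = rng.normal(size=(n_topics, 25))  # stand-in for 25 Japanese terms

def make_pairs(n):
    Z = rng.normal(size=(n, n_topics))
    X = Z @ Wx + 0.01 * rng.normal(size=(n, Wx.shape[1]))
    Y = Z @ Wy + 0.01 * rng.normal(size=(n, Wy.shape[1]))
    return X, Y

X_tr, Y_tr = make_pairs(n_train)
X_te, Y_te = make_pairs(n_test)

alpha, beta, rho = fit_kcca(linear_kernel(X_tr, X_tr),
                            linear_kernel(Y_tr, Y_tr),
                            kappa=1.0, n_components=n_topics)

# Project held-out documents from both languages into the shared space.
queries = linear_kernel(X_te, X_tr) @ alpha  # "English" queries
docs = linear_kernel(Y_te, Y_tr) @ beta      # "Japanese" documents

# Mate retrieval: each query should rank its own translation first.
predicted = cosine_rows(queries, docs).argmax(axis=1)
accuracy = float(np.mean(predicted == np.arange(n_test)))
print("canonical correlations:", rho)
print("mate retrieval accuracy:", accuracy)
```

The two `np.linalg.solve` calls and the SVD act on dense n × n matrices, so the cost grows roughly cubically with the number of training pairs. This sketch makes no attempt to reduce that cost, which is the scaling concern the abstract raises for large collections.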

Author information

Corresponding author

Correspondence to Yaoyong Li.

About this article

Cite this article

Li, Y., Shawe-Taylor, J. Using KCCA for Japanese–English cross-language information retrieval and document classification. J Intell Inf Syst 27, 117–133 (2006). https://doi.org/10.1007/s10844-006-1627-y

  • DOI: https://doi.org/10.1007/s10844-006-1627-y
