DOI:10.1145/1651437.1651447
research-article

Cross-language linking of news stories on the web using interlingual topic modelling

Published: 02 November 2009

Abstract

We have studied the problem of linking event information across different languages without the use of translation systems or dictionaries. The linking is based on interlingual information obtained through probabilistic topic models trained on comparable corpora written in two languages (in our case English and Dutch). To achieve this, we extend the Latent Dirichlet Allocation model to process documents in two languages. We demonstrate the validity of the learned interlingual topics in a document clustering task, where the evaluation is performed on Google News.
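
As a rough illustration of how shared topics could be used for linking, the sketch below assumes each story has already been mapped to a distribution over the same set of interlingual topics, and links every Dutch story to its most similar English story. The abstract does not say which similarity measure or clustering procedure the authors use; cosine similarity, the threshold value, the function names, and the topic vectors here are all assumptions made for the example.

```python
import math

def cosine(p, q):
    """Cosine similarity between two equal-length topic distributions."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0

def link_stories(theta_en, theta_nl, threshold=0.8):
    """Map each Dutch story id to the most similar English story id.

    A Dutch story is mapped to None when no English story reaches
    the (assumed) similarity threshold.
    """
    links = {}
    for nl_id, nl_topics in theta_nl.items():
        best_id, best_sim = None, threshold
        for en_id, en_topics in theta_en.items():
            sim = cosine(nl_topics, en_topics)
            if sim >= best_sim:
                best_id, best_sim = en_id, sim
        links[nl_id] = best_id
    return links

# Hypothetical per-document distributions over 3 shared topics (made-up numbers).
theta_en = {"en_election": [0.8, 0.1, 0.1], "en_football": [0.1, 0.8, 0.1]}
theta_nl = {"nl_verkiezing": [0.7, 0.2, 0.1], "nl_weer": [0.34, 0.33, 0.33]}

links = link_stories(theta_en, theta_nl)
print(links)
```

Because both languages are represented in the same topic space, the comparison never touches the words themselves, which is what makes a dictionary-free approach possible in principle.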

Published In

SWSM '09: Proceedings of the 2nd ACM workshop on Social web search and mining
November 2009
78 pages
ISBN:9781605588063
DOI:10.1145/1651437

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. event detection
  2. latent dirichlet allocation

Qualifiers

  • Research-article

Conference

CIKM '09