Multilingual Story Link Detection Based on Event Term Weighting on Times and Multilingual Spaces

  • Conference paper
Digital Libraries: International Collaboration and Cross-Fertilization (ICADL 2004)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 3334))

Abstract

In this paper, we propose a novel approach to multilingual story link detection. Our approach uses timelines and multilingual spaces as features to assign distinctive weights to the terms that make up the linguistic representation of events. On timelines, term significance is calculated by comparing the term distribution of the documents on a given day with that of the whole document collection. Since two languages can provide more information than one, term significance is measured in each language space and then used as a bridge between the two languages in the multilingual (here bilingual) space. Evaluated on Korean and Japanese news articles, the method achieved a 14.3% improvement for monolingual story pairs and a 16.7% improvement for multilingual story pairs. Measuring the space density verifies the proposed weighting components: intra-event stories show a high density and inter-event stories a low density. These results indicate that the proposed method is helpful for multilingual story link detection.
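
The timeline weighting described in the abstract can be illustrated with a minimal sketch. The abstract does not name the statistic used to compare a day's term distribution with that of the collection, so the chi-square score below is an assumption, as are the function names `doc_freq` and `day_term_significance` and the toy documents. The sketch compares the documents of one day against the remaining documents of the collection and is not the authors' implementation.

```python
from collections import Counter


def doc_freq(docs):
    """Count, for each term, how many documents contain it."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return df


def day_term_significance(day_docs, other_docs):
    """Score each term seen on a day against the rest of the collection.

    Assumption: a 2x2 chi-square statistic stands in for the measure,
    which the abstract leaves unspecified.
    """
    n_day, n_other = len(day_docs), len(other_docs)
    n = n_day + n_other
    df_day, df_other = doc_freq(day_docs), doc_freq(other_docs)
    scores = {}
    for term in df_day:
        a = df_day[term]    # day documents containing the term
        b = df_other[term]  # other documents containing the term
        c = n_day - a       # day documents without the term
        d = n_other - b     # other documents without the term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


# Hypothetical tokenized stories: two from one day, four from other days.
day_docs = [["earthquake", "kobe"], ["earthquake", "rescue"]]
other_docs = [
    ["election", "kobe"],
    ["election", "vote"],
    ["market", "vote"],
    ["kobe", "market"],
]
scores = day_term_significance(day_docs, other_docs)
for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(term, round(score, 2))
```

In this toy data, "earthquake" appears only in the day's stories and scores highest, while "kobe" is equally frequent inside and outside the day and scores zero. The chi-square statistic is two-sided, so a term that is unusually rare on a day would also score high; a real weighting scheme would likely keep only terms that are over-represented on the day.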

This study is partly supported by “Information Utilization for Heterogeneous Contents” (13224087), Japanese Grant-in-Aid Scientific Research on Priority Area “Informatics” (Area #006).

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lee, KS., Kageura, K. (2004). Multilingual Story Link Detection Based on Event Term Weighting on Times and Multilingual Spaces. In: Chen, Z., Chen, H., Miao, Q., Fu, Y., Fox, E., Lim, Ep. (eds) Digital Libraries: International Collaboration and Cross-Fertilization. ICADL 2004. Lecture Notes in Computer Science, vol 3334. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30544-6_43

  • DOI: https://doi.org/10.1007/978-3-540-30544-6_43

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24030-3

  • Online ISBN: 978-3-540-30544-6

  • eBook Packages: Computer Science, Computer Science (R0)