Abstract
The global information society has radically changed the way in which knowledge is acquired, disseminated and exchanged. Users of internationally distributed networks need to be able to find, retrieve and understand relevant information in whatever language and form it may have been stored. For this reason, much attention has been given over the past few years to the study and development of tools and technologies for multilingual information access (MLIA). This is a complex, multidisciplinary area in which methodologies and tools developed in the fields of information retrieval and natural language processing converge. Two main sectors are involved: multiple language recognition, manipulation and display; and cross-language search and retrieval. The paper provides an overview of the main issues of interest in both these areas. Topics covered include: multilingual document indexing, specific requirements of particular languages and scripts, techniques for cross-language information retrieval (CLIR), resources, and system and component evaluation.
© 2000 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Peters, C., Sheridan, P. (2000). Multilingual Information Access. In: Agosti, M., Crestani, F., Pasi, G. (eds) Lectures on Information Retrieval. ESSIR 2000. Lecture Notes in Computer Science, vol 1980. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45368-7_3
DOI: https://doi.org/10.1007/3-540-45368-7_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41933-4
Online ISBN: 978-3-540-45368-0
eBook Packages: Springer Book Archive