Parallel Text Alignment

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1513)

Abstract

Parallel Text Alignment (PTA) is the problem of automatically aligning content in multiple text documents that originate or are derived from the same source. Its applications to multimedia data access in digital libraries range from facilitating the analysis of multiple English-language translations of classical texts to enabling the on-demand and random comparison of multiple transcriptions derived from a given audio stream, or associated with a given stream of video, audio, or images. In this paper we give an efficient algorithm for achieving such an alignment and demonstrate its use with two applications. This result is an application of the new framework of Cross-Modal Information Retrieval recently developed at Dartmouth.
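
To make the problem statement concrete, the following is a minimal sketch of one simple way to align two token sequences: a longest-common-subsequence dynamic program that pairs identical words in order. It is not the algorithm given in the paper, whose details are not shown here; the function name `align_tokens` and the sample sentences are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def align_tokens(a: List[str], b: List[str]) -> List[Tuple[int, int]]:
    """Return index pairs (i, j), increasing in both i and j, such that
    a[i] and b[j] are the same word (case-insensitive) and the number of
    pairs is as large as possible (a longest-common-subsequence alignment).

    Illustrative assumption only: exact word identity stands in for
    whatever similarity measure a real aligner would use.
    """
    n, m = len(a), len(b)
    # score[i][j] = maximum number of matches between a[i:] and b[j:]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i].lower() == b[j].lower():
                score[i][j] = score[i + 1][j + 1] + 1
            else:
                score[i][j] = max(score[i + 1][j], score[i][j + 1])

    # Walk forward through the table to recover one optimal alignment.
    pairs: List[Tuple[int, int]] = []
    i = j = 0
    while i < n and j < m:
        if a[i].lower() == b[j].lower():
            pairs.append((i, j))
            i += 1
            j += 1
        elif score[i + 1][j] >= score[i][j + 1]:
            i += 1  # skip an unmatched word in a
        else:
            j += 1  # skip an unmatched word in b
    return pairs


if __name__ == "__main__":
    # Hypothetical sample sentences, not taken from the paper.
    first = "the quick brown fox jumps".split()
    second = "a quick red fox leaps and jumps".split()
    for i, j in align_tokens(first, second):
        print(i, j, first[i])
```

The table costs time and space proportional to the product of the two sequence lengths, so this sketch only suits short inputs; it illustrates what an alignment is rather than how to compute one efficiently.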

Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Owen, C.B., Ford, J., Makedon, F., Steinberg, T. (1998). Parallel Text Alignment. In: Nikolaou, C., Stephanidis, C. (eds) Research and Advanced Technology for Digital Libraries. ECDL 1998. Lecture Notes in Computer Science, vol 1513. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49653-X_15

  • DOI: https://doi.org/10.1007/3-540-49653-X_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-65101-7

  • Online ISBN: 978-3-540-49653-3

  • eBook Packages: Springer Book Archive
