Abstract
Different Chinese translations of the same foreign masterpiece provide an abundant source of Chinese paraphrases on the Internet. A paraphrase corpus is a corpus of sentence pairs that convey the same information. The irregular characteristics of real monolingual parallel texts, especially the lack of strictly aligned paragraph boundaries between two translations, pose a challenge for alignment technology, and traditional alignment methods designed for bilingual texts have difficulty handling them. This paper describes a new method for aligning real monolingual parallel texts using the length and location information of sentence pairs. The model is motivated by the observation that the locations of sentence pairs of a given length are distributed similarly across the whole text. A paraphrase corpus of about fifty thousand sentence pairs has been constructed so far.
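The abstract does not give the model itself, so the following is only an illustrative sketch of how length and location cues might be combined, not the authors' method. The scoring formula, the function names (`split_sentences`, `pair_score`, `align`), the punctuation-based sentence splitter, and the threshold are all assumptions introduced for illustration.

```python
import re

def split_sentences(text):
    """Split Chinese text after sentence-final punctuation (a simplifying assumption)."""
    parts = re.findall(r"[^。！？]+[。！？]?", text)
    return [p.strip() for p in parts if p.strip()]

def pair_score(i, len_a, n_a, j, len_b, n_b):
    """Hypothetical score in [0, 1]: high when lengths and relative positions agree."""
    # Length cue: ratio of the shorter to the longer sentence, in characters.
    length_sim = min(len_a, len_b) / max(len_a, len_b)
    # Location cue: compare each sentence's relative position within its own text.
    pos_a = i / max(n_a - 1, 1)
    pos_b = j / max(n_b - 1, 1)
    location_sim = 1.0 - abs(pos_a - pos_b)
    return length_sim * location_sim

def align(sents_a, sents_b, threshold=0.7):
    """For each sentence in text A, keep the best-scoring sentence in text B
    if the score reaches the (assumed) threshold."""
    pairs = []
    for i, a in enumerate(sents_a):
        best_j, best = None, 0.0
        for j, b in enumerate(sents_b):
            s = pair_score(i, len(a), len(sents_a), j, len(b), len(sents_b))
            if s > best:
                best_j, best = j, s
        if best_j is not None and best >= threshold:
            pairs.append((i, best_j, best))
    return pairs

# Two toy "translations" of the same passage.
text_a = "他走了。天很冷！我们回家吧。"
text_b = "他离开了。天气很冷！咱们回家吧。"
for i, j, score in align(split_sentences(text_a), split_sentences(text_b)):
    print(i, j, round(score, 2))
```

A product of the two cues is used here only because it is the simplest way to require both to agree. A real system would more likely estimate the length and location distributions from data and search for a globally consistent alignment, for example with dynamic programming.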
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, W., Liu, T., Li, S. (2005). Combining Sentence Length with Location Information to Align Monolingual Parallel Texts. In: Myaeng, S.H., Zhou, M., Wong, KF., Zhang, HJ. (eds) Information Retrieval Technology. AIRS 2004. Lecture Notes in Computer Science, vol 3411. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31871-2_11
DOI: https://doi.org/10.1007/978-3-540-31871-2_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25065-4
Online ISBN: 978-3-540-31871-2
eBook Packages: Computer Science, Computer Science (R0)