Abstract
The lack of specific data sets makes discourse parsing of Informal Mathematical Discourse (IMD) difficult. In this paper, we propose a data-driven approach to identifying arguments and connectives in an IMD structure within the context of a Controlled Natural Language (CNL). Our approach performs low-level discourse parsing under the Penn Discourse TreeBank (PDTB) guidelines. Three classifiers have been trained: one that identifies Arg2, another that locates the relative position of Arg1, and a third that identifies the arguments (Arg1 and Arg2) of each connective. These classifiers are instances of Support Vector Machines (SVMs), trained on our own Mathematical TreeBank. Finally, our approach defines an end-to-end discourse parser for IMD, whose results will be used to classify informal deductive proofs via low-level discourse processing in IMD.
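To make the second classifier in the abstract more concrete, the following is a minimal sketch of how an SVM could decide the relative position of Arg1 for a connective. It is not the authors' implementation: the two features, the toy sentences, the labels `SS` (Arg1 in the same sentence) and `PS` (Arg1 in a previous sentence), and all function names are assumptions made for illustration. The linear SVM is trained with a plain sub-gradient method on the hinge loss rather than with any particular SVM package.

```python
from typing import List, Sequence, Tuple


def features(tokens: Sequence[str], conn_index: int) -> List[float]:
    """Hypothetical features: is the connective sentence-initial, and where does it sit?"""
    sentence_initial = 1.0 if conn_index == 0 else 0.0
    relative_pos = conn_index / max(len(tokens) - 1, 1)
    return [sentence_initial, relative_pos]


def train_linear_svm(
    X: List[List[float]],
    y: List[int],
    lam: float = 0.01,
    lr: float = 0.1,
    epochs: int = 200,
) -> Tuple[List[float], float]:
    """Sub-gradient descent on the L2-regularised hinge loss (labels are +1 / -1)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # Regularisation step: shrink the weights slightly.
            w = [wj * (1.0 - lr * lam) for wj in w]
            # Hinge-loss step: only examples inside the margin move the model.
            if margin < 1.0:
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def predict_position(w: List[float], b: float, tokens: Sequence[str], conn_index: int) -> str:
    """Return 'PS' if Arg1 is predicted to be in a previous sentence, else 'SS'."""
    score = sum(wj * xj for wj, xj in zip(w, features(tokens, conn_index))) + b
    return "PS" if score >= 0 else "SS"


# Toy training data (invented): (sentence, index of the connective, label).
TRAIN = [
    ("Therefore x is even", 0, "PS"),
    ("Hence the set is finite", 0, "PS"),
    ("x is even because x = 2k", 3, "SS"),
    ("f is continuous since it is differentiable", 3, "SS"),
]

X = [features(s.split(), i) for s, i, _ in TRAIN]
y = [1 if label == "PS" else -1 for _, _, label in TRAIN]
w, b = train_linear_svm(X, y)

print(predict_position(w, b, "Thus n is odd".split(), 0))
print(predict_position(w, b, "n is odd because n = 2m".split(), 3))
```

In a full parser of the kind the abstract describes, a step like this would sit between connective/Arg2 identification and the final extraction of the argument spans; a real system would draw much richer features from the treebank annotation.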
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
de Piñerez Reyes, R.E.G., Frias, J.F.D. (2013). Building a Discourse Parser for Informal Mathematical Discourse in the Context of a Controlled Natural Language. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2013. Lecture Notes in Computer Science, vol 7816. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37247-6_43
DOI: https://doi.org/10.1007/978-3-642-37247-6_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37246-9
Online ISBN: 978-3-642-37247-6
eBook Packages: Computer Science, Computer Science (R0)