Abstract
Content analysis of scientific publications is a nontrivial task, but a useful and important one for scientific information services. In the Gutenberg era it was the domain of human experts; in the digital age many machine-based methods, e.g., graph analysis tools and machine-learning techniques, have been developed for it. Natural Language Processing (NLP) is a powerful machine-learning approach to semiautomatic speech and language processing, and it is also applicable to mathematics. The well-established methods of NLP have to be adjusted for the special needs of mathematics, in particular for handling mathematical formulae. We demonstrate a mathematics-aware part-of-speech tagger and give a short overview of our adaptation of NLP methods for mathematical publications. We show how the tools we developed are used for key phrase extraction and classification in the database zbMATH.
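To illustrate what handling formulae can mean at the tokenization stage, the following is a minimal sketch and not the tagger described in the paper. It assumes that formulae appear as inline TeX delimited by single dollar signs, and it uses a hypothetical tag name `FORMULA`; both are assumptions made for illustration only. Each formula is kept as one token and pre-tagged, while all other tokens are left untagged for a downstream part-of-speech tagger.

```python
import re

# Hypothetical tag name for formula tokens; the tag set actually used
# by the authors is not specified in the abstract.
FORMULA_TAG = "FORMULA"

# Assumption: formulae are inline TeX delimited by single dollar signs.
# Alternatives, in order: a whole formula, a (possibly hyphenated) word,
# or a single punctuation character.
TOKEN_RE = re.compile(r"\$[^$]+\$|\w+(?:-\w+)*|[^\w\s]")


def tokenize(text):
    """Split text into tokens, keeping each inline formula as one token."""
    return TOKEN_RE.findall(text)


def pretag(tokens):
    """Tag formula tokens; leave other tokens as None for a later tagger."""
    tagged = []
    for tok in tokens:
        is_formula = len(tok) > 2 and tok.startswith("$") and tok.endswith("$")
        tagged.append((tok, FORMULA_TAG if is_formula else None))
    return tagged


if __name__ == "__main__":
    sentence = "Let $x^2 + y^2 = 1$ be a circle."
    for token, tag in pretag(tokenize(sentence)):
        print(token, tag)
```

Treating a formula as a single token keeps a general-purpose tagger from splitting it into meaningless fragments such as `x`, `^` and `2`; how the formula is then classified is left open in this sketch.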
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Schöneberg, U., Sperber, W. (2014). POS Tagging and Its Applications for Mathematics. In: Watt, S.M., Davenport, J.H., Sexton, A.P., Sojka, P., Urban, J. (eds) Intelligent Computer Mathematics. CICM 2014. Lecture Notes in Computer Science, vol 8543. Springer, Cham. https://doi.org/10.1007/978-3-319-08434-3_16
DOI: https://doi.org/10.1007/978-3-319-08434-3_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08433-6
Online ISBN: 978-3-319-08434-3
eBook Packages: Computer Science (R0)