
MaxTract: Converting PDF to LaTeX, MathML and Text

  • Conference paper
Intelligent Computer Mathematics (CICM 2012)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7362)


Abstract

In this paper we present the first public, online demonstration of MaxTract, a tool that converts PDF files containing mathematics into multiple formats, including LaTeX, HTML with embedded MathML, and plain text. Using a bespoke PDF parser and image analyser, we directly extract character and font information and use it as input to a linear grammar which, in conjunction with specialised drivers, can accurately recognise and reproduce both the two-dimensional relationships between symbols in mathematical formulae and the one-dimensional relationships present in standard text.
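To make the idea of recovering two-dimensional relationships from extracted character and font information more concrete, here is a minimal Python sketch. It is not MaxTract's linear grammar; it swaps in a much simpler baseline-comparison heuristic. The `Glyph` record, the `to_latex` function and the `shift_ratio` threshold are all hypothetical, and the sketch assumes the leftmost glyph sits on the main baseline and that scripts are only one level deep.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Glyph:
    """Hypothetical record for one character extracted from a PDF."""
    char: str
    x: float         # left edge of the glyph's bounding box
    baseline: float  # baseline y-coordinate (PDF user space: y grows upwards)
    size: float      # font size in points


def to_latex(glyphs: List[Glyph], shift_ratio: float = 0.2) -> str:
    """Toy recogniser for single-level super- and subscripts.

    A glyph whose baseline lies more than shift_ratio * (main font size)
    above (below) the main baseline is collected into a superscript
    (subscript) group attached to the preceding symbol. Nested scripts,
    fractions, radicals, etc. are not handled.
    """
    if not glyphs:
        return ""
    glyphs = sorted(glyphs, key=lambda g: g.x)  # left-to-right reading order
    main = glyphs[0]  # assumption: the leftmost glyph is on the main baseline
    threshold = shift_ratio * main.size

    out: List[str] = []
    group: List[str] = []              # characters of the current script
    group_kind: Optional[str] = None   # None (baseline), "^" or "_"

    def flush() -> None:
        # Emit the pending script group, if any, as ^{...} or _{...}.
        nonlocal group, group_kind
        if group_kind is not None and group:
            out.append(f"{group_kind}{{{''.join(group)}}}")
        group, group_kind = [], None

    for g in glyphs:
        offset = g.baseline - main.baseline
        if offset > threshold:
            kind = "^"
        elif offset < -threshold:
            kind = "_"
        else:
            kind = None
        if kind != group_kind:  # vertical position changed: close old group
            flush()
            group_kind = kind
        if kind is None:
            out.append(g.char)
        else:
            group.append(g.char)
    flush()
    return "".join(out)
```

For example, glyphs for `x`, a raised `2`, `+`, `y` and a lowered `1` (baselines 100, 104, 100, 100 and 97 at font size 10) yield `x^{2}+y_{1}`. A real system needs a far richer model of spatial relationships than this single threshold.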

The main goals of MaxTract are to provide translation into standard mathematical markup languages and to make mathematical documents accessible on multiple levels. This includes accessibility in the narrow sense of giving print-impaired users, such as those with visual impairments, dyslexia or dyspraxia, access to the content, and, more generally, giving any user access to mathematical content in forms that are more reusable than a purely visual rendering. MaxTract achieves this by enriching the regular text with mathematical markup, producing output that works with web browsers, screen readers and tools such as copy and paste. Within the limits of the presentation MathML it produces, the output can also be used directly as machine-readable mathematical input to software systems such as Mathematica or Maple.
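As a rough illustration of what enriching regular text with mathematical markup can look like, the following Python sketch builds an HTML paragraph whose formula is written as inline presentation MathML, using only the standard `xml.etree.ElementTree` module. It does not reproduce MaxTract's actual output: the helper names `msup` and `enrich_paragraph` are hypothetical, only a single superscript is supported, and using the MathML `alttext` attribute as a linear fallback for assistive technology is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def msup(base: str, exp: str) -> ET.Element:
    """Presentation MathML for base^exp (hypothetical helper).

    Digits become <mn> (numbers); anything else is naively treated as <mi>.
    """
    sup = ET.Element("msup")
    for token in (base, exp):
        tag = "mn" if token.isdigit() else "mi"
        ET.SubElement(sup, tag).text = token
    return sup


def enrich_paragraph(before: str, base: str, exp: str, after: str) -> str:
    """Return an HTML <p> in which an inline MathML formula sits in the text.

    Assumption: alttext carries a linear form of the formula for tools
    that cannot interpret MathML.
    """
    p = ET.Element("p")
    p.text = before
    math = ET.SubElement(p, "math", {"xmlns": MATHML_NS, "alttext": f"{base}^{exp}"})
    math.append(msup(base, exp))
    math.tail = after  # text that follows the formula inside the paragraph
    return ET.tostring(p, encoding="unicode")
```

Calling `enrich_paragraph("The area is ", "x", "2", " square units.")` returns a paragraph in which the surrounding words stay ordinary, copyable text while the formula is marked up as `<msup><mi>x</mi><mn>2</mn></msup>` inside a `<math>` element.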




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Baker, J.B., Sexton, A.P., Sorge, V. (2012). MaxTract: Converting PDF to LaTeX, MathML and Text. In: Jeuring, J., et al. (eds) Intelligent Computer Mathematics. CICM 2012. Lecture Notes in Computer Science, vol 7362. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31374-5_29


  • DOI: https://doi.org/10.1007/978-3-642-31374-5_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31373-8

  • Online ISBN: 978-3-642-31374-5

  • eBook Packages: Computer Science, Computer Science (R0)
