Latexify Math: Mathematical Formula Markup Revision to Assist Collaborative Editing in Math Q&A Sites

Published: 18 October 2021 Publication History

Abstract

Collaborative editing of questions and answers plays an important role in quality control on Mathematics Stack Exchange, a math Q&A site. Our study of post edits on Mathematics Stack Exchange shows that a large number of math-related edits involve latexifying formulas, revising LaTeX, and converting blurred screenshots of math formulas into LaTeX sequences. Despite the importance of these edits, manually editing a math-related post, especially one with complex mathematical formulas, is time-consuming and error-prone even for experienced users. To assist post owners and editors with this editing, we have developed an edit-assistance tool, MathLatexEdit, for formula latexification, LaTeX revision, and screenshot transcription. We formulate the formula editing task as a translation problem in which an original post is translated into a revised post. MathLatexEdit implements a deep-learning-based approach comprising two encoder-decoder models, one for textual and one for visual LaTeX edit recommendation, with math-specific inference. The two models are trained on large-scale historical original-edited post pairs and synthesized screenshot-formula pairs. Our evaluation demonstrates not only the accuracy of our models but also the usefulness of MathLatexEdit in producing edits to real-world posts that are accepted on Mathematics Stack Exchange.
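To make the translation framing concrete, the following minimal Python sketch shows what one original-edited post pair might look like and which token-level edits separate the two sides. The example sentence, the `edit_ops` helper, and the whitespace tokenization are assumptions made for illustration; they are not taken from the paper, and the sketch does not reproduce MathLatexEdit's models or data pipeline.

```python
from difflib import SequenceMatcher


def edit_ops(original: str, revised: str):
    """Return the token-level changes that turn `original` into `revised`.

    Hypothetical helper: tokens are split on whitespace, and only the
    non-matching spans reported by difflib are kept.
    """
    src, tgt = original.split(), revised.split()
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    return [
        (tag, " ".join(src[i1:i2]), " ".join(tgt[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Invented example of a source/target pair (not from the paper's dataset).
original = "the integral of x^2 from 0 to 1 is 1/3"
revised = r"the integral of $x^2$ from $0$ to $1$ is $\frac{1}{3}$"

ops = edit_ops(original, revised)
for tag, before, after in ops:
    print(f"{tag}: {before!r} -> {after!r}")
```

In this framing the original text plays the role of the source sentence and the revised text the target sentence; a sequence-to-sequence model would be trained to produce the target from the source, while the diff above is only a way to inspect what changed.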

Information

Published In

Proceedings of the ACM on Human-Computer Interaction, Volume 5, Issue CSCW2
October 2021
5376 pages
EISSN:2573-0142
DOI:10.1145/3493286
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Q&A sites
  2. collaborative editing
  3. deep learning
  4. latex
  5. math

Qualifiers

  • Research-article

Funding Sources

  • ARC Laureate Fellowship
