Latexify Math: Mathematical Formula Markup Revision to Assist Collaborative Editing in Math Q&A Sites

Authors:

Hourieh Khalajzadeh,

John Grundy

Proceedings of the ACM on Human-Computer Interaction, Volume 5, Issue CSCW2

Article No.: 403, Pages 1 - 24

https://doi.org/10.1145/3479547

Published: 18 October 2021

Abstract

Collaborative editing of questions and answers plays an important role in quality control on Mathematics Stack Exchange, a math Q&A site. Our study of post edits on Mathematics Stack Exchange shows that a large number of math-related edits involve latexifying formulas, revising LaTeX, and converting blurred screenshots of math formulas into LaTeX sequences. Despite its importance, manually editing a math-related post, especially one with complex mathematical formulas, is time-consuming and error-prone even for experienced users. To assist post owners and editors with this editing, we have developed an edit-assistance tool, MathLatexEdit, for formula latexification, LaTeX revision, and screenshot transcription. We formulate this formula editing task as a translation problem, in which an original post is translated to a revised post. MathLatexEdit implements a deep-learning-based approach comprising two encoder-decoder models, one for textual and one for visual LaTeX edit recommendation, with math-specific inference. The two models are trained on large-scale historical original-edited post pairs and synthesized screenshot-formula pairs. Our evaluation of MathLatexEdit demonstrates not only the accuracy of our models but also the usefulness of MathLatexEdit in producing edits to real-world posts that are accepted on Mathematics Stack Exchange.

Index Terms

Latexify Math: Mathematical Formula Markup Revision to Assist Collaborative Editing in Math Q&A Sites
1. Applied computing
  1. Document management and text processing
    1. Document management
      1. Text editing
2. Human-centered computing
  1. Collaborative and social computing
    1. Collaborative and social computing systems and tools

Published In

Proceedings of the ACM on Human-Computer Interaction, Volume 5, Issue CSCW2

October 2021

5376 pages

EISSN: 2573-0142

DOI: 10.1145/3493286

Editor: Jeff Nichols, Apple Inc., United States

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

ARC Laureate Fellowship
