
Overview of ARQMath-2 (2021): Second CLEF Lab on Answer Retrieval for Questions on Math

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12880)

Abstract

This paper provides an overview of the second year of the Answer Retrieval for Questions on Math (ARQMath-2) lab, run as part of CLEF 2021. The goal of ARQMath is to advance techniques for mathematical information retrieval, in particular retrieving answers to mathematical questions (Task 1) and formula retrieval (Task 2). Eleven groups participated in ARQMath-2, submitting 36 runs for Task 1 and 17 runs for Task 2. The results suggest that some combination of experience with the task design and the training data available from ARQMath-1 was beneficial: for both Task 1 and Task 2, improvements over the baselines were greater in ARQMath-2 than in ARQMath-1. Tasks, topics, evaluation protocols, and results for each task are presented in this lab overview.


Notes

  1. https://www.cs.rit.edu/~dprl/ARQMath.

  2. https://math.stackexchange.com.

  3. https://www.w3.org/Math.

  4. https://dlmf.nist.gov.

  5. https://archive.org/download/stackexchange.

  6. https://dlmf.nist.gov/LaTeXML.

  7. We thank Deyan Ginev and Vit Novotny for helping reduce LaTeXML failures: for ARQMath-1, conversion failures affected 8% of SLTs and 10% of OPTs.

  8. We thank Frank Tompa for sharing this suggestion at CLEF 2020.

  9. https://drive.google.com/drive/folders/1ZPKIWDnhMGRaPNVLi1reQxZWTfH2R4u3.

  10. https://github.com/ARQMath/ARQMathCode.

  11. Participating systems did not have access to this information.

  12. In ARQMath-1, all topics had links to at least one duplicate or related post that were available to the organizers.

  13. https://github.com/hltcoe/turkle.

  14. H+M binarization corresponds to the definition of relevance usually used in the Text Retrieval Conference (TREC). The TREC definition is “If you were writing a report on the subject of the topic and would use the information contained in the document in the report, then the document is relevant. Only binary judgments (‘relevant’ or ‘not relevant’) are made, and a document is judged relevant if any piece of it is relevant (regardless of how small the piece is in relation to the rest of the document).” (source: https://trec.nist.gov/data/reljudge_eng.html). A small code sketch of this binarization appears after these notes.

  15. One assessor (with id 7) was not able to continue assessment.

  16. Two of the 4 dual-assessed topics had no high or medium relevant answers found by either assessor.

  17. Pooling to at least depth 10 ensures that there are no unjudged posts above rank 10 for any baseline, primary, or alternative run. Note that P′@10 cannot achieve a value of 1 because some topics have fewer than 10 relevant posts. (A sketch of how P′@10 can be computed appears after these notes.)

  18. https://github.com/usnistgov/trec_eval.

  19. This differs from the approach used for ARQMath-1, when only submitted formula instances were clustered. For ARQMath-2, the full formula collection was clustered to facilitate post hoc use of the resulting test collection (a sketch of such grouping appears after these notes).

  20. In ARQMath-2, Task 1 pools were not used to seed Task 2 pools.

  21. For ARQMath-1, 92% of formula instances had an SLT representation; for ARQMath-2, we reparsed the collection and improved this to 99.9%.

  22. As mentioned in Sect. 3, a relatively small number of formulae per topic had incorrectly generated visual ids. In 6 cases, rather than assigning a relevance score, assessors indicated that a pooled formula was ‘not matching’ the other formulae grouped under the same visual id.
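
The H+M binarization in note 14 is simple to make concrete. Below is a minimal sketch, assuming graded judgments on ARQMath's four-point scale (3 = high, 2 = medium, 1 = low, 0 = not relevant) stored as a Python dictionary; the function name and data layout are illustrative, not taken from the lab's tooling.

```python
# Minimal sketch of H+M binarization (note 14). Assumes graded qrels as
# {(topic_id, post_id): grade} with grades 0-3; names are illustrative.

def binarize_hm(graded_qrels):
    """High (3) and medium (2) judgments map to relevant (1);
    low (1) and not relevant (0) map to non-relevant (0)."""
    return {key: int(grade >= 2) for key, grade in graded_qrels.items()}

# Example: one high and one low judgment for a single topic.
qrels = {("A.201", "1234"): 3, ("A.201", "5678"): 1}
assert binarize_hm(qrels) == {("A.201", "1234"): 1, ("A.201", "5678"): 0}
```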

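Note 17's P′@10 is the "prime" variant of precision at 10: unjudged posts are removed from a ranking before precision is computed, so systems are not penalized for retrieving posts that were never judged. Below is a minimal sketch under the same assumed data layout as above (names are again illustrative; in practice, trec_eval from note 18 can score such condensed lists with its standard measures):

```python
# Minimal sketch of P'@10 (note 17): remove unjudged posts, then compute
# precision over the top k that remain, with H+M binarization.

def p_prime_at_k(ranked_post_ids, graded_qrels, topic_id, k=10):
    # Condense the list: keep only posts judged for this topic.
    judged = [p for p in ranked_post_ids if (topic_id, p) in graded_qrels]
    # H+M binarization: grades 2 (medium) and 3 (high) count as relevant.
    hits = sum(1 for p in judged[:k] if graded_qrels[(topic_id, p)] >= 2)
    # Always divide by k, so a topic with fewer than k relevant posts
    # cannot reach 1 (as note 17 observes).
    return hits / k

# Example: three judged posts among five retrieved; two are H or M.
qrels = {("A.201", "p1"): 3, ("A.201", "p2"): 0, ("A.201", "p3"): 2}
run = ["p1", "u1", "p2", "u2", "p3"]  # u1 and u2 were never judged
print(p_prime_at_k(run, qrels, "A.201"))  # 0.2
```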
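
Notes 19 and 21 concern clustering formula instances for Task 2 evaluation, where visually identical formulas share a single visual id. Below is a hedged sketch of one way such grouping can be done, assuming each instance carries a canonical SLT string and that identical SLT strings imply identical visual appearance; the pairing and names are our illustration, not the lab's code.

```python
# Hedged sketch of grouping formula instances into visual-id clusters
# (notes 19 and 21), keyed on a canonical SLT string. Names are ours.

from collections import defaultdict
from itertools import count

def assign_visual_ids(formula_instances):
    """formula_instances: iterable of (instance_id, slt_string) pairs.
    Returns {instance_id: visual_id}; instances with the same SLT string
    receive the same visual id."""
    fresh = count(1)
    slt_to_vid = defaultdict(lambda: next(fresh))  # new id per new SLT
    return {inst: slt_to_vid[slt] for inst, slt in formula_instances}

# Example: f1 and f2 render identically, f3 differs.
print(assign_visual_ids([("f1", "x^2"), ("f2", "x^2"), ("f3", "y")]))
# {'f1': 1, 'f2': 1, 'f3': 2}
```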

Acknowledgements

We thank our student assessors from RIT and St. John Fisher College: Josh Anglum, Dominick Banasick, Aubrey Marcsisin, Nathalie Petruzelli, Siegfried Porterfield, Chase Shuster, and Freddy Stock. This material is based upon work supported by the National Science Foundation (USA) under Grant No. IIS-1717997 and the Alfred P. Sloan Foundation under Grant No. G-2017-9827.

Author information

Correspondence to Behrooz Mansouri.


Copyright information

© 2021 Springer Nature Switzerland AG


Cite this paper

Mansouri, B., Zanibbi, R., Oard, D.W., Agarwal, A. (2021). Overview of ARQMath-2 (2021): Second CLEF Lab on Answer Retrieval for Questions on Math. In: Candan, K.S., et al. (eds.) Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2021. Lecture Notes in Computer Science, vol. 12880. Springer, Cham. https://doi.org/10.1007/978-3-030-85251-1_17


  • DOI: https://doi.org/10.1007/978-3-030-85251-1_17


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-85250-4

  • Online ISBN: 978-3-030-85251-1

