ABSTRACT
With the ever-increasing volume of formulae on the Web, formula retrieval has drawn much attention from researchers. However, most existing work on formula retrieval treats every formula within an article equally, even though different formulae in the same article differ in their importance to that article. In this paper, we address the problem of ranking formulae within an article by their importance. To evaluate the importance of each formula, we first build a formula citation graph over a large-scale corpus and extract the inter-article features of formulae through link topology analysis of this graph. We then use a word embedding model to extract the intra-article features by mining the semantic relevance between a formula and the corresponding article. Finally, we apply a learning-to-rank technique to rank the formulae within an article based on these features. The experimental results demonstrate that the proposed features are helpful for formula ranking and that our approach outperforms other state-of-the-art methods.
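The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several pieces are assumptions made only for illustration: PageRank stands in for the unspecified link topology analysis, cosine similarity between hypothetical pre-computed embedding vectors stands in for the semantic relevance feature, and a fixed weighted sum replaces the learned ranking model. All function names, the toy graph, and the vectors are hypothetical.

```python
import math


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {formula: [cited formulae]}.

    Assumption: PageRank is used here as one possible link topology
    analysis; the abstract does not name a specific algorithm.
    """
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = graph.get(v, [])
            if targets:
                # Distribute this node's rank over the formulae it cites.
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node: spread its rank uniformly over all nodes.
                share = damping * rank[v] / n
                for u in nodes:
                    new_rank[u] += share
        rank = new_rank
    return rank


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_formulae(graph, formula_vecs, article_vec, weights=(0.5, 0.5)):
    """Rank formulae by a weighted sum of two features.

    Assumption: the fixed weights are a placeholder for a trained
    learning-to-rank model, which would learn the combination from data.
    """
    pr = pagerank(graph)
    max_pr = max(pr.values())
    scores = {}
    for formula, vec in formula_vecs.items():
        inter_article = pr.get(formula, 0.0) / max_pr  # graph-based feature
        intra_article = cosine(vec, article_vec)       # embedding-based feature
        scores[formula] = weights[0] * inter_article + weights[1] * intra_article
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (hypothetical): f1 and f3 both cite f2.
citation_graph = {"f1": ["f2"], "f3": ["f2"], "f2": []}
formula_vectors = {"f1": [1.0, 0.0], "f2": [1.0, 0.0], "f3": [0.0, 1.0]}
article_vector = [1.0, 0.0]

ranking = rank_formulae(citation_graph, formula_vectors, article_vector)
print(ranking)
```

In a fuller system, the two feature values would be passed, together with relevance labels, to a trained ranking model instead of being combined with hand-set weights.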
Index Terms
- Formula Ranking within an Article