M2R: From Mathematical Models to Resource Description Framework

  • Conference paper
Web and Big Data (APWeb-WAIM 2022)

Abstract

Domain-specific knowledge graphs usually require deeper and more accurate knowledge. Existing academic knowledge graphs focus mainly on authors, abstracts, keywords, and citations, which help explore the themes of papers and analyze relationships between them. However, this content is summary-level and reveals only shallow meaning; it does not capture the core of a scientific paper. Mathematical models, which existing knowledge graphs ignore, are what authors really want to express in their papers. Knowledge drawn from mathematical models makes it possible to use knowledge graphs for mathematical derivation, not just literal reasoning. To model this knowledge, we propose M2R (from Mathematical Models to Resource Description Framework), a knowledge graph construction framework. Mathematical models are usually described by formulae. We first locate the formulae according to pre-defined rules and find the contexts that explain the variables in them. Next, we crop the formulae and their related contexts from PDF papers as images and apply optical character recognition to recognize the image contents. Then, regular expressions designed around sentence patterns extract the variable symbols and their explanations. Finally, each formula is treated as a relation between its variables, forming triples whose subjects and objects are variables and whose predicates are formulae. Similar triples are fused to generate the final knowledge graph. Experimental results show that the precision of formula extraction reaches up to 76.97%. In addition, a case study shows that we can effectively extract formulae and their related variables and construct a knowledge graph of the mathematical models in scientific papers.
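The last two steps of the pipeline, regex-based variable extraction and triple construction, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the sentence pattern, the function names `extract_variables` and `build_triples`, the example formula, and the choice to link every unordered pair of variables are all assumptions made for illustration. The abstract does not specify the actual patterns or pairing rule, and the sketch assumes OCR has already produced plain text.

```python
import re
from itertools import combinations

# Toy sentence pattern (an assumption, not the paper's actual rules):
# "<symbol> is|denotes|represents [the] <explanation>"
VARIABLE_PATTERN = re.compile(
    r"\b(?P<symbol>[A-Za-z](?:_\w+)?)\s+"
    r"(?:is|denotes|represents)\s+(?:the\s+)?"
    r"(?P<explanation>[^,.;]+)"
)


def extract_variables(context):
    """Map each variable symbol found in the context to its explanation."""
    return {
        m.group("symbol"): m.group("explanation").strip()
        for m in VARIABLE_PATTERN.finditer(context)
    }


def build_triples(formula, variables):
    """Link every unordered pair of variables, using the formula as predicate."""
    return [(s, formula, o) for s, o in combinations(variables, 2)]


# Hypothetical OCR output for one formula and its explanatory context.
formula = "F = m * a"
context = "where F is the force, m is the mass, and a is the acceleration."

variables = extract_variables(context)
triples = build_triples(formula, variables)

for subject, predicate, obj in triples:
    print(f"({subject}) --[{predicate}]--> ({obj})")
```

The pattern only handles explanations separated by commas, semicolons, or periods; real paper text would need a richer set of sentence patterns, and fusing similar triples is not covered here.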

Notes

  1. https://pdfbox.apache.org/

  2. https://protege.stanford.edu/

  3. https://www.elastic.co/cn/elasticsearch/

  4. https://en.wikipedia.beta.wmflabs.org/wiki/Main_Page

  5. https://cn.bing.com/

  6. https://www.baidu.com/

Acknowledgments.

This work was supported in part by the National Natural Science Foundation of China under Grant No. 61602149, and in part by the Fundamental Research Funds for the Central Universities, China under Grant No. B210202078.

Author information

Corresponding author

Correspondence to Xiaodong Li.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zou, C., Li, X., Wu, P., Xie, H. (2023). M2R: From Mathematical Models to Resource Description Framework. In: Li, B., Yue, L., Tao, C., Han, X., Calvanese, D., Amagasa, T. (eds) Web and Big Data. APWeb-WAIM 2022. Lecture Notes in Computer Science, vol 13422. Springer, Cham. https://doi.org/10.1007/978-3-031-25198-6_18

  • DOI: https://doi.org/10.1007/978-3-031-25198-6_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-25197-9

  • Online ISBN: 978-3-031-25198-6

  • eBook Packages: Computer Science (R0)
