Text representation model of scientific papers based on fusing multi-viewpoint information and its quality assessment

Lu, Yonghe; Luo, Jiayi; Xiao, Ying; Zhu, Hou

doi:10.1007/s11192-021-04028-4

Published: 23 June 2021

Scientometrics, volume 126, pages 6937–6963 (2021)

Abstract

Text representation is the preliminary step for in-depth analysis and mining of the information in scientific papers. It directly affects the performance of downstream tasks such as scientific paper classification, clustering, and similarity calculation. However, recent research has mainly considered the citation network and partial structural information, which is insufficient for representing scientific papers. To improve the performance of text representation, this paper proposes MV-HATrans, a text representation model that combines multi-viewpoint information, such as the semantic information of a knowledge graph and structural information. The model extracts word information from three aspects: contextual content, part of speech, and word meaning from WordNet. By combining a hierarchical attention mechanism with a transformer, the model produces a full-text representation of scientific papers. Finally, the paper applies the proposed representation model to automatic quality assessment, using the binary experimental dataset AAPR, which indicates whether a scientific paper was accepted or not. The results show that, for quality classification of scientific papers, adopting part-of-speech information and semantic information based on WordNet definitions achieves a prediction accuracy of 70.14%. Among all the structural modules, authors and abstracts contribute the most to the quality classification of scientific papers, especially authors, at 9.51%.
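The abstract does not say how the three word-level views are fused or how the hierarchical attention is parameterised, so the following is only a minimal sketch of the general idea, not the authors' MV-HATrans implementation. It assumes the views are fused by concatenation and that attention weights come from a dot product with a fixed query vector. The names `fuse_views`, `attention_pool`, `word_query`, and `sent_query` are hypothetical, the toy numbers are made up, and the transformer component is omitted entirely.

```python
import math
from typing import List

Vector = List[float]


def fuse_views(context: Vector, pos: Vector, gloss: Vector) -> Vector:
    """Fuse three per-word views by concatenation (an assumed fusion rule)."""
    return context + pos + gloss


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors: List[Vector], query: Vector) -> Vector:
    """Weighted average of vectors; weights are softmax(dot(vector, query))."""
    scores = [sum(x * q for x, q in zip(vec, query)) for vec in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    return [sum(w * vec[d] for w, vec in zip(weights, vectors)) for d in range(dim)]


# Toy document: two sentences; each word is a (context, part-of-speech, WordNet gloss) triple.
doc = [
    [([0.1, 0.2], [1.0], [0.3]), ([0.4, 0.0], [0.0], [0.5])],
    [([0.2, 0.2], [1.0], [0.1])],
]

# Hypothetical query vectors; in a trained model these would be learned parameters.
word_query = [0.5, 0.5, 0.5, 0.5]
sent_query = [0.5, 0.5, 0.5, 0.5]

# Word level: fuse the views of each word, then pool the words into a sentence vector.
sentence_vecs = [
    attention_pool([fuse_views(*word) for word in sentence], word_query)
    for sentence in doc
]

# Sentence level: pool the sentence vectors into a single document vector.
doc_vec = attention_pool(sentence_vecs, sent_query)
```

The two pooling levels mirror the word-to-sentence and sentence-to-document hierarchy of a hierarchical attention network; the resulting `doc_vec` would then feed a binary classifier for the accepted-or-not label.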

Acknowledgements

The authors warmly thank the reviewers for their valuable suggestions. This research was partly supported by the Basic and Applied Basic Research Fund of Guangdong Province (No. 2019B1515120085), the National Natural Science Foundation of China (Grant Number: 71373291), and the Science and Technology Planning Project of Guangdong Province, China (Grant Number: 2016B030303003). Jiayi Luo and Ying Xiao are co-second authors.

Author information

Authors and Affiliations

School of Information Management, Sun Yat-sen University, Guangzhou, China
Yonghe Lu, Jiayi Luo, Ying Xiao & Hou Zhu

Corresponding author

Correspondence to Hou Zhu.

About this article

Cite this article

Lu, Y., Luo, J., Xiao, Y. et al. Text representation model of scientific papers based on fusing multi-viewpoint information and its quality assessment. Scientometrics 126, 6937–6963 (2021). https://doi.org/10.1007/s11192-021-04028-4

Received: 27 November 2020
Accepted: 03 May 2021
Published: 23 June 2021
Issue Date: August 2021
DOI: https://doi.org/10.1007/s11192-021-04028-4
