Text representation model of scientific papers based on fusing multi-viewpoint information and its quality assessment

Scientometrics

Abstract

Text representation is a preliminary step for in-depth analysis and mining of information in scientific papers, and it directly affects the performance of downstream tasks such as scientific paper classification, clustering, and similarity calculation. However, recent research has mainly considered citation networks and partial structural information, which is insufficient for representing scientific papers. To improve text representation, this paper proposes MV-HATrans, a text representation model that fuses multi-viewpoint information, such as the semantic information of a knowledge graph and structural information. The model extracts word information from three aspects: contextual content, part of speech, and word meaning from WordNet. By combining a hierarchical attention mechanism with a Transformer, the model produces a full-text representation of scientific papers. Finally, this paper applies the proposed text representation model to automatic quality assessment on AAPR, a binary dataset that labels whether each scientific paper was accepted. Results show that, in the quality classification of scientific papers, incorporating part-of-speech information and semantic information based on WordNet definitions achieves a prediction accuracy of 70.14%. Among all the structural modules, authors and abstracts contribute the most to the quality classification of scientific papers, especially authors (9.51%).
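The abstract describes the model only at a high level, so the following is a minimal, hypothetical Python sketch of the general idea rather than the authors' implementation. It assumes each word already has three vectors (a contextual vector, a part-of-speech vector, and a WordNet-gloss vector) and fuses them by concatenation; it then pools words into sentence vectors and sentences into one vector per structural module with a simplified attention score, `tanh(v) · query`, and finally feeds the module vectors (e.g., authors, abstract) into a logistic accept/reject score. The Transformer encoder and all learned projections are omitted, and every function name and parameter here is an illustrative assumption.

```python
import math
from typing import Dict, List

Vector = List[float]


def fuse_word_features(context: Vector, pos: Vector, gloss: Vector) -> Vector:
    """Fuse three word views by concatenation (an assumed fusion strategy)."""
    return context + pos + gloss


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors: List[Vector], query: Vector) -> Vector:
    """Attention-weighted average of `vectors`.

    Scores are tanh(v) . query, a simplified hierarchical-attention score
    with an identity projection (assumption: no learned weight matrix).
    """
    scores = [sum(math.tanh(x) * q for x, q in zip(v, query)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def encode_module(sentences: List[List[Vector]], word_query: Vector,
                  sent_query: Vector) -> Vector:
    """Hierarchy: word vectors -> sentence vectors -> one module vector."""
    sentence_vecs = [attention_pool(words, word_query) for words in sentences]
    return attention_pool(sentence_vecs, sent_query)


def predict_acceptance(modules: Dict[str, Vector],
                       weights: Dict[str, Vector], bias: float) -> float:
    """Logistic score over all module vectors; returns P(accepted)."""
    z = bias + sum(
        sum(w * x for w, x in zip(weights[name], vec))
        for name, vec in modules.items()
    )
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Toy 4-dimensional word vectors: 2 contextual + 1 POS + 1 gloss dims.
    w1 = fuse_word_features([0.2, 0.1], [1.0], [0.3])
    w2 = fuse_word_features([0.5, -0.4], [0.0], [0.7])
    query = [0.5, 0.5, 0.5, 0.5]  # hypothetical attention query vector
    modules = {
        "abstract": encode_module([[w1, w2], [w2]], query, query),
        "authors": encode_module([[w1]], query, query),
    }
    clf = {"abstract": [0.1] * 4, "authors": [0.3] * 4}  # made-up weights
    print(predict_acceptance(modules, clf, bias=0.0))
```

Keeping each structural module as a separate vector until the final score is one simple way, in a sketch like this, to compare how much individual modules such as authors or abstracts contribute.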

Figs. 1–7

Acknowledgements

The authors warmly thank the reviewers for their valuable suggestions. This research was partly supported by the Basic and Applied Basic Research Fund of Guangdong Province (No. 2019B1515120085), the National Natural Science Foundation of China [Grant Number: 71373291], and the Science and Technology Planning Project of Guangdong Province (China) [Grant Number: 2016B030303003]. Jiayi Luo and Ying Xiao are co-second authors.

Author information

Correspondence to Hou Zhu.

About this article

Cite this article

Lu, Y., Luo, J., Xiao, Y. et al. Text representation model of scientific papers based on fusing multi-viewpoint information and its quality assessment. Scientometrics 126, 6937–6963 (2021). https://doi.org/10.1007/s11192-021-04028-4

  • DOI: https://doi.org/10.1007/s11192-021-04028-4