A two-step deep learning approach to data classification and modeling and a demonstration on subject type relationship analysis in the Web of Science

Scientometrics

Abstract

It is common sense that some subjects are strongly related while others are almost mutually independent, but a quantitative and systematic way to describe this intuition is lacking. A technique from information science called pointwise mutual information (PMI) meets this need, but computing it over a large-scale database is computationally infeasible when an instantaneous value is required. This work provides a two-step remedy via deep learning for estimating and predicting the relationship between two subject types found in the Web of Science, a large-scale citation database. The resulting model successfully replicates existing PMI values between subject types, and it can be used to predict the PMI value of two subject types when one or both of them do not exist in the database.
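For readers unfamiliar with PMI, the standard definition is PMI(a, b) = log[ p(a, b) / (p(a) p(b)) ], where p(a, b) is the probability that the two items occur together and p(a), p(b) are their individual probabilities. The sketch below computes it directly from co-occurrence counts. It is only an illustration of the definition: the function name `pmi_table`, the toy records, the base-2 logarithm, and the choice to estimate probabilities as simple record frequencies are all assumptions, not the counting scheme or the deep learning model used in the article.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(records, base=2.0):
    """Return {(a, b): PMI} for every subject pair that co-occurs in a record.

    `records` is an iterable of subject collections, one per article
    (a hypothetical input format chosen for this sketch).
    """
    records = list(records)
    n = len(records)
    single = Counter()  # number of records containing each subject
    pair = Counter()    # number of records containing both subjects of a pair

    for subjects in records:
        unique = sorted(set(subjects))         # de-duplicate, fix pair order
        single.update(unique)
        pair.update(combinations(unique, 2))   # every unordered pair once

    table = {}
    for (a, b), n_ab in pair.items():
        p_ab = n_ab / n
        p_a = single[a] / n
        p_b = single[b] / n
        table[(a, b)] = math.log(p_ab / (p_a * p_b), base)
    return table


# Toy data: four articles tagged with made-up subject types.
records = [
    {"Statistics", "Computer Science"},
    {"Statistics", "Computer Science"},
    {"Statistics", "Mathematics"},
    {"Computer Science"},
]
table = pmi_table(records)
for pair_key, value in sorted(table.items()):
    print(pair_key, round(value, 3))
```

Positive values indicate that two subjects co-occur more often than independence would predict, and negative values indicate the opposite. Pairs that never co-occur are absent from the table because their PMI is undefined (the logarithm of zero). Every record is scanned and every pair within it is enumerated, which gives a sense of why exact computation becomes costly on a database the size of the Web of Science.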

Acknowledgements

The authors would like to thank Clarivate Analytics for providing access to the raw data of the Web of Science database for research investigations. They also thank the URA team of ISM for transforming the data into a neo4j database and providing it for the analysis in this work. In addition, they would like to thank Ms. Ula Tzu-Ning Kung for providing English editing services for this paper, and Ms. Ashwini Balaji Barve for providing background information on deep learning. This work was supported by Academia Sinica Grant Number AS-TP-109-M07 and the Ministry of Science and Technology (Taiwan) Grant Numbers 107-2118-M-001-011-MY3, 107-2321-B-001-038 and 108-2321-B-001-016.

Author information

Corresponding author

Correspondence to Frederick Kin Hing Phoa.

About this article

Cite this article

Phoa, F.K.H., Lai, HY., Chang, L.LH. et al. A two-step deep learning approach to data classification and modeling and a demonstration on subject type relationship analysis in the Web of Science. Scientometrics 125, 851–863 (2020). https://doi.org/10.1007/s11192-020-03599-y

  • DOI: https://doi.org/10.1007/s11192-020-03599-y
