Abstract
It is common sense that some subjects are strongly related while others are nearly mutually independent, but a quantitative and systematic way to describe this intuition has been lacking. A technique from information science called pointwise mutual information (PMI) can fill this gap, but computing it over a large-scale database is computationally infeasible when an instantaneous value is required. This work provides a two-step deep learning remedy for estimating and predicting relationships between two subject types found in the large-scale citation database Web of Science. The resulting model successfully replicates existing PMI values among subject types, and it can predict the PMI value of two subject types even when one or both of them do not exist in the database.
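To make the quantity concrete: PMI compares the observed co-occurrence probability of two subject types with the probability expected under independence, log2(p(x, y) / (p(x) p(y))). The sketch below is illustrative only and is not the paper's model; the toy records, subject names, and the `pmi` helper are all assumptions introduced for the example.

```python
import math
from collections import Counter
from itertools import combinations

def pmi(pair_counts, single_counts, n_docs, x, y):
    """Pointwise mutual information of two subject types over n_docs records:
    log2( p(x, y) / (p(x) * p(y)) )."""
    p_xy = pair_counts[frozenset((x, y))] / n_docs
    p_x = single_counts[x] / n_docs
    p_y = single_counts[y] / n_docs
    return math.log2(p_xy / (p_x * p_y))

# Toy corpus: each record is the set of subject types tagged on one article.
records = [
    {"Physics", "Mathematics"},
    {"Physics", "Mathematics"},
    {"Physics", "Chemistry"},
    {"Mathematics", "Statistics"},
]
singles = Counter(s for r in records for s in r)
pairs = Counter(frozenset(p) for r in records for p in combinations(sorted(r), 2))

# Positive PMI means the pair co-occurs more often than independence predicts.
print(round(pmi(pairs, singles, len(records), "Physics", "Mathematics"), 3))
```

Computing such counts over the full Web of Science is exactly the expensive step that motivates replacing the direct calculation with a learned predictor.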
Acknowledgements
The authors would like to thank Clarivate Analytics for providing access to the raw data of the Web of Science database for research investigations. They also thank the URA team of ISM for transforming the data into a neo4j database and providing it for the analysis in this work. In addition, they thank Ms. Ula Tzu-Ning Kung for providing English editing for this paper and Ms. Ashwini Balaji Barve for providing background information on deep learning. This work was supported by Academia Sinica Grant Number AS-TP-109-M07 and the Ministry of Science and Technology (Taiwan) Grant Numbers 107-2118-M-001-011-MY3, 107-2321-B-001-038 and 108-2321-B-001-016.
Cite this article
Phoa, F.K.H., Lai, HY., Chang, L.LH. et al. A two-step deep learning approach to data classification and modeling and a demonstration on subject type relationship analysis in the Web of Science. Scientometrics 125, 851–863 (2020). https://doi.org/10.1007/s11192-020-03599-y