New node anomaly detection algorithm based on nonnegative matrix factorization for directed citation networks

Tosyali, Ali; Kim, Jinho; Choi, Jeongsub; Kang, Yunyi; Jeong, Myong K.

doi:10.1007/s10479-019-03508-4


Original Research
Published: 02 January 2020

Volume 288, pages 457–474, (2020)
Annals of Operations Research

Ali Tosyali¹, Jinho Kim², Jeongsub Choi³, Yunyi Kang⁴ & Myong K. Jeong⁵ (ORCID: orcid.org/0000-0002-4124-5253)

683 Accesses
9 Citations

Abstract

Outlier detection, which identifies abnormal entities that deviate from the rest of a dataset, is a crucial task in network data analysis. Ranking by outlierness is often used to identify abnormal nodes in directed citation networks, in which edges represent citation relationships among nodes. A challenging issue in outlier ranking is how to leverage the rich graph data of complex citation networks. In this paper, we propose a cluster-based outlier score function, based on nonnegative matrix factorization (NMF), to identify outliers in citation networks. We first represent the citation data as a directed graph and cluster the graph into logical groupings of nodes using NMF. Based on the clustering results, we obtain an outlier score and ranking for each node using the proposed outlier scoring function. The proposed method leverages both direct and indirect citation links between nodes to measure graph-based outlierness. We validate the proposed outlier ranking method using a small artificial dataset and real-world U.S. patent data.
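The pipeline described in the abstract (directed graph, NMF clustering, per-node outlier score, ranking) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not give the authors' NMF variant or their scoring function, so the sketch uses plain Lee-Seung multiplicative updates, a hypothetical membership rule (`W + H.T`), and a hypothetical score (the fraction of a node's citation links that cross cluster boundaries). The function names `nmf` and `cluster_and_score` and the toy edge list are likewise made up for this example.

```python
import numpy as np

def nmf(X, k, n_iter=500, seed=0, eps=1e-9):
    """Plain Lee-Seung multiplicative updates for min ||X - W H||_F^2, W, H >= 0."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def cluster_and_score(A, k):
    """Cluster nodes with NMF, then score each node (illustrative score only)."""
    A = np.asarray(A, dtype=float)
    W, H = nmf(A, k)
    # Assumption: combine the out-link factor W and the in-link factor H
    # so that nodes without outgoing citations still get a membership.
    labels = (W + H.T).argmax(axis=1)
    # Assumption: score = share of a node's in- and out-links that leave its cluster.
    S = A + A.T
    same = labels[:, None] == labels[None, :]
    total = S.sum(axis=1)
    cross = (S * ~same).sum(axis=1)
    scores = np.divide(cross, total, out=np.zeros_like(total), where=total > 0)
    return labels, scores

# Toy citation graph: A[i, j] = 1 means node i cites node j.
edges = [(1, 0), (2, 0), (2, 1), (4, 3), (5, 3), (5, 4), (6, 0), (6, 3)]
A = np.zeros((7, 7))
for i, j in edges:
    A[i, j] = 1.0

labels, scores = cluster_and_score(A, k=2)
ranking = np.argsort(-scores)  # node indices, highest score first
print(labels, scores, ranking)
```

Because NMF starts from a random initialization, the cluster labels and the resulting ranking can change with the seed; the sketch fixes `seed=0` only for repeatability.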



Author information

Authors and Affiliations

Department of Accounting and Management Information Systems, University of Delaware, Newark, DE, 19716, USA
Ali Tosyali
Rutgers Business School-Camden, Rutgers University, Camden, NJ, 08102, USA
Jinho Kim
Department of Industrial and Systems Engineering, Rutgers University, Piscataway, NJ, 08854, USA
Jeongsub Choi
School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, AZ, 85281, USA
Yunyi Kang
Department of Industrial and Systems Engineering, DIMACS, and RUTCOR, Rutgers University, Piscataway, NJ, 08854, USA
Myong K. Jeong


Corresponding author

Correspondence to Myong K. Jeong.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Proof of proposition 1

Suppose that $G$ is a directed acyclic graph with $n$ nodes. Then the length of the longest path in the graph is at most $n-1$. Let $\tau$ be the length of the longest path in $G$. For $l>\tau$ and for all $1\le i, j\le n$, $[\mathbf{A}]_{ij}^{(l)}=0$, since there is no path of length $l$ from node $i$ to node $j$. Therefore $\mathbf{A}^l=\mathbf{0}$ for all $l>\tau$. Let $\mathbf{C}=\sum_{l=1}^{\infty}\beta^l \mathbf{A}^l$; then $\mathbf{C}=\sum_{l=1}^{\tau}\beta^l \mathbf{A}^l$. With $0<\beta<1$, $\mathbf{C}$ can be rewritten as
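The truncation argument above can be checked numerically on a small example. The sketch below is an illustration under assumptions, not the authors' code: the four-node graph, the value $\beta = 0.5$, and the helper name `path_weight_matrix` are made up. It also compares the finite sum against $(\mathbf{I}-\beta\mathbf{A})^{-1}-\mathbf{I}$, which is the standard Neumann-series identity for a nilpotent $\mathbf{A}$ and may or may not be the form used in the paper.

```python
import numpy as np

def path_weight_matrix(A, beta):
    """Sum beta^l A^l for l = 1..n-1, which suffices for a DAG on n nodes."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    C = np.zeros_like(A)
    term = np.eye(n)
    for _ in range(1, n):
        term = beta * (term @ A)  # term now holds beta^l A^l
        C += term
    return C

# Hypothetical DAG: 0 -> 1 -> 2 -> 3 plus a shortcut 0 -> 2 (longest path length 3).
A = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 3), (0, 2)]:
    A[i, j] = 1.0

beta = 0.5
C = path_weight_matrix(A, beta)

# A^l vanishes once l exceeds the longest path length.
print(np.allclose(np.linalg.matrix_power(A, 4), 0.0))

# Neumann-series closed form, valid here because A is nilpotent.
closed_form = np.linalg.inv(np.eye(4) - beta * A) - np.eye(4)
print(np.allclose(C, closed_form))
```

In this example the entry `C[0, 3]` collects the two paths from node 0 to node 3, one of length 3 and one of length 2, giving $0.5^3 + 0.5^2 = 0.375$.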


About this article

Cite this article

Tosyali, A., Kim, J., Choi, J. et al. New node anomaly detection algorithm based on nonnegative matrix factorization for directed citation networks. Ann Oper Res 288, 457–474 (2020). https://doi.org/10.1007/s10479-019-03508-4


Published: 02 January 2020
Issue Date: May 2020
DOI: https://doi.org/10.1007/s10479-019-03508-4
