Skip to main content
Log in

New node anomaly detection algorithm based on nonnegative matrix factorization for directed citation networks

  • Original Research
  • Published:
Annals of Operations Research Aims and scope Submit manuscript

Abstract

Outlier detection is a crucial task for network data analysis, which identifies abnormal entities that deviate from the rest of the dataset. Ranking in outlierness is often used for identifying abnormal nodes in directed citation networks containing citation relationship among nodes. A challenging issue in outlier ranking is how to leverage the rich graph data of complex citation networks. In this paper, we propose a cluster-based outlier score function to identify outliers in citation networks based on nonnegative matrix factorization (NMF). We first represent the citation data as a directed graph, and cluster the directed graph into logical groupings of nodes using NMF. Based on the clustering results, we obtain the outlier score and ranking for each node using the proposed outlier scoring function. The proposed method leverages the direct and indirect citation links between nodes to measure the graph-based outlierness. We validate the proposed outlier ranking method using small artificial dataset and the real-world U.S. patent data.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6

Similar content being viewed by others

References

  • Agreste, S., De Meo, P., Ferrara, E., Piccolo, S., & Provetti, A. (2015). Analysis of a heterogeneous social network of humans and cultural objects. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 45(4), 559–570.

    Article  Google Scholar 

  • Akoglu, L., McGlohon, M., & Faloutsos, C. (2010). Oddball: Spotting anomalies in weighted graphs. In Pacific-Asia conference on knowledge discovery and data Mining (pp. 410–421). Berlin: Springer

  • Banker, R. D., Chang, H., & Zheng, Z. (2017). On the use of super-efficiency procedures for ranking efficient units and identifying outliers. Annals of Operations Research, 250(1), 21–35.

    Article  Google Scholar 

  • Boutsidis, C., & Gallopoulos, E. (2008). SVD based initialization: A head start for nonnegative matrix factorization. Pattern Recognition, 41(4), 1350–1362.

    Article  Google Scholar 

  • Cao, X., Wang, X., Jin, D., Cao, Y., & He, D. (2013). Identifying overlapping communities as well as hubs and outliers via nonnegative matrix factorization. Scientific Reports, 3, 2993.

    Article  Google Scholar 

  • Codetta-Raiteri, D., & Portinale, L. (2015). Dynamic bayesian networks for fault detection, identification, and recovery in autonomous spacecraft. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 45(1), 13–24.

    Article  Google Scholar 

  • Ding, C. H., He, X., & Simon, H. D. (2005). On the equivalence of nonnegative matrix factorization and spectral clustering. SDM, SIAM, 5, 606–610.

    Google Scholar 

  • Duan, L., Xu, L., Liu, Y., & Lee, J. (2009). Cluster-based outlier detection. Annals of Operations Research, 168(1), 151–168.

    Article  Google Scholar 

  • Džamić, D., Aloise, D., & Mladenović, N. (2017). Ascent-descent variable neighborhood decomposition search for community detection by modularity maximization. Annals of Operations Research, 272, 273–287.

    Article  Google Scholar 

  • Holder, L. B., & Cook, D. J. (2009). Graph-based data mining. Encyclopedia of data warehousing and mining, 2, 943–949.

    Article  Google Scholar 

  • Kaffash, S., & Marra, M. (2017). Data envelopment analysis in financial services: A citations network analysis of banks, insurance companies and money market funds. Annals of Operations Research, 253(1), 307–344.

    Article  Google Scholar 

  • Kang, U., Akoglu, L., & Chau, D. H. P. (2013). Big graph mining: Algorithms, anomaly detection, and applications. Proceedings of the ACM ASONAM, 13, 25–28.

    Google Scholar 

  • Lee, D. D., & Seung, H. S. (2001). Algorithms for non-negative matrix factorization. In Advances in neural information processing systems (pp. 556–562).

  • Lu, N., Li, T., Pan, J., Ren, X., Feng, Z., & Miao, H. (2015). Structure constrained semi-nonnegative matrix factorization for EEG-based motor imagery classification. Computers in Biology and Medicine, 60, 32–39.

    Article  Google Scholar 

  • Ma, Y., Hu, X., He, T., & Jiang, X. (2016). Hessian regularization based symmetric nonnegative matrix factorization for clustering gene expression and microbiome data. Methods, 111, 80–84.

    Article  Google Scholar 

  • Michel, J., & Bettels, B. (2001). Patent citation analysis. A closer look at the basic input data from patent search reports. Scientometrics, 51(1), 185–201.

    Article  Google Scholar 

  • Moonesignhe, H., & Tan, P. N. (2006). Outlier detection using random walks. In 2006 18th IEEE international conference on tools with artificial intelligence (ICTAI’06), IEEE (pp. 532–539).

  • Newman, M. (2010). Networks: An introduction. Oxford: Oxford University Press.

    Book  Google Scholar 

  • Sun, H,. Huang, J., Han, J., Deng, H., Zhao, P., & Feng, B. (2010). gskeletonclu: Density-based network clustering via structure-connected tree division or agglomeration. In 2010 IEEE International Conference on Data Mining, IEEE (pp. 481–490).

  • Tong, H., & Lin, C. Y. (2011). Non-negative residual matrix factorization with application to graph anomaly detection. In SDM, SIAM (pp. 143–153).

  • Tosyali, A., Kim, J., Choi, J., & Jeong, M. K. (2019). Regularized asymmetric nonnegative matrix factorization for clustering in directed networks. Pattern Recognition Letters, 125, 750–757.

    Article  Google Scholar 

  • Wang, F., Li, T., Wang, X., Zhu, S., & Ding, C. (2011). Community discovery using nonnegative matrix factorization. Data Mining and Knowledge Discovery, 22(3), 493–521.

    Article  Google Scholar 

  • Xu, X., Yuruk, N., Feng, Z., & Schweiger, T. A. (2007). Scan: A structural clustering algorithm for networks. In Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM (pp. 824–833).

  • Yoon, J., & Kim, K. (2011). Detecting signals of new technological opportunities using semantic patent analysis and outlier detection. Scientometrics, 90(2), 445–461.

    Article  Google Scholar 

  • Yuan, X., Guo, J., Hao, X., & Chen, H. (2015). Traffic sign detection via graph-based ranking and segmentation algorithms. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 45(12), 1509–1521.

    Article  Google Scholar 

  • Zhi, R., Flierl, M., Ruan, Q., & Kleijn, W. B. (2011). Graph-preserving sparse nonnegative matrix factorization with application to facial expression recognition. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 41(1), 38–52.

    Article  Google Scholar 

  • Zou, Z., Li, J., Gao, H., & Zhang, S. (2010). Mining frequent subgraph patterns from uncertain graph data. IEEE Transactions on Knowledge and Data Engineering, 22(9), 1203–1218.

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Myong K. Jeong.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Proof of proposition 1

Appendix: Proof of proposition 1

Suppose that a graph G is an acyclic directed graph with n nodes, then the length of the longest path in the graph is at most \(n-1\). Let \(\tau \) be the longest path in the graph G. For \(l>\tau \) and for all \(1\le i\), \(j\le n\), \([\mathbf A ]_{ij}^{(l)}=0\) since there is no path with length l from node i to node j. Therefore \(\mathbf A ^l=\mathbf 0 \), for all \(l>\tau \). Let \(\mathbf C =\sum _{l=1}^\infty \beta ^l \mathbf A ^l\), then \(\mathbf C =\sum _{l=1}^\tau \beta ^l \mathbf A ^l\). With \(0<\beta <1\), C can be rewritten as

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Tosyali, A., Kim, J., Choi, J. et al. New node anomaly detection algorithm based on nonnegative matrix factorization for directed citation networks. Ann Oper Res 288, 457–474 (2020). https://doi.org/10.1007/s10479-019-03508-4

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10479-019-03508-4

Keywords

Navigation