NOCOL - Nonnegative Orthogonal Constraint Outlier Learning

  • Conference paper
Web Information Systems Engineering – WISE 2021 (WISE 2021)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 13081)

Abstract

Identifying anomalous documents in a text corpus is an important problem with wide applications. Because text data is high-dimensional and sparse, traditional outlier detection methods fail to identify the features that distinguish outliers. Inspired by the capability of Nonnegative Matrix Factorization (NMF) for text clustering, we explore it for text outlier detection. In this paper, we introduce a novel NMF-based method, Nonnegative Orthogonal Constraint Outlier Learning (NOCOL), that learns the outliers effectively during the factorization process. Experimental results show that NOCOL identifies text outliers more accurately than state-of-the-art methods.

Notes

  1. https://github.com/thirubs/NOCOL.

Author information

Corresponding author

Correspondence to Thirunavukarasu Balasubramaniam.

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 225 KB)

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Balasubramaniam, T., Mohotti, W.A., Nayak, R., Yuen, C. (2021). NOCOL - Nonnegative Orthogonal Constraint Outlier Learning. In: Zhang, W., Zou, L., Maamar, Z., Chen, L. (eds) Web Information Systems Engineering – WISE 2021. WISE 2021. Lecture Notes in Computer Science, vol 13081. Springer, Cham. https://doi.org/10.1007/978-3-030-91560-5_27

  • DOI: https://doi.org/10.1007/978-3-030-91560-5_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-91559-9

  • Online ISBN: 978-3-030-91560-5

  • eBook Packages: Computer Science, Computer Science (R0)
