Which Type of Classifier to Use for Networked Data, Connectivity Based or Feature Based?

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11233)

Abstract

Multi-label classification of social network data has become an important problem. Two types of information have been used to classify nodes in a social network: the characteristics of the nodes and the connectivity between them. Existing classification methods likewise fall into two types: feature based methods and connectivity based methods. We observe that there is no one-size-fits-all classification method, since performance is data dependent. In general, a node's class labels are determined by two factors, personal preference and peer influence; however, some data sets are dominated by personal preference and are suited to feature based methods, whereas others are dominated by peer influence and are suited to connectivity based methods. The challenge, then, is to judge whether a data set is personal preference dominated or peer influence dominated, so that a suitable classification method can be selected for it. In this paper, we develop a causality based criterion to determine these characteristics of a data set. Experiments on real-world data sets demonstrate that the criterion can predict the suitability of a classification method for a data set.
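
The abstract does not spell out the criterion itself, so the following is only a hypothetical sketch of how the question "personal preference or peer influence?" can be framed in potential-outcomes terms for a single label (a multi-label data set would be checked one label at a time). The assumed "treatment" is whether at least half of a node's neighbours carry the label (peer influence), the node's own features stand in for personal preference, and the effect on the treated nodes is estimated by 1-nearest-neighbour matching on those features. The threshold 0.5, the cut-off `tau`, all function names, and the choice of plain covariate matching are assumptions for illustration; this is not the authors' method.

```python
import math
from typing import Dict, Hashable, Iterable, List, Optional, Tuple

Node = Hashable


def build_adjacency(edges: Iterable[Tuple[Node, Node]]) -> Dict[Node, List[Node]]:
    """Undirected adjacency lists built from an edge list."""
    adjacency: Dict[Node, List[Node]] = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    return adjacency


def peer_treatment(node, adjacency, labels, threshold=0.5) -> Optional[int]:
    """Hypothetical treatment: 1 if at least `threshold` of the node's
    neighbours carry the label, 0 otherwise, None for isolated nodes."""
    neighbours = adjacency.get(node, [])
    if not neighbours:
        return None
    share = sum(labels[n] for n in neighbours) / len(neighbours)
    return 1 if share >= threshold else 0


def peer_effect_att(adjacency, features, labels, threshold=0.5) -> float:
    """Average effect of the peer 'treatment' on the treated nodes, estimated
    by 1-nearest-neighbour matching (with replacement) on node features,
    which are assumed to proxy personal preference."""
    treated, control = [], []
    for node in labels:
        t = peer_treatment(node, adjacency, labels, threshold)
        if t == 1:
            treated.append(node)
        elif t == 0:
            control.append(node)
    if not treated or not control:
        raise ValueError("need at least one treated and one control node")
    diffs = []
    for u in treated:
        # Closest control node in feature space (ties go to the first in list).
        match = min(control, key=lambda c: math.dist(features[u], features[c]))
        diffs.append(labels[u] - labels[match])
    return sum(diffs) / len(diffs)


def suggest_classifier(adjacency, features, labels, tau=0.1) -> str:
    """Hypothetical decision rule: an estimated peer effect above `tau`
    suggests a connectivity based classifier, otherwise a feature based one."""
    effect = peer_effect_att(adjacency, features, labels)
    return "connectivity based" if effect > tau else "feature based"


# Toy data 1: labels follow the two clusters, features carry no signal.
peer_graph = build_adjacency([("a", "b"), ("b", "c"), ("a", "c"),
                              ("d", "e"), ("e", "f"), ("d", "f")])
peer_labels = {"a": 1, "b": 1, "c": 1, "d": 0, "e": 0, "f": 0}
flat_features = {n: (0.0,) for n in peer_labels}
print(suggest_classifier(peer_graph, flat_features, peer_labels))  # connectivity based

# Toy data 2: labels follow the single feature, regardless of neighbours.
pref_graph = build_adjacency([("a", "b"), ("a", "c"), ("b", "e"),
                              ("b", "f"), ("d", "e")])
pref_labels = {"a": 1, "b": 1, "c": 0, "d": 0, "e": 0, "f": 0}
pref_features = {"a": (1.0,), "b": (1.0,), "c": (0.0,),
                 "d": (0.0,), "e": (0.0,), "f": (0.0,)}
print(suggest_classifier(pref_graph, pref_features, pref_labels))  # feature based
```

Matching on the features is meant to compare nodes with similar personal preference that differ only in their neighbourhood, so any remaining difference in labels is attributed to peers. With many features, matching on an estimated propensity score is the usual alternative to matching on raw feature distances.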

Notes

  1. http://ial.eecs.ucf.edu/Data/SCRN-Data.zip.

  2. http://leitang.net/social_dimension.html.

Author information

Correspondence to Zan Zhang.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, Z., Li, J., Wang, H., Liu, L., Liu, J. (2018). Which Type of Classifier to Use for Networked Data, Connectivity Based or Feature Based? In: Hacid, H., Cellary, W., Wang, H., Paik, HY., Zhou, R. (eds) Web Information Systems Engineering – WISE 2018. WISE 2018. Lecture Notes in Computer Science, vol 11233. Springer, Cham. https://doi.org/10.1007/978-3-030-02922-7_25

  • DOI: https://doi.org/10.1007/978-3-030-02922-7_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-02921-0

  • Online ISBN: 978-3-030-02922-7

  • eBook Packages: Computer Science, Computer Science (R0)
