Which Type of Classifier to Use for Networked Data, Connectivity Based or Feature Based?

Zhang, Zan; Li, Jiuyong; Wang, Hao; Liu, Lin; Liu, Jixue

doi:10.1007/978-3-030-02922-7_25


Conference paper
First Online: 20 October 2018


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11233)

Abstract

Multi-label classification of social network data has become an important problem. Two types of information are used to classify nodes in a social network: the characteristics of the nodes and the connectivity between them. Existing classification methods accordingly fall into two categories: feature based methods and connectivity based methods. We observe that no classification method fits all data sets, because performance is data dependent. In general, a node's class labels are determined by two factors, personal preference and peer influence. Some data sets are dominated by personal preference and suit feature based methods, whereas others are dominated by peer influence and suit connectivity based methods. The challenge is then to judge whether a data set is personal preference dominated or peer influence dominated, so that a suitable classification method can be selected. In this paper, we develop a causality based criterion to determine which of the two characterizes a data set. Experiments on real-world data sets demonstrate that the criterion can predict the suitability of a classification method for a data set.
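To make the distinction between the two classifier families concrete, the following is a minimal sketch on a toy network. It is not the paper's method or its causality based criterion. The graph, the single numeric feature, the labels, and the function names `connectivity_predict` and `feature_predict` are all hypothetical, and the example is simplified to one label per node rather than the multi-label setting.

```python
from collections import Counter

# Hypothetical toy network: node -> set of neighbours (undirected).
edges = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}

# Hypothetical node characteristic: one numeric feature per node.
features = {"a": 0.9, "b": 0.8, "c": 0.2, "d": 0.1, "e": 0.15}

# Known labels; node "c" is the one to classify.
labels = {"a": "sports", "b": "sports", "d": "music", "e": "music"}


def connectivity_predict(node):
    """Connectivity based: majority vote over labeled neighbours."""
    votes = Counter(labels[n] for n in edges[node] if n in labels)
    return votes.most_common(1)[0][0]


def feature_predict(node):
    """Feature based: copy the label of the labeled node with the closest feature."""
    nearest = min(labels, key=lambda n: abs(features[n] - features[node]))
    return labels[nearest]


print("connectivity based:", connectivity_predict("c"))
print("feature based:     ", feature_predict("c"))
```

On this toy data the two approaches disagree about node `c`: its neighbours mostly carry one label, while its feature value resembles nodes carrying the other. Which prediction to trust depends on whether peer influence or personal preference dominates the data set, which is the question the paper's criterion is meant to answer.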



Author information

Authors and Affiliations

School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, Anhui, China
Zan Zhang & Hao Wang
School of Information Technology and Mathematical Sciences, University of South Australia, Adelaide, Australia
Zan Zhang, Jiuyong Li, Lin Liu & Jixue Liu


Corresponding author

Correspondence to Zan Zhang.

Editor information

Editors and Affiliations

Zayed University, Dubai, United Arab Emirates
Hakim Hacid
Poznan University of Economics, Poznan, Poland
Wojciech Cellary
Victoria University, Footscray, VIC, Australia
Hua Wang
UNSW Australia, Sydney, NSW, Australia
Hye-Young Paik
Swinburne University of Technology, Hawthorn, VIC, Australia
Rui Zhou


About this paper

Cite this paper

Zhang, Z., Li, J., Wang, H., Liu, L., Liu, J. (2018). Which Type of Classifier to Use for Networked Data, Connectivity Based or Feature Based? In: Hacid, H., Cellary, W., Wang, H., Paik, HY., Zhou, R. (eds) Web Information Systems Engineering – WISE 2018. WISE 2018. Lecture Notes in Computer Science, vol 11233. Springer, Cham. https://doi.org/10.1007/978-3-030-02922-7_25


DOI: https://doi.org/10.1007/978-3-030-02922-7_25
Published: 20 October 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02921-0
Online ISBN: 978-3-030-02922-7
eBook Packages: Computer Science, Computer Science (R0)
