Combining Node Identifier Features and Community Priors for Within-Network Classification

Ye, Qi; Zhu, Changlei; Li, Gang; Wang, Feng

doi:10.1007/978-3-319-63564-4_1


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10367)

Included in the following conference series:

Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint Conference on Web and Big Data


Abstract

With large-scale network data now widely available, one active research topic is how to adapt traditional classification algorithms to predict the most probable labels of nodes in a partially labeled network. In this paper, we propose a new algorithm, the identifier-based relational neighbor classifier (IDRN), to solve the within-network multi-label classification problem. We use the node identifiers in the egocentric networks as features and propose a within-network classification model that incorporates community structure information to predict the most probable classes for unlabeled nodes. We demonstrate the effectiveness of our approach on several publicly available datasets. In sparsely labeled networks, our approach achieves Hamming, Micro-\(\text{F}_1\) and Macro-\(\text{F}_1\) scores that are on average up to 14%, 21% and 14% higher, respectively, than those of competing methods. The experimental results show that our approach is efficient and suitable for large-scale real-world classification tasks.
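The general idea of combining ego-network identifiers with a community-level prior can be illustrated with a small sketch. This is not the IDRN algorithm itself, whose details are not given in the abstract; it is a simplified neighbor-voting scheme under stated assumptions. The function names `ego_features` and `predict`, the `prior_weight` parameter, the toy graph, and the hand-assigned community ids are all hypothetical.

```python
from collections import Counter, defaultdict

def ego_features(adj, node):
    """Identifiers in the 1-hop egocentric network: the node plus its neighbors."""
    return {node} | set(adj[node])

def predict(adj, labels, community, node, top_k=1, prior_weight=0.5):
    """Rank labels for `node` by neighbor votes plus a community prior.

    Assumption: a plain weighted vote, used here only for illustration.
    """
    # Each labeled identifier in the ego network votes for its own labels.
    votes = Counter()
    for ident in ego_features(adj, node):
        if ident != node:
            votes.update(labels.get(ident, ()))

    # Community prior: label frequencies among labeled nodes in the same community.
    prior = Counter()
    for other, labs in labels.items():
        if other != node and community.get(other) == community.get(node):
            prior.update(labs)
    total = sum(prior.values())

    scores = defaultdict(float)
    for lab, count in votes.items():
        scores[lab] += count
    if total:
        for lab, count in prior.items():
            scores[lab] += prior_weight * count / total

    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda lab: (-scores[lab], lab))
    return ranked[:top_k]

# Toy partially labeled network (hypothetical data).
adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
       "d": ["c", "e"], "e": ["d"]}
labels = {"a": {"x"}, "b": {"x"}, "e": {"y"}}
community = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}

print(predict(adj, labels, community, "c"))
print(predict(adj, labels, community, "d"))
```

In practice the community assignment would come from a community detection method rather than being set by hand, and the vote would typically be replaced by a trained classifier over sparse identifier features.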



Acknowledgments

The authors would like to thank all the members of the ADRS (ADvertisement Research for Sponsored search) group at Sogou Inc. for their help with parts of the data processing and experiments.

Author information

Authors and Affiliations

Sogou Inc., Beijing, China
Qi Ye, Changlei Zhu, Gang Li & Feng Wang

Corresponding author

Correspondence to Qi Ye.

Editor information

Editors and Affiliations

Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China
Lei Chen
Computer Science, Aarhus University, Aarhus N, Denmark
Christian S. Jensen
Computer Science, University of Southern California, Los Angeles, California, USA
Cyrus Shahabi
Northeastern University, Shenyang, China
Xiaochun Yang
Kent State University, Kent, Ohio, USA
Xiang Lian

About this paper

Cite this paper

Ye, Q., Zhu, C., Li, G., Wang, F. (2017). Combining Node Identifier Features and Community Priors for Within-Network Classification. In: Chen, L., Jensen, C., Shahabi, C., Yang, X., Lian, X. (eds) Web and Big Data. APWeb-WAIM 2017. Lecture Notes in Computer Science, vol 10367. Springer, Cham. https://doi.org/10.1007/978-3-319-63564-4_1

DOI: https://doi.org/10.1007/978-3-319-63564-4_1
Published: 03 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63563-7
Online ISBN: 978-3-319-63564-4
eBook Packages: Computer Science, Computer Science (R0)
