DOI: 10.1145/2556195.2556223
research-article

Active learning for networked data based on non-progressive diffusion model

Published: 24 February 2014

Abstract

We study active learning for networked data, where samples are connected by links and their labels are correlated. We focus on the setting in which a probabilistic graphical model represents the networked data, because such models effectively capture the dependency between the labels of linked samples. We propose connecting the graphical model to the information diffusion process, and we precisely define the active learning problem in terms of the non-progressive diffusion model. We show that the problem is NP-hard and propose a method called MaxCo to solve it. We derive a lower bound for the optimal solution in the active learning setting and develop an iterative greedy algorithm with provable approximation guarantees. We also prove the convergence and correctness of MaxCo.
We evaluate MaxCo on four datasets of different genres: Coauthor, Slashdot, Mobile, and Enron. Our experiments show a consistent improvement over competing approaches.
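
To make the two ingredients named in the abstract more concrete, the sketch below pairs a simple non-progressive diffusion with a greedy selection loop. It is not MaxCo and is not taken from the paper: the deterministic threshold rule, the function names `simulate` and `greedy_select`, and all parameters are assumptions chosen for illustration. "Non-progressive" here means that a node's state is recomputed at every step, so an active node can become inactive again.

```python
from typing import Dict, List, Optional, Set

Graph = Dict[int, List[int]]  # adjacency lists of an undirected graph


def simulate(graph: Graph, seeds: Set[int], threshold: float = 0.5,
             steps: int = 10,
             initial_active: Optional[Set[int]] = None) -> Set[int]:
    """Deterministic non-progressive threshold diffusion (illustrative rule).

    Seeds (e.g. nodes whose labels were queried) stay active. Every other
    node is re-evaluated each step and is active only while at least
    `threshold` of its neighbours were active in the previous step.
    """
    active = set(seeds) | set(initial_active or ())
    for _ in range(steps):
        nxt = set(seeds)
        for node, nbrs in graph.items():
            if node in seeds or not nbrs:
                continue
            frac = sum(1 for n in nbrs if n in active) / len(nbrs)
            if frac >= threshold:
                nxt.add(node)
        if nxt == active:  # fixed point reached
            break
        active = nxt
    return active


def greedy_select(graph: Graph, budget: int, threshold: float = 0.5,
                  steps: int = 10) -> List[int]:
    """Greedily pick `budget` seeds by marginal gain in active nodes."""
    chosen: List[int] = []
    for _ in range(budget):
        base = len(simulate(graph, set(chosen), threshold, steps))
        best, best_gain = None, -1
        for cand in sorted(graph):
            if cand in chosen:
                continue
            spread = simulate(graph, set(chosen) | {cand}, threshold, steps)
            gain = len(spread) - base
            if gain > best_gain:  # ties go to the smallest node id
                best, best_gain = cand, gain
        if best is None:
            break
        chosen.append(best)
    return chosen


# Toy graph (hypothetical): a triangle {0, 1, 2} and a separate edge {3, 4}.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4], 4: [3]}
picked = greedy_select(graph, budget=2)
```

On this toy graph the greedy loop first picks a node in the triangle and then one on the separate edge, since a second seed inside the triangle adds nothing. Any approximation guarantee for such a loop depends on properties of the objective that this sketch does not establish.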


Cited By

  • (2015) Relational active learning for link-based classification. 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pages 1-10. DOI: 10.1109/DSAA.2015.7344798. Online publication date: Oct-2015
  • (2015) Batch Mode Active Learning for Networked Data with Optimal Subset Selection. Web-Age Information Management, pages 96-108. DOI: 10.1007/978-3-319-21042-1_8. Online publication date: 6-Jun-2015
  • (2014) Active Learning for Streaming Networked Data. Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pages 1129-1138. DOI: 10.1145/2661829.2661981. Online publication date: 3-Nov-2014

      Published In

      WSDM '14: Proceedings of the 7th ACM international conference on Web search and data mining
      February 2014
      712 pages
      ISBN:9781450323512
      DOI:10.1145/2556195
      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. active learning
      2. factor graph model
      3. non-progressive model

      Qualifiers

      • Research-article

      Conference

      WSDM 2014

      Acceptance Rates

      WSDM '14 Paper Acceptance Rate 64 of 355 submissions, 18%;
      Overall Acceptance Rate 498 of 2,863 submissions, 17%
