DOI: 10.1145/2661829.2661981
research-article

Active Learning for Streaming Networked Data

Published: 03 November 2014

Abstract

Mining high-speed data streams has become an important topic due to the rapid growth of online data. In this paper, we study the problem of active learning for streaming networked data. The goal is to train an accurate model for classifying networked data that arrives in a streaming manner while querying as few labels as possible. The problem is extremely challenging, as both the data distribution and the network structure may change over time. Moreover, the query decision must be made sequentially for each arriving data instance, taking the dynamic network structure into account.
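
The abstract does not spell out the structural-variability criterion, so the following is only a minimal sketch of the general stream-based setting it describes: each instance arrives together with its currently known neighbors, and a query-or-predict decision is made immediately, one instance at a time. As a stand-in for the paper's criterion, it uses a hypothetical rule: query when no neighbor has been labeled yet, or when the labeled neighbors disagree (their vote is close to 0.5), subject to a label budget. The names `stream_active_learning`, `neighbor_vote`, `budget`, and `margin` are illustrative assumptions, not taken from the paper.

```python
from typing import Dict, Hashable, Iterable, List, Optional, Tuple

# Hypothetical stream record: (node_id, ids of neighbors seen so far, true label).
# The true label is only "paid for" when the strategy decides to query it.
Record = Tuple[Hashable, List[Hashable], int]


def neighbor_vote(neighbors: List[Hashable], queried: Dict[Hashable, int]) -> Optional[float]:
    """Fraction of already-queried neighbors whose label is 1, or None if none are labeled."""
    known = [queried[n] for n in neighbors if n in queried]
    if not known:
        return None
    return sum(known) / len(known)


def stream_active_learning(
    stream: Iterable[Record], budget: int, margin: float = 0.2
) -> Tuple[Dict[Hashable, int], Dict[Hashable, int]]:
    """Make an immediate query-or-predict decision for each arriving instance.

    Stand-in criterion (NOT the paper's structural variability): query when the
    labeled neighborhood is empty or its vote lies within `margin` of 0.5.
    """
    queried: Dict[Hashable, int] = {}    # labels obtained from the oracle
    predicted: Dict[Hashable, int] = {}  # labels inferred without querying
    for node, neighbors, true_label in stream:
        vote = neighbor_vote(neighbors, queried)
        uncertain = vote is None or abs(vote - 0.5) < margin
        if uncertain and len(queried) < budget:
            queried[node] = true_label  # spend one label from the budget
        else:
            # Budget exhausted or confident: predict from the neighborhood vote,
            # defaulting to class 0 when no neighbor is labeled.
            predicted[node] = 0 if vote is None else int(vote >= 0.5)
    return queried, predicted
```

For example, on the stream `[("a", [], 1), ("b", ["a"], 1), ("d", [], 0), ("e", ["d"], 0)]` with `budget=2`, this sketch queries `a` and `d` (neither has a labeled neighbor) and predicts `b` as 1 and `e` as 0 from their queried neighbors.
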
We propose a novel streaming active query strategy based on structural variability. We prove that by querying labels we can monotonically decrease the structural variability and thus better adapt to concept drift. To speed up the learning process, we present a network sampling algorithm that samples instances from the data stream, which allows us to handle large volumes of streaming data. We evaluate the proposed approach on four datasets of different genres: Weibo, Slashdot, IMDB, and ArnetMiner. Experimental results show that our model performs much better (+5-10% in F1-score on average) than several alternative methods for active learning over streaming networked data.
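
The abstract does not describe how the network sampling algorithm works. Purely as a point of reference, the sketch below shows one standard way to keep a bounded sample of a stream whose length is not known in advance: reservoir sampling (Vitter's Algorithm R), followed by keeping only the edges whose endpoints were both sampled (an induced subgraph). This is a swapped-in, generic technique, not the authors' algorithm; the function names and the `(node, neighbors)` record format are hypothetical.

```python
import random
from typing import Iterable, List, Set, Tuple

# Hypothetical stream record: (node_id, ids of the node's neighbors).
Record = Tuple[str, List[str]]


def reservoir_sample_nodes(stream: Iterable[Record], k: int, seed: int = 0) -> List[Record]:
    """Algorithm R: keep a uniform random sample of k records from a stream of unknown length."""
    rng = random.Random(seed)  # seeded for reproducibility
    reservoir: List[Record] = []
    for i, record in enumerate(stream):
        if i < k:
            reservoir.append(record)  # fill the reservoir first
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < k:
                reservoir[j] = record  # record i is kept with probability k / (i + 1)
    return reservoir


def induced_edges(sample: List[Record]) -> Set[Tuple[str, str]]:
    """Keep only undirected edges whose two endpoints were both sampled."""
    kept = {node for node, _ in sample}
    edges: Set[Tuple[str, str]] = set()
    for node, neighbors in sample:
        for other in neighbors:
            if other in kept and other != node:
                a, b = sorted((node, other))
                edges.add((a, b))
    return edges
```

The reservoir bounds memory at `k` records regardless of how long the stream runs, which is what makes sampling attractive for high-volume streams. Uniform sampling ignores the network entirely, so the choice of which instances to favor is where a network-aware method such as the paper's would likely differ.
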

Cited By

  • (2022) Research on Active Sampling with Self-supervised Model. Big Data and Security, pp. 683-695. DOI: 10.1007/978-981-19-0852-1_54. Online publication date: 10-Mar-2022
  • (2021) ALPINE: Active Link Prediction Using Network Embedding. Applied Sciences, 11(11), 5043. DOI: 10.3390/app11115043. Online publication date: 29-May-2021
  • (2021) Active learning for imbalanced data under cold start. Proceedings of the Second ACM International Conference on AI in Finance, pp. 1-9. DOI: 10.1145/3490354.3494423. Online publication date: 3-Nov-2021

Information

Published In

ACM Conferences
CIKM '14: Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management
November 2014
2152 pages
ISBN:9781450325981
DOI:10.1145/2661829
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. active learning
  2. data streams
  3. network sampling

Qualifiers

  • Research-article

Conference

CIKM '14

Acceptance Rates

CIKM '14 Paper Acceptance Rate 175 of 838 submissions, 21%;
Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 7
  • Downloads (last 6 weeks): 0
Reflects downloads up to 22 Feb 2025

Citations

  • (2021) Which Node Pair and What Status? Asking Expert for Better Network Embedding. Database Systems for Advanced Applications, pp. 141-157. DOI: 10.1007/978-3-030-73194-6_11. Online publication date: 11-Apr-2021
  • (2020) ASGN: An Active Semi-supervised Graph Neural Network for Molecular Property Prediction. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 731-752. DOI: 10.1145/3394486.3403117. Online publication date: 23-Aug-2020
  • (2018) Within-Network Classification in Temporal Graphs. 2018 IEEE International Conference on Data Mining Workshops (ICDMW), pp. 229-236. DOI: 10.1109/ICDMW.2018.00041. Online publication date: Nov-2018
  • (2016) Classification in dynamic streaming networks. Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 138-145. DOI: 10.5555/3192424.3192449. Online publication date: 18-Aug-2016
  • (2016) Classification in dynamic streaming networks. 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp. 138-145. DOI: 10.1109/ASONAM.2016.7752225. Online publication date: Aug-2016
  • (2016) On improving performance of surface inspection systems by online active learning and flexible classifier updates. Machine Vision and Applications, 27(1), pp. 103-127. DOI: 10.1007/s00138-015-0731-9. Online publication date: 1-Jan-2016
  • (2015) A Min-Max Optimization Framework For Online Graph Classification. Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp. 643-652. DOI: 10.1145/2806416.2806548. Online publication date: 17-Oct-2015