Active Learning for Streaming Networked Data

Authors:

Yutao Zhang

CIKM '14: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management

Pages 1129 - 1138

https://doi.org/10.1145/2661829.2661981

Published: 03 November 2014

Abstract

Mining high-speed data streams has become an important topic due to the rapid growth of online data. In this paper, we study the problem of active learning for streaming networked data. The goal is to train an accurate model for classifying networked data that arrives in a streaming manner by querying as few labels as possible. The problem is extremely challenging, as both the data distribution and the network structure may change over time. The query decision has to be made for each data instance sequentially, by considering the dynamic network structure.
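
The sequential query setting described above can be illustrated with a minimal sketch. This is not the paper's strategy: it uses plain uncertainty thresholding with an online logistic-regression learner as a stand-in, and it ignores network structure entirely. The names `stream_active_learning`, `make_synthetic_stream`, `threshold`, and `lr` are hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic_stream(n, seed=0):
    # Hypothetical toy stream: a bias term plus one feature in [-1, 1].
    rng = random.Random(seed)
    for _ in range(n):
        yield [1.0, rng.uniform(-1.0, 1.0)]


def stream_active_learning(stream, oracle, dim, threshold=0.2, lr=0.1):
    """Process instances one at a time and query a label only when uncertain.

    `stream` yields feature vectors; `oracle(x)` returns the true label (0 or 1).
    Returns the learned weights and the number of labels queried.
    """
    w = [0.0] * dim
    queried = 0
    for x in stream:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        # The query decision is made immediately, before later instances arrive.
        if abs(p - 0.5) < threshold:
            y = oracle(x)
            queried += 1
            # One step of online logistic-regression gradient ascent.
            for i in range(dim):
                w[i] += lr * (y - p) * x[i]
    return w, queried
```

Calling `stream_active_learning(make_synthetic_stream(200), lambda x: 1 if x[1] > 0 else 0, dim=2)` returns the weights together with the number of labels that were actually requested, which is the quantity an active learner tries to keep small.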

We propose a novel streaming active query strategy based on structural variability. We prove that querying labels monotonically decreases the structural variability and helps the model adapt to concept drift. To speed up learning, we present a network sampling algorithm that samples instances from the data stream, which lets us handle large volumes of streaming data. We evaluate the proposed approach on four datasets of different genres: Weibo, Slashdot, IMDB, and ArnetMiner. Experimental results show that our model performs much better (+5-10% in F1-score on average) than several alternative methods for active learning over streaming networked data.
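
The abstract does not specify the network sampling algorithm, so the sketch below substitutes a standard technique, reservoir sampling, to show how a bounded sample can be maintained over an edge stream of unknown length. The function name `reservoir_sample_edges` and its parameters are assumptions made for illustration, not the authors' method.

```python
import random


def reservoir_sample_edges(edge_stream, k, seed=0):
    """Keep a uniform random sample of at most k edges from a stream."""
    rng = random.Random(seed)
    reservoir = []
    for t, edge in enumerate(edge_stream):
        if t < k:
            # Fill the reservoir with the first k edges.
            reservoir.append(edge)
        else:
            # The new edge replaces a stored one with probability k / (t + 1).
            j = rng.randint(0, t)
            if j < k:
                reservoir[j] = edge
    return reservoir
```

Memory stays bounded by `k` regardless of stream length, which is the property that makes this kind of sampling attractive for high-volume streams.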

Index Terms

Active Learning for Streaming Networked Data
1. Computing methodologies
  1. Machine learning
2. Information systems
  1. Information systems applications
    1. Data mining

Published In

CIKM '14: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management

November 2014

2152 pages

ISBN: 9781450325981

DOI: 10.1145/2661829

General Chairs:
Jianzhong Li (Harbin Inst. of Technology), X. Sean Wang (Fudan University)

Program Chairs:
Minos Garofalakis (Technical University of Crete, Greece), Ian Soboroff (National Institute of Standards, USA), Torsten Suel (New York University, USA), Min Wang (Google Research, USA)

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Ministry of Science and Technology of the People's Republic of China
Beijing Key Lab of Networked Multimedia
National Natural Science Foundation of China

Conference

CIKM '14

CIKM '14: 2014 ACM Conference on Information and Knowledge Management

November 3 - 7, 2014

Shanghai, China

Acceptance Rates

CIKM '14 Paper Acceptance Rate 175 of 838 submissions, 21%;

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Bibliometrics

Total Citations: 11
Total Downloads: 280
Downloads (Last 12 months): 7
Downloads (Last 6 weeks): 0

Reflects downloads up to 23 Feb 2025

Citations

Cited By

Luo S (2022). Research on Active Sampling with Self-supervised Model. Big Data and Security, pp. 683-695. Online publication date: 10-Mar-2022.
https://doi.org/10.1007/978-981-19-0852-1_54

Chen X, Kang B, Lijffijt J, De Bie T (2021). ALPINE: Active Link Prediction Using Network Embedding. Applied Sciences 11:11 (5043). Online publication date: 29-May-2021.
https://doi.org/10.3390/app11115043

Barata R, Leite M, Pacheco R, Sampaio M, Ascensão J, Bizarro P, Calinescu A, Szpruch L (2021). Active learning for imbalanced data under cold start. Proceedings of the Second ACM International Conference on AI in Finance, pp. 1-9. Online publication date: 3-Nov-2021.
https://dl.acm.org/doi/10.1145/3490354.3494423

Wu L, Wang D, Feng S, Song K, Zhang Y, Yu G (2021). Which Node Pair and What Status? Asking Expert for Better Network Embedding. Database Systems for Advanced Applications, pp. 141-157. Online publication date: 11-Apr-2021.
https://dl.acm.org/doi/10.1007/978-3-030-73194-6_11

Hao Z, Lu C, Huang Z, Wang H, Hu Z, Liu Q, Chen E, Lee C, Gupta R, Liu Y, Shah M, Rajan S, Tang J, Prakash B (2020). ASGN: An Active Semi-supervised Graph Neural Network for Molecular Property Prediction. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 731-752. Online publication date: 23-Aug-2020.
https://dl.acm.org/doi/10.1145/3394486.3403117

Ryther C, Simonsen J (2018). Within-Network Classification in Temporal Graphs. 2018 IEEE International Conference on Data Mining Workshops (ICDMW), pp. 229-236. Online publication date: Nov-2018.
https://doi.org/10.1109/ICDMW.2018.00041

Yao Y, Holder L, Subrahmanian V, Rokne J, Kumar R, Caverlee J, Tong H (2016). Classification in dynamic streaming networks. Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 138-145. Online publication date: 18-Aug-2016.
https://dl.acm.org/doi/10.5555/3192424.3192449

Yao Y, Holder L (2016). Classification in dynamic streaming networks. 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp. 138-145. Online publication date: Aug-2016.
https://doi.org/10.1109/ASONAM.2016.7752225

Weigl E, Heidl W, Lughofer E, Radauer T, Eitzinger C (2016). On improving performance of surface inspection systems by online active learning and flexible classifier updates. Machine Vision and Applications 27:1 (103-127). Online publication date: 1-Jan-2016.
https://dl.acm.org/doi/10.1007/s00138-015-0731-9

Yang P, Zhao P, Bailey J, Moffat A, Aggarwal C, de Rijke M, Kumar R, Murdock V, Sellis T, Yu J (2015). A Min-Max Optimization Framework For Online Graph Classification. Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp. 643-652. Online publication date: 17-Oct-2015.
https://dl.acm.org/doi/10.1145/2806416.2806548
