Abstract
Consider a collection of entities, where each may have some demographic properties, and where the entities may be linked in some kind of network structure, perhaps a social one. Some of these entities are “of interest”; we call them active. What is the relative likelihood of each of the other entities being active? AFDL, Activity from Demographics and Links, is an algorithm designed to answer this question in a computationally efficient manner. AFDL can work with demographic data, link data (including noisy links), or both, and it can process very large datasets quickly. This paper describes AFDL’s feature extraction and classification algorithms, gives timing and accuracy results obtained for several datasets, and offers suggestions for its use in real-world situations.











Notes
AFDL and NetKit were run on a dual-CPU AMD Opteron 242 machine (1,600 MHz, 8 GB of RAM) under CentOS 4 x86_64, except for the NetKit IMDB runs, which were executed on a faster machine with more memory: a quad-CPU AMD Opteron 844 (1,800 MHz, 32 GB of RAM). We obtained NetKit from http://www.research.rutgers.edu/~sofmac/NetKit.html and ran it without modifications, using the default parameter settings for this setup: local classifier = null, relational classifier = wvRN [9], collective inference = relaxation labeling [14].
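For readers unfamiliar with the baseline configuration named in this note, the following is a minimal sketch of a weighted-vote relational neighbor (wvRN) estimate combined with a simple synchronous relaxation-style update. It is not NetKit's code and does not reproduce its defaults: the function name `wvrn_relaxation`, the edge and label formats, the fixed iteration count, and the choice to start unlabeled nodes at the prior observed among labeled nodes are all assumptions made for illustration.

```python
from collections import defaultdict


def wvrn_relaxation(edges, known, n_iter=50):
    """Illustrative wvRN scoring with synchronous updates (hypothetical helper).

    edges: iterable of (u, v, weight) undirected links.
    known: dict mapping node -> 0 or 1 (1 means "active"); must be non-empty.
    Returns a dict mapping every node to an estimated probability of being active.
    """
    # Build a weighted adjacency list for the undirected graph.
    nbrs = defaultdict(list)
    for u, v, w in edges:
        nbrs[u].append((v, w))
        nbrs[v].append((u, w))

    nodes = set(nbrs) | set(known)

    # Assumption: unlabeled nodes start at the prior among labeled nodes.
    prior = sum(known.values()) / len(known)
    score = {n: float(known[n]) if n in known else prior for n in nodes}

    for _ in range(n_iter):
        new = dict(score)
        for n in nodes:
            if n in known:
                continue  # labeled nodes stay clamped to their labels
            total = sum(w for _, w in nbrs[n])
            if total > 0:
                # wvRN: weighted mean of the neighbors' previous estimates.
                new[n] = sum(w * score[m] for m, w in nbrs[n]) / total
        score = new  # synchronous update: all nodes read the old estimates
    return score


if __name__ == "__main__":
    links = [("a", "b", 3.0), ("b", "c", 1.0)]
    labels = {"a": 1, "c": 0}
    print(wvrn_relaxation(links, labels))
```

In this toy chain, node `b` is linked to an active node with weight 3 and to an inactive node with weight 1, so its score is the weighted mean of the two labels. A real relaxation labeling implementation may damp or anneal the updates; that is omitted here.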
References
Getoor L, Diehl CP (2005) Link mining: a survey. SIGKDD Explorations 7(2):3–12
Domingos P (2003) Prospects and challenges for multi-relational data mining. SIGKDD Explorations 5(1):80–83
Fawcett T, Provost F (2003) Adaptive fraud detection. Data Min Knowl Disc 3:291–316
Cortes C, Pregibon D, Volinsky C (2004) Communities of interest. In: Proceedings of intelligent data analysis (IDA)
Neville J, Simsek O, Jensen D, Komoroske J, Palmer K, Goldberg H (2005) Using relational knowledge discovery to prevent securities fraud. In: Proceedings of the eleventh ACM SIGKDD international conference on knowledge discovery and data mining (KDD-05)
Kubica J, Moore A, Cohn D, Schneider J (2003) A fast graph-based method for link analysis and queries. In: Proceedings of the 2003 IJCAI text-mining & link-analysis workshop
Kubica J, Moore A, Schneider J, Yang Y (2002) Stochastic link and group detection. In: Eighteenth national conference on artificial intelligence
Macskassy SA, Provost F (2006) A brief survey of machine learning methods for classification in networked data and an application to suspicion scoring. In: Workshop on statistical network learning at the 23rd international conference on machine learning (ICML 2006), Pittsburgh, PA, USA, June 2006
Macskassy SA, Provost F (2003) A simple relational classifier. In: Proceedings of the multi-relational data mining workshop (MRDM) at the ninth ACM SIGKDD international conference on knowledge discovery and data mining
Macskassy SA, Provost F (2005) Suspicion scoring based on guilt-by-association, collective inference, and focused data access. In: International conference on intelligence analysis
Macskassy SA, Provost F (2006) Classification in networked data: a toolkit and a univariate case study. J Mach Learn Res (forthcoming)
Komarek P (2004) Logistic regression for data mining and high-dimensional classification, Ph.D Thesis, Carnegie Mellon University
Dubrawski A (1997) Stochastic validation for automated tuning of neural network’s hyper-parameters. J Rob Auton Syst 21(1):89–93 Elsevier Science Publishers
Chakrabarti S, Dom B, Indyk P (1998) Enhanced hypertext categorization using hyperlinks. In: ACM SIGMOD international conference on management of data
Box GEP, Draper NR (1987) Empirical model building and response surfaces. Wiley
Moore A, Schneider J (1995) Memory based stochastic optimization. In: Advances in neural information processing systems (NIPS 8)
Cite this article
Dubrawski, A.W., Ostlund, J.K., Chen, L. et al. Computationally efficient scoring of activity using demographics and connectivity of entities. Inf Technol Manag 11, 77–89 (2010). https://doi.org/10.1007/s10799-010-0069-y
DOI: https://doi.org/10.1007/s10799-010-0069-y