Abstract
A prominent task on graphs is property prediction: a property is known for some of the graph’s nodes and is used to predict it for the nodes where it is unknown. In this paper, we study topic prediction for papers organized in a so-called co-authorship graph (CAG), in which the nodes are scientific papers and two papers are linked if they share an author. A CAG tends to contain a large number of cliques, each formed by all the papers published by the same author, so topic prediction in a CAG tends to be computationally expensive. We investigate to what extent this cost can be reduced without sacrificing prediction quality by removing links from the CAG based on the papers’ publication dates. We apply an inexpensive iterative algorithm, based on a majority vote over each paper’s neighbourhood, to predict unknown topics from the papers with known topics and the CAG’s link structure. For three data sets, we evaluate our algorithm in terms of classification accuracy and computation time on both the full graph G and subgraphs of it. On substantially smaller subgraphs of G, our algorithm obtains classification accuracies similar to those obtained on G, while reducing execution time by up to one order of magnitude.
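The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the exact reduction rule, the voting schedule, or the tie-breaking, so the sketch assumes a link is kept only when the two papers' publication years differ by at most `max_gap_years`, that votes are counted synchronously, and that ties go to the alphabetically first topic. The function names, the toy papers, and the parameter `max_gap_years` are all hypothetical. Note that an author with k papers contributes a clique of k(k-1)/2 links, which is why the full graph grows so quickly.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cag(papers):
    """Link two papers whenever they share at least one author."""
    by_author = defaultdict(list)
    for pid, info in papers.items():
        for author in info["authors"]:
            by_author[author].append(pid)
    edges = set()
    for pids in by_author.values():
        # Each author's papers form a clique: k papers give k*(k-1)/2 links.
        for a, b in combinations(sorted(pids), 2):
            edges.add((a, b))
    return edges


def reduce_links(edges, papers, max_gap_years):
    """Assumed reduction rule (not taken from the paper): keep a link only
    if the publication years differ by at most max_gap_years."""
    return {
        (a, b) for a, b in edges
        if abs(papers[a]["year"] - papers[b]["year"]) <= max_gap_years
    }


def predict_topics(edges, known, all_ids, max_iter=10):
    """Iterative majority vote over each paper's labelled neighbours."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    labels = dict(known)
    for _ in range(max_iter):
        updates = {}
        for pid in all_ids:
            if pid in known:
                continue  # never overwrite a known topic
            votes = Counter(labels[n] for n in neighbours[pid] if n in labels)
            if votes:
                top = max(votes.values())
                # Assumed tie-break: alphabetically first topic.
                updates[pid] = min(t for t, c in votes.items() if c == top)
        if all(labels.get(p) == t for p, t in updates.items()):
            break  # no prediction changed, so the vote has converged
        labels.update(updates)
    return labels


# Toy data, invented for illustration only.
papers = {
    "p1": {"authors": {"ann"}, "year": 2000},
    "p2": {"authors": {"ann", "bob"}, "year": 2001},
    "p3": {"authors": {"ann"}, "year": 2008},
    "p4": {"authors": {"bob"}, "year": 2002},
    "p5": {"authors": {"ann", "cat"}, "year": 2009},
}
known = {"p1": "ML", "p3": "DB"}

full_edges = build_cag(papers)
reduced_edges = reduce_links(full_edges, papers, max_gap_years=2)
full_labels = predict_topics(full_edges, known, papers)
reduced_labels = predict_topics(reduced_edges, known, papers)
```

On this toy input the full graph has 7 links and the reduced graph has 3. The two graphs yield different predictions here, which only shows that link reduction can change the outcome on tiny inputs; it says nothing about the accuracies reported in the chapter.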
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Hoche, S., Hardcastle, D., Flach, P. (2009). Using Time Dependent Link Reduction to Improve the Efficiency of Topic Prediction in Co-Authorship Graphs. In: Fortunato, S., Mangioni, G., Menezes, R., Nicosia, V. (eds) Complex Networks. Studies in Computational Intelligence, vol 207. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01206-8_15
DOI: https://doi.org/10.1007/978-3-642-01206-8_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01205-1
Online ISBN: 978-3-642-01206-8
eBook Packages: Engineering, Engineering (R0)