Using Time Dependent Link Reduction to Improve the Efficiency of Topic Prediction in Co-Authorship Graphs

Hoche, Susanne; Hardcastle, David; Flach, Peter

doi:10.1007/978-3-642-01206-8_15

Part of the book series: Studies in Computational Intelligence (SCI, volume 207)

Abstract

One prominent task in graphs is property prediction: a property is known for some of the graph’s nodes and is used to predict it for the nodes where it is unknown. In this paper, we look at topic prediction for papers organized in a so-called co-authorship graph (CAG), in which the nodes are scientific papers and two papers are linked if they share an author. A CAG tends to have a large number of cliques, each formed by all the papers published by the same author. Thus, topic prediction in a CAG tends to be computationally expensive. We investigate to what extent we can reduce this cost without sacrificing prediction quality by reducing the number of links in the CAG based on the papers’ publication dates. We apply an inexpensive iterative algorithm, based on a majority vote over each paper’s neighbourhood, to predict unknown topics from the papers with known topics and the CAG’s link structure. For three data sets, we evaluate our algorithm in terms of classification accuracy and computation time on both the full graph G and subgraphs of it. On substantially smaller subgraphs of G, our algorithm obtains classification accuracies similar to those obtained on G, while reducing execution time by up to one order of magnitude.
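The abstract does not spell out the link-reduction rule or the exact voting scheme, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes that a link is dropped when two papers were published more than a fixed number of years apart, that votes are counted synchronously over labelled neighbours, and that ties are broken alphabetically. The names `build_cag`, `predict_topics`, and `max_year_gap` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cag(papers, max_year_gap=None):
    """Build a co-authorship graph whose nodes are papers.

    papers: dict paper_id -> (set of author names, publication year)
    max_year_gap: ASSUMED reduction rule; if given, a link is kept only
        when the two papers were published at most this many years apart.
    Returns dict paper_id -> set of neighbouring paper_ids.
    """
    by_author = defaultdict(list)
    for pid, (authors, _year) in papers.items():
        for author in authors:
            by_author[author].append(pid)

    graph = {pid: set() for pid in papers}
    for pids in by_author.values():
        # Each author induces a clique over all of that author's papers.
        for p, q in combinations(pids, 2):
            gap = abs(papers[p][1] - papers[q][1])
            if max_year_gap is None or gap <= max_year_gap:
                graph[p].add(q)
                graph[q].add(p)
    return graph


def predict_topics(graph, known, max_iter=10):
    """Iterative neighbourhood majority vote (illustrative variant).

    known: dict paper_id -> topic; these labels are never overwritten.
    Returns a dict holding a topic for every paper that could be labelled.
    """
    labels = dict(known)
    for _ in range(max_iter):
        updates = {}
        for node in graph:
            if node in known:
                continue
            votes = Counter(labels[n] for n in graph[node] if n in labels)
            if votes:
                # Most frequent topic wins; ties go to the alphabetically
                # first topic so that the result is deterministic.
                best = min(votes, key=lambda t: (-votes[t], t))
                if labels.get(node) != best:
                    updates[node] = best
        if not updates:
            break  # labelling is stable
        labels.update(updates)
    return labels
```

Calling `build_cag(papers)` gives the full graph G, while `build_cag(papers, max_year_gap=2)` gives a reduced subgraph. Since each author contributes a clique, the number of candidate links grows quadratically with the number of papers per author, which is why pruning links before voting can reduce the running time.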

Author information

Authors and Affiliations

Department of Computer Science, University of Bristol, UK
Susanne Hoche, David Hardcastle & Peter Flach

Editor information

Editors and Affiliations

Complex Networks Lagrange Laboratory, ISI Foundation, Viale S. Severo 65, 10133, Torino, Italy
Santo Fortunato
Dipartimento di Ingegneria, Informatica e delle Telecomunicazioni, Università degli Studi di Catania, Viale Andrea Doria, 6, I95125, Catania, Italy
Giuseppe Mangioni
Department of Computer Sciences, Florida Institute of Technology, 150 W. University Blvd, Melbourne, FL 32901, USA
Ronaldo Menezes
Laboratorio sui Sistemi Complessi, Scuola Superiore di Catania, Via S.Nullo 5/i, 95123, Catania, Italy
Vincenzo Nicosia

About this chapter

Cite this chapter

Hoche, S., Hardcastle, D., Flach, P. (2009). Using Time Dependent Link Reduction to Improve the Efficiency of Topic Prediction in Co-Authorship Graphs. In: Fortunato, S., Mangioni, G., Menezes, R., Nicosia, V. (eds) Complex Networks. Studies in Computational Intelligence, vol 207. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01206-8_15

DOI: https://doi.org/10.1007/978-3-642-01206-8_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01205-1
Online ISBN: 978-3-642-01206-8
eBook Packages: Engineering, Engineering (R0)
