Using Time Dependent Link Reduction to Improve the Efficiency of Topic Prediction in Co-Authorship Graphs

Part of the book series: Studies in Computational Intelligence (SCI, volume 207)

Abstract

A prominent task on graphs is property prediction: a property that is known for some of the graph’s nodes is used to predict it for the nodes where it is unknown. In this paper, we look at topic prediction for papers organized in a so-called co-authorship graph (CAG), in which the nodes are scientific papers and two papers are linked if they share an author. A CAG tends to contain a large number of cliques, each formed by all the papers published by the same author. The number of links therefore grows quickly, and topic prediction in a CAG tends to be computationally expensive. We investigate to what extent this cost can be reduced, without sacrificing prediction quality, by removing links from the CAG based on the papers’ publication dates. To predict unknown topics, we apply an inexpensive iterative algorithm that takes a majority vote over each paper’s neighbourhood, using the papers with known topics and the CAG’s link structure. For three data sets, we evaluate the algorithm in terms of classification accuracy and computation time on both the full graph G and subgraphs of it. On substantially smaller subgraphs of G, the algorithm obtains classification accuracies similar to those obtained on G, while reducing execution time by up to one order of magnitude.
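The abstract does not give the exact link-reduction rule or the details of the voting scheme, so the following Python sketch is only an illustration under stated assumptions. It assumes that a link is kept when the two papers' publication years differ by at most a hypothetical `max_gap`, that votes are counted synchronously over labelled neighbours, and that ties are broken alphabetically. The names `build_cag`, `count_links`, `predict_topics` and the toy data are invented for this example and are not taken from the chapter.

```python
from collections import Counter
from itertools import combinations


def build_cag(papers, max_gap=None):
    """Return adjacency sets linking papers that share an author.

    papers:  dict paper_id -> (set_of_authors, publication_year)
    max_gap: if given, keep a link only when the two publication years
             differ by at most max_gap (an assumed reduction rule).
    """
    by_author = {}
    for pid, (authors, _year) in papers.items():
        for author in authors:
            by_author.setdefault(author, []).append(pid)

    adj = {pid: set() for pid in papers}
    for pids in by_author.values():
        # Each author contributes a clique over all of his or her papers.
        for a, b in combinations(pids, 2):
            if max_gap is not None and abs(papers[a][1] - papers[b][1]) > max_gap:
                continue  # time-dependent link reduction (assumed form)
            adj[a].add(b)
            adj[b].add(a)
    return adj


def count_links(adj):
    """Number of undirected links in the graph."""
    return sum(len(neighbours) for neighbours in adj.values()) // 2


def predict_topics(adj, known, max_iter=10):
    """Label papers with the majority topic of their labelled neighbours.

    Known topics are never overwritten. Updates are applied synchronously
    after each pass, and iteration stops when no label changes.
    """
    labels = dict(known)
    for _ in range(max_iter):
        updates = {}
        for pid, neighbours in adj.items():
            if pid in known:
                continue
            votes = Counter(labels[n] for n in neighbours if n in labels)
            if votes:
                # Highest vote count wins; ties are broken alphabetically.
                best = min(votes, key=lambda topic: (-votes[topic], topic))
                if labels.get(pid) != best:
                    updates[pid] = best
        if not updates:
            break
        labels.update(updates)
    return labels


# Toy data: author "A" wrote four papers, author "B" wrote two.
papers = {
    "p1": ({"A"}, 2000),
    "p2": ({"A"}, 2001),
    "p3": ({"A", "B"}, 2002),
    "p4": ({"A"}, 2008),
    "p5": ({"B"}, 2003),
}
known = {"p1": "ILP", "p2": "ILP"}

full = build_cag(papers)
reduced = build_cag(papers, max_gap=2)
full_labels = predict_topics(full, known)
reduced_labels = predict_topics(reduced, known)
```

In this toy example the reduced graph has fewer links than the full one, and the paper from 2008 loses all its neighbours, so it receives no prediction on the reduced graph. This illustrates the trade-off between graph size and coverage that any such reduction has to balance; it says nothing about the results reported in the chapter.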

© 2009 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Hoche, S., Hardcastle, D., Flach, P. (2009). Using Time Dependent Link Reduction to Improve the Efficiency of Topic Prediction in Co-Authorship Graphs. In: Fortunato, S., Mangioni, G., Menezes, R., Nicosia, V. (eds) Complex Networks. Studies in Computational Intelligence, vol 207. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01206-8_15

  • DOI: https://doi.org/10.1007/978-3-642-01206-8_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-01205-1

  • Online ISBN: 978-3-642-01206-8

  • eBook Packages: Engineering, Engineering (R0)
