Latent linkage semantic kernels for collective classification of link data

Journal of Intelligent Information Systems

Abstract

Links among objects generally exhibit patterns and carry rich semantic clues that can be used to improve classification accuracy. However, much real-world link data exhibits more complex regularity; for example, some links are noisy and carry no human editorial endorsement of a semantic relationship. To capture such regularity effectively, this paper proposes latent linkage semantic kernels (LLSKs): linkage kernels are first introduced to model the local and global dependency structure of a link graph, and the singular value decomposition (SVD) is then applied in the kernel-induced space. For computational efficiency on large datasets, we also develop a block-based algorithm for LLSKs. A kernel-based contextual dependency network (KCDN) model is then presented to exploit the dependencies in a network of objects for collective classification. Experimental results show that the KCDN model, together with LLSKs, is relatively robust on datasets with complex link regularity, and that the block-based computation method scales well with problem size.
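
The abstract does not give the kernel definitions, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes a hypothetical linkage kernel built from shared out-links and shared in-links, and obtains a latent kernel by keeping the top eigencomponents of the kernel matrix (for a symmetric positive semidefinite kernel matrix this coincides with a truncated SVD). The function names `linkage_kernel` and `latent_kernel`, the toy graph, and the choice of rank are all illustrative assumptions.

```python
import numpy as np


def linkage_kernel(adj):
    """Hypothetical linkage kernel (an assumption, not the paper's definition).

    Each node is described by its out-link row and its in-link column, so
    K[i, j] counts the out-link targets and in-link sources that nodes i and j share.
    """
    adj = np.asarray(adj, dtype=float)
    features = np.hstack([adj, adj.T])   # out-link and in-link indicators per node
    return features @ features.T         # Gram matrix, positive semidefinite


def latent_kernel(kernel, rank):
    """Rank-`rank` approximation of a symmetric PSD kernel matrix."""
    eigvals, eigvecs = np.linalg.eigh(kernel)    # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:rank]       # indices of the largest eigenvalues
    vals = np.clip(eigvals[top], 0.0, None)      # guard against tiny negative round-off
    vecs = eigvecs[:, top]
    return (vecs * vals) @ vecs.T                # V_k diag(vals) V_k^T


# Toy directed link graph: adj[i, j] = 1 means object i links to object j.
adj = np.array([
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
])

K = linkage_kernel(adj)
K2 = latent_kernel(K, rank=2)
print(np.round(K2, 3))
```

Truncating the spectrum discards low-variance directions of the kernel-induced space, which is one plausible way that isolated noisy links could be down-weighted; how the paper chooses the rank and combines local with global structure is not stated in the abstract.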

Author information

Corresponding author

Correspondence to Yonghong Tian.

About this article

Cite this article

Tian, Y., Huang, T. & Gao, W. Latent linkage semantic kernels for collective classification of link data. J Intell Inf Syst 26, 269–301 (2006). https://doi.org/10.1007/s10844-006-2208-9

  • DOI: https://doi.org/10.1007/s10844-006-2208-9
