Latent linkage semantic kernels for collective classification of link data

Tian, Yonghong; Huang, Tiejun; Gao, Wen

doi:10.1007/s10844-006-2208-9

Published: 21 July 2006

Journal of Intelligent Information Systems, Volume 26, pages 269–301 (2006)

Abstract

Links among objects generally exhibit certain patterns and carry rich semantic clues, which can be used to improve classification accuracy. However, real-world link data often exhibit more complex regularities. For example, some links are noisy and carry no human editorial endorsement of a semantic relationship. To capture such regularities effectively, this paper proposes latent linkage semantic kernels (LLSKs): linkage kernels are first introduced to model the local and global dependency structure of a link graph, and the singular value decomposition (SVD) is then applied in the kernel-induced space. For computational efficiency on large datasets, we also develop a block-based algorithm for LLSKs. A kernel-based contextual dependency network (KCDN) model is then presented to exploit the dependencies in a network of objects for collective classification. Experimental results show that the KCDN model, together with LLSKs, is relatively robust on datasets with complex link regularities, and that the block-based computation method scales well with problem size.
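The two-step idea in the abstract (build a kernel from the link graph, then decompose it in the kernel-induced space) can be illustrated with a minimal sketch. The abstract does not define the linkage kernels, so the kernel below is an assumption chosen only for illustration: the sum of bibliographic-coupling and co-citation counts. The function names `linkage_kernel` and `latent_kernel`, the toy graph, and the choice `k=2` are likewise hypothetical and are not taken from the paper. The latent step keeps the leading eigenpairs of the kernel matrix, which is the usual way to apply a truncated SVD to a symmetric positive semidefinite matrix.

```python
import numpy as np


def linkage_kernel(adj):
    """Illustrative linkage kernel (an assumption, not the paper's definition).

    adj[i, j] = 1 when object i links to object j.
    adj @ adj.T counts shared out-link targets (bibliographic coupling);
    adj.T @ adj counts shared in-link sources (co-citation).
    Both terms are Gram matrices, so their sum is symmetric and PSD.
    """
    adj = np.asarray(adj, dtype=float)
    return adj @ adj.T + adj.T @ adj


def latent_kernel(K, k):
    """Rank-k approximation of a symmetric PSD kernel matrix K.

    Keeps the k largest eigenpairs, which for a symmetric PSD matrix
    coincides with its truncated SVD.
    """
    eigvals, eigvecs = np.linalg.eigh(K)      # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]       # indices of the k largest
    V = eigvecs[:, top]
    lam = np.clip(eigvals[top], 0.0, None)    # guard against tiny negatives
    return (V * lam) @ V.T                    # sum of lam_j * v_j v_j^T


# Toy graph: two tightly linked groups {0, 1, 2} and {3, 4, 5},
# plus one "noisy" cross-group link 2 -> 3.
edges = [(0, 1), (1, 2), (2, 0), (0, 2),
         (3, 4), (4, 5), (5, 3), (3, 5),
         (2, 3)]
adj = np.zeros((6, 6))
for src, dst in edges:
    adj[src, dst] = 1.0

K = linkage_kernel(adj)
K_latent = latent_kernel(K, k=2)
print(np.round(K_latent, 2))
```

The low-rank kernel can then be passed to any kernel-based classifier in place of `K`. How strongly the truncation suppresses the cross-group link depends on the graph and on `k`, so this sketch makes no claim about the robustness results reported in the paper.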

Author information

Authors and Affiliations

Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100080, PR China
Yonghong Tian, Tiejun Huang & Wen Gao
Digital Media Institute, Peking University, Beijing, 100080, PR China
Yonghong Tian, Tiejun Huang & Wen Gao

Corresponding author

Correspondence to Yonghong Tian.

About this article

Cite this article

Tian, Y., Huang, T. & Gao, W. Latent linkage semantic kernels for collective classification of link data. J Intell Inf Syst 26, 269–301 (2006). https://doi.org/10.1007/s10844-006-2208-9

Received: 23 November 2004
Revised: 09 May 2005
Accepted: 12 May 2005
Published: 21 July 2006
Issue Date: May 2006
DOI: https://doi.org/10.1007/s10844-006-2208-9
