Abstract
Links among objects generally exhibit certain patterns and carry rich semantic clues that can be used to improve classification accuracy. However, much real-world link data exhibits more complex regularity; for example, it may contain noisy links that carry no human editorial endorsement of a semantic relationship. To capture such regularity effectively, this paper proposes latent linkage semantic kernels (LLSKs): linkage kernels are first introduced to model the local and global dependency structure of a link graph, and the singular value decomposition (SVD) is then applied in the kernel-induced space. For computational efficiency on large datasets, we also develop a block-based algorithm for LLSKs. A kernel-based contextual dependency network (KCDN) model is then presented to exploit the dependencies in a network of objects for collective classification. Experimental results show that the KCDN model, together with LLSKs, is relatively robust on datasets with complex link regularity, and that the block-based computation method scales well with problem size.
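The idea of applying the SVD in a kernel-induced space can be illustrated with a small sketch. The abstract does not define the linkage kernels, so the kernel below (co-citation plus bibliographic coupling) is only an assumed stand-in, and the function names `linkage_kernel` and `latent_kernel` are hypothetical. The sketch keeps the top-k eigen-directions of the kernel matrix, which for a symmetric positive semidefinite matrix coincides with a truncated SVD.

```python
import numpy as np


def linkage_kernel(A):
    """Assumed stand-in kernel, not the paper's definition.

    A[i, j] = 1 means object i links to object j. A @ A.T counts shared
    out-links (bibliographic coupling) and A.T @ A counts shared in-links
    (co-citation). Each is a Gram matrix, so the sum is symmetric PSD.
    """
    A = np.asarray(A, dtype=float)
    return A @ A.T + A.T @ A


def latent_kernel(K, k):
    """Rank-k approximation of a symmetric PSD kernel matrix K."""
    eigvals, eigvecs = np.linalg.eigh(K)       # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]        # indices of the k largest
    V = eigvecs[:, top]
    lam = np.clip(eigvals[top], 0.0, None)     # guard against tiny negatives
    return (V * lam) @ V.T                     # V diag(lam) V^T


# Toy link graph with four objects (illustrative data only).
A = np.array([
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
])
K = linkage_kernel(A)
K_latent = latent_kernel(K, k=2)
```

The resulting `K_latent` could be passed to any kernel classifier in place of `K`. Truncating to a few leading directions is one plausible way to damp the influence of noisy links, under the assumption that they contribute mostly to the discarded low-variance directions.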
Cite this article
Tian, Y., Huang, T. & Gao, W. Latent linkage semantic kernels for collective classification of link data. J Intell Inf Syst 26, 269–301 (2006). https://doi.org/10.1007/s10844-006-2208-9
DOI: https://doi.org/10.1007/s10844-006-2208-9