Abstract
The emergence of Web 2.0 has produced a large amount of annotation and personalization information about web resources. Extracting and using this information to improve the quality of services is a key goal of modern digital libraries. In this paper, we present a novel Automatic Document Tagging (ADT) approach for digital libraries. In our approach, the ADT problem is formulated as a variant of the multi-class classification problem. Unlike the standard setting, however, the training data for ADT is collected from users' historical tags and is only partially labeled. This incompleteness makes training more challenging. To overcome this problem, we propose an efficient randomized online training algorithm (RPL). The RPL algorithm has two phases: (i) random exploitation and (ii) classifier update. Experimental results on both synthetic and real-world data demonstrate the effectiveness of the approach.
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Xu, X., Niu, Z. (2009). Automatic Document Tagging in Social Semantic Digital Library. In: Leung, C.S., Lee, M., Chan, J.H. (eds) Neural Information Processing. ICONIP 2009. Lecture Notes in Computer Science, vol 5864. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10684-2_38
DOI: https://doi.org/10.1007/978-3-642-10684-2_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10682-8
Online ISBN: 978-3-642-10684-2
eBook Packages: Computer Science, Computer Science (R0)