Abstract
Hierarchical text classification for a large-scale Web taxonomy is challenging because the number of hierarchically organized categories is large and the training data for deep categories are usually sparse. It has been shown that a narrow-down approach, which involves searching the taxonomy tree, is an effective method for this problem. A recent study showed that both local and global information about a node are useful for further improvement. This paper introduces two methods for dynamically mixing local and global models for individual nodes and shows that they improve classification effectiveness by 5% and 30%, respectively, over the state-of-the-art method.
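The abstract does not say how the two mixing methods work. As a rough illustration only, the sketch below assumes the simplest form of per-node mixing: a linear interpolation of a local score and a global score, with a weight that may differ from node to node. It is applied to candidate categories that a narrow-down search step is assumed to have already produced. The function names `mix` and `rank_candidates`, the interpolation rule, the default weight of 0.5, and the category paths are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Iterable, List, Tuple


def mix(local: float, global_: float, weight: float) -> float:
    """Blend a node's local-model score with its global-model score.

    Assumption: linear interpolation with a node-specific weight in [0, 1].
    weight = 1.0 uses only the local model; weight = 0.0 uses only the global one.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * local + (1.0 - weight) * global_


def rank_candidates(
    candidates: Iterable[str],
    local_scores: Dict[str, float],
    global_scores: Dict[str, float],
    weights: Dict[str, float],
    default_weight: float = 0.5,
) -> List[Tuple[float, str]]:
    """Score each candidate node by its mixed score and return them best first.

    `candidates` stands in for the categories returned by a tree-search
    (narrow-down) step, which is assumed to have run already.
    Nodes without an explicit weight fall back to `default_weight`.
    """
    scored = [
        (mix(local_scores[c], global_scores[c], weights.get(c, default_weight)), c)
        for c in candidates
    ]
    # Sort by mixed score, highest first.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    # Toy scores for illustration only; they are not data from the paper.
    candidates = ["Arts/Music", "Arts/Music/Jazz", "Science/Physics"]
    local = {"Arts/Music": 0.40, "Arts/Music/Jazz": 0.70, "Science/Physics": 0.20}
    global_ = {"Arts/Music": 0.60, "Arts/Music/Jazz": 0.30, "Science/Physics": 0.10}
    weights = {"Arts/Music/Jazz": 0.8}  # hypothetical per-node weight
    for score, node in rank_candidates(candidates, local, global_, weights):
        print(f"{node}: {score:.2f}")
```

With these toy numbers the script prints `Arts/Music/Jazz: 0.62`, then `Arts/Music: 0.50`, then `Science/Physics: 0.15`. Making the weight a per-node parameter, rather than one global constant, is what lets the blend lean on the global model for some nodes and on the local model for others. The abstract notes that deep categories often have sparse training data, so a node-specific weight is one plausible place for that difference to show up.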
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Oh, HS., Choi, Y., Myaeng, SH. (2011). Text Classification for a Large-Scale Taxonomy Using Dynamically Mixed Local and Global Models for a Node. In: Clough, P., et al. Advances in Information Retrieval. ECIR 2011. Lecture Notes in Computer Science, vol 6611. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20161-5_4
DOI: https://doi.org/10.1007/978-3-642-20161-5_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20160-8
Online ISBN: 978-3-642-20161-5
eBook Packages: Computer Science, Computer Science (R0)