Hierarchical Text Categorization Using Neural Networks

Ruiz, Miguel E.; Srinivasan, Padmini

doi:10.1023/A:1012782908347

Published: January 2002

Information Retrieval, Volume 5, pages 87–118 (2002)

Miguel E. Ruiz¹ & Padmini Srinivasan¹

Abstract

This paper presents the design and evaluation of a text categorization method based on the Hierarchical Mixture of Experts model. The model applies a divide-and-conquer principle, using a predefined hierarchical structure to break the task into smaller categorization problems. The final classifier is a hierarchical array of neural networks. The method is evaluated using the UMLS Metathesaurus as the underlying hierarchical structure and the OHSUMED test set of MEDLINE records. Comparisons are provided with an optimized version of the traditional Rocchio algorithm adapted for text categorization, as well as with flat neural network classifiers. The results show that using the hierarchical structure improves text categorization performance relative to an equivalent flat model. The optimized Rocchio algorithm achieves performance comparable to that of the hierarchical neural networks.
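
The divide-and-conquer idea in the abstract can be illustrated with a minimal sketch. It assumes, for illustration only, that each internal node of the hierarchy holds a gating classifier that decides whether a document enters its subtree, and that the category-level expert classifiers below it are consulted only when the gate fires. The abstract does not specify this mechanism; the single logistic units, thresholds, term weights, and category names below are hypothetical stand-ins for trained neural networks, not the authors' implementation.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List

Vector = Dict[str, float]  # sparse document representation: term -> weight


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


@dataclass
class Unit:
    """One logistic unit, used here as a stand-in for a trained network."""
    weights: Vector
    bias: float = 0.0

    def score(self, doc: Vector) -> float:
        z = self.bias + sum(w * doc.get(t, 0.0) for t, w in self.weights.items())
        return sigmoid(z)


@dataclass
class Node:
    """Internal node of the hierarchy (assumed structure, not from the paper)."""
    name: str
    gate: Unit                                               # admits documents to this subtree
    experts: Dict[str, Unit] = field(default_factory=dict)   # category -> expert
    children: List["Node"] = field(default_factory=list)


def categorize(node: Node, doc: Vector,
               gate_threshold: float = 0.5,
               expert_threshold: float = 0.5) -> List[str]:
    """Return the categories assigned to doc within the subtree rooted at node."""
    if node.gate.score(doc) < gate_threshold:
        return []  # gate closed: nothing below this node is evaluated
    assigned = [c for c, e in node.experts.items()
                if e.score(doc) >= expert_threshold]
    for child in node.children:
        assigned.extend(categorize(child, doc, gate_threshold, expert_threshold))
    return assigned


# Toy hierarchy with hand-set weights (hypothetical values and names).
diseases = Node(
    name="diseases",
    gate=Unit({"disease": 4.0}, bias=-2.0),
    experts={"heart_disease": Unit({"heart": 4.0}, bias=-2.0)},
)
root = Node(name="root", gate=Unit({}, bias=5.0), children=[diseases])

print(categorize(root, {"heart": 1.0, "disease": 1.0}))
print(categorize(root, {"heart": 1.0}))
```

In this toy setup the second document mentions "heart" but is never shown to the "heart_disease" expert, because the "diseases" gate stays closed. That is the sense in which the hierarchy reduces one large categorization problem to several smaller ones; a flat model would instead score every category for every document.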

Author information

Authors and Affiliations

School of Library and Information Science, The University of Iowa, 3087 Main Library, Iowa City, IA, 52242-1420, USA
Miguel E. Ruiz & Padmini Srinivasan

About this article

Cite this article

Ruiz, M.E., Srinivasan, P. Hierarchical Text Categorization Using Neural Networks. Information Retrieval 5, 87–118 (2002). https://doi.org/10.1023/A:1012782908347

Issue Date: January 2002
DOI: https://doi.org/10.1023/A:1012782908347
