Abstract
This paper presents the design and evaluation of a text categorization method based on the Hierarchical Mixture of Experts model. This model applies a divide-and-conquer principle, using a predefined hierarchical structure to split the categorization task into smaller subproblems. The final classifier is a hierarchical array of neural networks. The method is evaluated using the UMLS Metathesaurus as the underlying hierarchical structure and the OHSUMED test set of MEDLINE records. Comparisons are provided with an optimized version of the traditional Rocchio algorithm adapted for text categorization, as well as with flat neural network classifiers. The results show that using the hierarchical structure improves text categorization performance relative to an equivalent flat model. The optimized Rocchio algorithm achieves performance comparable to that of the hierarchical neural networks.
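To make the divide-and-conquer idea concrete, the following is a minimal sketch of a two-level Hierarchical Mixture of Experts forward pass in the generic Jordan and Jacobs (1993) formulation. Softmax gating networks at the internal nodes weight the outputs of logistic expert networks at the leaves, so that P(c | x) = Σ_i g_i(x) Σ_j g_{j|i}(x) σ(w_ij · x). The sketch assumes dense feature vectors and already-trained weights (training by EM or backpropagation is not shown), and the function and argument names are hypothetical. It is not the authors' architecture, which follows the UMLS hierarchy and is not detailed in the abstract.

```python
import math
from typing import List

Vector = List[float]


def dot(w: Vector, x: Vector) -> float:
    """Inner product of a weight vector and a feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax used by the gating networks."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function used by the experts."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def hme_predict(
    x: Vector,
    top_gate: List[Vector],          # one weight vector per branch i
    sub_gates: List[List[Vector]],   # sub_gates[i][j]: gate weights for expert j under branch i
    experts: List[List[Vector]],     # experts[i][j]: logistic expert weights
) -> float:
    """Hypothetical two-level HME: sum_i g_i sum_j g_{j|i} * sigmoid(w_ij . x)."""
    g_top = softmax([dot(w, x) for w in top_gate])
    y = 0.0
    for i, g_i in enumerate(g_top):
        # The lower-level gate is conditioned on the branch chosen above it.
        g_sub = softmax([dot(w, x) for w in sub_gates[i]])
        for j, g_ji in enumerate(g_sub):
            y += g_i * g_ji * sigmoid(dot(experts[i][j], x))
    return y
```

With all weights set to zero every gate is uniform and every expert outputs 0.5, so `hme_predict` returns 0.5. Because the output is a convex combination of sigmoid values, it always lies between 0 and 1 and can be thresholded to make a category decision.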
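The Rocchio baseline can likewise be sketched under stated assumptions. In the usual text categorization adaptation, each category receives a prototype vector w_c = β·(1/|R|) Σ_{d∈R} d − γ·(1/|N|) Σ_{d∈N} d, where R and N are the positive and negative training documents, and a document is assigned to the category when its cosine similarity to the prototype reaches a threshold. The sketch uses sparse term-weight dictionaries. The beta and gamma defaults, the clipping of negative weights to zero, and the threshold are illustrative assumptions rather than the optimized settings used in the paper, and the helper names are hypothetical.

```python
import math
from typing import Dict, List

SparseVec = Dict[str, float]


def centroid(docs: List[SparseVec]) -> SparseVec:
    """Average of sparse document vectors (empty if there are no documents)."""
    total: SparseVec = {}
    for d in docs:
        for term, w in d.items():
            total[term] = total.get(term, 0.0) + w
    n = len(docs)
    return {t: w / n for t, w in total.items()} if n else {}


def rocchio_prototype(pos: List[SparseVec], neg: List[SparseVec],
                      beta: float = 1.0, gamma: float = 0.25) -> SparseVec:
    """Hypothetical Rocchio prototype: beta*centroid(pos) - gamma*centroid(neg)."""
    cp, cn = centroid(pos), centroid(neg)
    proto: SparseVec = {}
    for t in set(cp) | set(cn):
        w = beta * cp.get(t, 0.0) - gamma * cn.get(t, 0.0)
        if w > 0.0:  # assumption: keep only positive weights
            proto[t] = w
    return proto


def cosine(a: SparseVec, b: SparseVec) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    num = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return num / (na * nb) if na and nb else 0.0


def assign(doc: SparseVec, proto: SparseVec, threshold: float) -> bool:
    """Assign the category when similarity to the prototype reaches the threshold."""
    return cosine(doc, proto) >= threshold
```

For example, with positives `{"heart": 1.0, "attack": 1.0}` and `{"heart": 1.0, "valve": 1.0}`, the negative `{"lung": 1.0, "attack": 1.0}`, beta = 1.0 and gamma = 0.5, the weight of "attack" cancels to zero and is dropped, leaving the prototype `{"heart": 1.0, "valve": 0.5}`. How the paper's optimized version chooses its parameters is not stated in the abstract; the sketch simply exposes beta, gamma, and the threshold as arguments so they can be tuned.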
References
Apte C, Damerau F and Weiss SM (1994) Automated learning of decision rules for text categorization. ACM Transactions on Information Systems, 12(3):233–251.
Bridle J (1989) Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition. In: Fogelman-Soulie F and Hérault J Eds., Neurocomputing: Algorithms, Architectures and Applications. Springer-Verlag, New York.
Breiman L, Friedman JH, Olshen RA and Stone CJ (1984) Classification and Regression Trees. Wadsworth International Group, Belmont, CA.
Buckley C and Salton G (1995) Optimization of relevance feedback weights. In: Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Seattle, WA, July 1995, pp. 351–357.
Caropreso MF, Matwin S and Sebastiani F (2001) A learner-independent evaluation of the usefulness of statistical phrases for automated text categorization. In: Chin AG Ed., Text Databases and Document Management: Theory and Practice. Idea Group Publishing, pp. 78–102.
Cohen W and Singer Y (1996) Context-sensitive learning methods for text categorization. In: Proceedings of the 19th International ACM SIGIR Conference on Research and Development in Information Retrieval, Zurich, Switzerland, July 1996, pp. 307–315.
d'Alché-Buc F, Zwierski D and Nadal JP (1994) Trio-Learning: A tool for building hybrid neural trees. International Journal of Neural Systems, 5(4):259–274.
Galavotti L, Sebastiani F and Simi M (2000) Experiments on the use of feature selection and negative evidence in automated text categorization. In: Proceedings of ECDL-00, 4th European Conference on Research and Advanced Technology for Digital Libraries, Lisbon, Portugal, pp. 59–68.
Hersh W, Buckley C, Leone TJ and Hickam D (1994) OHSUMED: An interactive retrieval evaluation and new large test collection for research. In: Croft WB and van Rijsbergen CJ Eds., Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Dublin, Ireland, July 1994, pp. 192–201.
Joachims T (1997) Text categorization with support vector machines: Learning with many relevant features. Technical Report LS-8 Report 23, University of Dortmund, 1997.
Joachims T (1999) Estimating the generalization performance of a SVM efficiently. Technical Report LS-8 Report 25, Universität Dortmund, Dortmund, Dec. 1999.
John G, Kohavi R and Pfleger K (1994) Irrelevant features and the subset selection problem. In: Machine Learning: Proceedings of the Eleventh International Conference, Morgan Kaufmann Publishers, San Francisco, CA, pp. 121–129.
Jordan MI and Jacobs RA (1993) Hierarchical mixtures of experts and the EM algorithm. Technical Report A.I. Memo No. 1440, Massachusetts Institute of Technology.
Koller D and Sahami M (1997) Hierarchically classifying documents using very few words. In: ICML-97: Proceedings of the Fourteenth International Conference on Machine Learning, San Francisco, CA, pp. 170–178.
Lam W and Ho CY (1998) Using a generalized instance set for automatic text categorization. In: Proceedings of the 21st International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 81–89.
Lam W, Ruiz ME and Srinivasan P (1999) Automatic text categorization and its application to text retrieval. IEEE Transactions on Knowledge and Data Engineering, 11(6):865–879.
Lewis DD (1992) An evaluation of phrasal and clustered representations on a text categorization task. In: Proceedings of the 15th International ACM SIGIR Conference on Research and Development in Information Retrieval, June 1992, pp. 37–50.
Lewis D and Ringuette M (1994) Comparison of two learning algorithms for text categorization. In: Proceedings of the Third Annual Symposium on Document Analysis and Information Retrieval (SDAIR'94).
Lewis DD, Schapire RE, Callan JP and Papka R (1996) Training algorithms for linear text classifiers. In: Proceedings of the 19th International ACM SIGIR Conference on Research and Development in Information Retrieval, Zurich, Switzerland, July 1996, pp. 298–303.
Liu H and Motoda H (1998) Less is more. In: Liu H and Motoda H Eds., Feature Extraction, Construction and Selection: A Data Mining Perspective. Kluwer Academic Publishers, Boston, MA, Ch. 1, pp. 3–12.
McCallum A and Nigam K (1998) A comparison of event models for naive Bayes text classification. In: Learning for Text Categorization: Papers from the 1998 Workshop. AAAI Technical Report WS-98-05. AAAI Press, San Francisco, CA, July 1998, pp. 41–48.
McCallum A, Rosenfeld R, Mitchell T and Ng AY (1998) Improving text classification by shrinkage in a hierarchy of classes. In: Proceedings of the 15th International Conference on Machine Learning. AAAI, Morgan Kaufmann, July 1998.
McCullagh P and Nelder JA (1989) Generalized Linear Models. Chapman and Hall, London.
Mladenić D (1998) Machine learning on non-homogeneous, distributed text data. PhD Dissertation, University of Ljubljana, Faculty of Computer and Information Science, Ljubljana, Slovenia.
Moulinier I and Ganascia JG (1996) Applying an existing machine learning algorithm to text categorization. In: Wermter S, Riloff E and Scheler G Eds., Connectionist, Statistical, and Symbolic Approaches to Learning for Natural Language Processing. Springer-Verlag, Heidelberg, Germany, pp. 343–354.
National Library of Medicine (1999) Unified Medical Language System (UMLS) Knowledge Sources. 10th edition, U.S. Department of Health and Human Services, National Institutes of Health, National Library of Medicine, Jan. 1999.
Ng HT, Goh WB and Low KL (1997) Feature selection, perceptron learning, and a usability case study for text categorization. In: Belkin N, Narasimhalu AD and Willett P Eds., Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Philadelphia, PA, July 1997, pp. 67–73.
Rocchio JJ (1971) Relevance feedback in information retrieval. In: Salton G Ed., The SMART Retrieval System: Experiments in Automatic Document Processing. Prentice Hall, Englewood Cliffs, New Jersey.
Rumelhart DE, Durbin R, Golden R and Chauvin Y, Backpropagation: The basic theory. In: Smolensky P, Mozer MC and Rumelhart DE Eds., Mathematical Perspectives on Neural Networks. Lawrence Erlbaum Associates, Hillsdale, NJ, pp. 533–566.
Sahami M (1998) Using Machine Learning to Improve Information Access. PhD Thesis, Stanford University, Computer Science Department.
Schapire RE, Singer Y and Singhal A (1998) Boosting and Rocchio applied to text filtering. In: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Melbourne, Australia, Aug. 1998, pp. 215–223.
Schütze H, Hull DA and Pedersen JO (1995) A comparison of classifiers and document representations for the routing problem. In: Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Seattle, WA, July 1995, pp. 229–237.
Singhal A, Buckley C and Mitra M (1996) Pivoted document length normalization. In: Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Zurich, Switzerland, Aug. 1996, pp. 21–29.
Singhal A, Mitra M and Buckley C (1997) Learning routing queries in a query zone. In: Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Philadelphia, PA, July 1997, pp. 25–32.
van Rijsbergen CJ (1979) Information Retrieval, 2nd ed. Butterworths, London.
Waterhouse SR (1997) Classification and regression using mixtures of experts. PhD Dissertation, University of Cambridge, Cambridge, England.
Waterhouse SR and Robinson AJ (1994) Classification using hierarchical mixture of experts. In: Proceedings of the 1994 IEEE Workshop on Neural Networks for Signal Processing IV, pp. 177–186.
Weigend AS, Wiener ED and Pedersen JO (1999) Exploiting hierarchy in text categorization. Information Retrieval, 1(3):193–216.
Wiener E, Pedersen JO and Weigend AS (1995) A neural network approach to topic spotting. In: Proceedings of SDAIR'95, pp. 317–332.
Wolpert DH (1993) Stacked generalization. Tech. Rep. LA-UR-90-3460, The Santa Fe Institute, Santa Fe, NM.
Yang J and Honavar V (1998) Feature subset selection using a genetic algorithm. In: Liu H and Motoda H Eds., Feature Extraction, Construction and Selection: A Data Mining Perspective. Kluwer Academic Publishers, Boston, MA, Ch. 8, pp. 117–136.
Yang Y (1996) An evaluation of statistical approaches to MEDLINE indexing. In: Proceedings of the American Medical Informatics Association (AMIA), pp. 358–362.
Yang Y (1999) An evaluation of statistical approaches to text categorization. Information Retrieval, 1(1):69–90.
Yang Y and Pedersen JO (1997) A comparative study on feature selection in text categorization. In: Proceedings of the Fourteenth International Conference on Machine Learning (ICML'97). Morgan Kaufmann Publishers, San Francisco, CA, July 1997.
Cite this article
Ruiz, M.E., Srinivasan, P. Hierarchical Text Categorization Using Neural Networks. Information Retrieval 5, 87–118 (2002). https://doi.org/10.1023/A:1012782908347