A Theoretical Framework for Web Categorization in Hierarchical Directories using Bayesian Networks

  • Chapter
Soft Computing in Web Information Retrieval

Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 197)

Summary

In this paper, we present a theoretical framework for classifying web pages in a hierarchical directory using the Bayesian network formalism. In particular, we focus on the problem of multi-label text categorization, where a given document can be assigned to any number of categories in the hierarchy. The idea is to represent explicitly the dependence relationships between the categories in the hierarchy, with the representation adapted to include the category descriptors as well. Given a new document (web page) to be classified, a Bayesian network inference process is used to compute the posterior probability of each category given the document. The web page is then assigned to the classes with the highest posterior probability.
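
The final assignment step described in the summary can be illustrated with a minimal sketch. It is not the chapter's model: instead of inference in a Bayesian network over the category hierarchy, it uses an independent binary naive Bayes estimate per category as a much simpler stand-in, and it selects categories by a posterior threshold. The category names, priors, term probabilities, the `default` smoothing value and the `threshold` are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical toy model: for each category, a prior P(c) and the
# probabilities of observing a term given that the category applies or not.
MODEL = {
    "Science": {
        "prior": 0.3,
        "p_term_given_cat": {"physics": 0.6, "quantum": 0.3},
        "p_term_given_not": {"physics": 0.05, "quantum": 0.02},
    },
    "Science/Physics": {
        "prior": 0.1,
        "p_term_given_cat": {"physics": 0.9, "quantum": 0.5},
        "p_term_given_not": {"physics": 0.05, "quantum": 0.02},
    },
    "Sports": {
        "prior": 0.3,
        "p_term_given_cat": {"physics": 0.02, "quantum": 0.01},
        "p_term_given_not": {"physics": 0.2, "quantum": 0.1},
    },
}


def posterior(doc_terms, prior, p_term_given_cat, p_term_given_not, default=0.01):
    """Return P(category | document) under a naive Bayes assumption.

    Only the terms present in the document are used. Terms unknown to the
    model fall back to `default` (an assumed smoothing value).
    """
    log_yes = math.log(prior)
    log_no = math.log(1.0 - prior)
    for term in doc_terms:
        log_yes += math.log(p_term_given_cat.get(term, default))
        log_no += math.log(p_term_given_not.get(term, default))
    # Normalise in log space to avoid underflow on long documents.
    top = max(log_yes, log_no)
    yes = math.exp(log_yes - top)
    no = math.exp(log_no - top)
    return yes / (yes + no)


def classify(doc_terms, model, threshold=0.5):
    """Multi-label assignment: keep every category whose posterior reaches
    the threshold, ranked from most to least probable."""
    scores = {name: posterior(doc_terms, **params) for name, params in model.items()}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(name, p) for name, p in ranked if p >= threshold]


if __name__ == "__main__":
    for name, p in classify(["physics", "quantum"], MODEL):
        print(f"{name}: {p:.3f}")
```

Because each category is scored independently, a document can receive any number of labels, including a category and one of its subcategories. A network that models the dependences between categories, as the summary proposes, would replace the independent `posterior` computation with an inference step over the whole hierarchy.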



Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

de Campos, L.M., Fernández-Luna, J.M., Huete, J.F. (2006). A Theoretical Framework for Web Categorization in Hierarchical Directories using Bayesian Networks. In: Herrera-Viedma, E., Pasi, G., Crestani, F. (eds) Soft Computing in Web Information Retrieval. Studies in Fuzziness and Soft Computing, vol 197. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-31590-X_2

  • DOI: https://doi.org/10.1007/3-540-31590-X_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-31588-9

  • Online ISBN: 978-3-540-31590-2

  • eBook Packages: Engineering, Engineering (R0)
