An Evolutionary Approach to Automatic Web Page Categorization and Updating

  • Conference paper
Web Intelligence: Research and Development (WI 2001)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2198)

Abstract

Catalogues play an important role in most current Web search engines. The catalogues, which organize documents into hierarchical collections, are maintained manually, with increasing difficulty and cost due to the incessant growth of the WWW. This problem has stimulated many researchers to work on automatic categorization of Web documents. In practice, most of these approaches work well only on special types of documents or on restricted sets of documents. This paper presents an evolutionary approach that automatically constructs the catalogue and classifies Web documents. This functionality relies on a genetic-based fuzzy clustering methodology that applies the clustering to the context of the document, as opposed to content-based clustering, which works on the complete document information.
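
To make the idea of genetic-based fuzzy clustering more concrete, the following is a minimal sketch, not the authors' method. It assumes documents are already represented as small numeric feature vectors (the abstract does not say how context features are built), uses the standard fuzzy c-means membership formula, and evolves cluster centers with a simple elitist genetic loop. The names `memberships`, `objective`, and `evolve`, and all parameter values, are hypothetical choices made for illustration.

```python
import random


def memberships(points, centers, m=2.0):
    """Fuzzy c-means memberships: u[i][k] is the degree of point i in cluster k."""
    u = []
    for p in points:
        # Euclidean distance to each center, clamped to avoid division by zero.
        d = [max(sum((a - b) ** 2 for a, b in zip(p, c)) ** 0.5, 1e-12)
             for c in centers]
        inv = [dk ** (-2.0 / (m - 1.0)) for dk in d]
        total = sum(inv)
        u.append([v / total for v in inv])
    return u


def objective(points, centers, m=2.0):
    """Fuzzy c-means cost: sum of u^m times squared distance (lower is better)."""
    cost = 0.0
    for p, row in zip(points, memberships(points, centers, m)):
        for c, uk in zip(centers, row):
            cost += (uk ** m) * sum((a - b) ** 2 for a, b in zip(p, c))
    return cost


def evolve(points, k, generations=200, pop_size=30, sigma=0.1, seed=0):
    """Evolve k cluster centers; an individual is a list of k centers.

    Assumed operators (not taken from the paper): truncation selection,
    center-level uniform crossover, Gaussian mutation, elitist survivors.
    """
    rng = random.Random(seed)
    pop = [[list(rng.choice(points)) for _ in range(k)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ind: objective(points, ind))
        survivors = pop[: pop_size // 2]  # the better half is kept unchanged
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            child = [list(rng.choice((ca, cb))) for ca, cb in zip(a, b)]
            child = [[x + rng.gauss(0, sigma) for x in c] for c in child]
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda ind: objective(points, ind))


if __name__ == "__main__":
    # Hypothetical 2-D "document" vectors forming two loose groups.
    docs = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    best = evolve(docs, k=2, generations=50)
    for doc, row in zip(docs, memberships(docs, best)):
        print(doc, [round(v, 3) for v in row])
```

Because memberships are fuzzy, each document receives a degree of belonging to every category rather than a single label, which is one plausible way a new page could be matched against an existing catalogue.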


Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Loia, V., Luongo, P. (2001). An Evolutionary Approach to Automatic Web Page Categorization and Updating. In: Zhong, N., Yao, Y., Liu, J., Ohsuga, S. (eds) Web Intelligence: Research and Development. WI 2001. Lecture Notes in Computer Science, vol 2198. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45490-X_35

  • DOI: https://doi.org/10.1007/3-540-45490-X_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42730-8

  • Online ISBN: 978-3-540-45490-8

  • eBook Packages: Springer Book Archive
