Semi-automatic Creation and Maintenance of Web Resources with webTopic

  • Conference paper
Semantics, Web and Mining (EWMF 2005, KDO 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4289)

Abstract

This paper proposes a methodology for automatically retrieving topic-specific document collections from the web, organizing them, and keeping them up to date over time, according to the user's specific, persistent information needs. The collected documents are organized according to user specifications and are classified partly by the user and partly automatically. A presentation layer supports the exploration of large document sets while monitoring and recording user interaction with the collections. The quality of the system is monitored continuously: the system periodically measures and stores the values of its quality parameters. This quality log makes it possible to maintain the quality of the resources by triggering procedures that correct or prevent quality degradation.

Supported by the POSC/EIA/58367/2004/Site-o-Matic Project (Fundação Ciência e Tecnologia), FEDER, and the Programa de Financiamento Plurianual de Unidades de I&D.
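
The quality-monitoring loop described in the abstract (periodic measurement, a stored quality log, and procedures triggered on degradation) might be sketched as follows. This is a minimal illustration only, not the webTopic implementation: the class `QualityLog`, the parameter names `freshness` and `classifier_accuracy`, the threshold values, and the corrective actions are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class QualityLog:
    """Stores periodic quality measurements and fires corrective actions."""

    # Hypothetical lower bounds: a value below its bound counts as degraded.
    thresholds: Dict[str, float]
    # Hypothetical corrective procedures, keyed by quality parameter name.
    actions: Dict[str, Callable[[], str]]
    # The log itself: one (period, measured values) entry per measurement.
    history: List[Tuple[int, Dict[str, float]]] = field(default_factory=list)

    def record(self, period: int, values: Dict[str, float]) -> List[str]:
        """Append one measurement; return results of any triggered actions."""
        self.history.append((period, dict(values)))
        triggered = []
        for name, value in values.items():
            bound = self.thresholds.get(name)
            if bound is not None and value < bound and name in self.actions:
                triggered.append(self.actions[name]())
        return triggered


# Illustrative parameters and procedures (assumed, not taken from the paper).
log = QualityLog(
    thresholds={"freshness": 0.8, "classifier_accuracy": 0.7},
    actions={
        "freshness": lambda: "recrawl",
        "classifier_accuracy": lambda: "retrain",
    },
)

first = log.record(1, {"freshness": 0.95, "classifier_accuracy": 0.82})
second = log.record(2, {"freshness": 0.60, "classifier_accuracy": 0.75})
```

In this sketch the first measurement triggers nothing, while the second falls below the assumed freshness bound and triggers the hypothetical re-crawl procedure. Keeping every measurement in `history` is what would allow trends to be inspected later, so that degradation could be prevented rather than only corrected.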

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Escudeiro, N.F., Jorge, A.M. (2006). Semi-automatic Creation and Maintenance of Web Resources with webTopic. In: Ackermann, M., et al. Semantics, Web and Mining. EWMF 2005, KDO 2005. Lecture Notes in Computer Science, vol 4289. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11908678_6

  • DOI: https://doi.org/10.1007/11908678_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-47697-9

  • Online ISBN: 978-3-540-47698-6

  • eBook Packages: Computer Science, Computer Science (R0)
