Abstract
Current Web search engines are unable to adapt their operations to the evolving needs, interests and preferences of their users. To address this problem, we developed a system that classifies HTML (or XML) documents into user-specified categories of interest. The system processes the user profile together with a set of representative documents for each category of interest, and produces a classification schema represented as a set of representative category vectors. The classification schema is then used to assign new incoming Web documents to one (or more) of the pre-specified categories of interest. The system also lets users modify and enrich their profiles according to their current search needs and interests. In this way, adaptive and personalized delivery of Web-based information is achieved. Experimental results on an indicative collection of Web pages demonstrate the reliability and effectiveness of our approach.
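The abstract describes the classification schema only as a set of representative category vectors against which new documents are compared. One common way to realize that idea is a centroid (Rocchio-style) classifier over TF-IDF vectors with cosine similarity, sketched below. This is an illustrative assumption, not the paper's exact method. The regex tokenizer, the smoothed IDF formula, the helper names (`build_schema`, `classify`) and the 0.1 similarity threshold are all hypothetical choices. A document may be assigned to several categories because every category scoring above the threshold is returned.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphabetic tokens only; a real system would also
    # strip HTML/XML markup, drop stop words and apply stemming.
    return re.findall(r"[a-z]+", text.lower())


def idf_weights(docs):
    # docs: list of token lists. Hypothetical smoothed IDF: log(N / df) + 1.
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}


def tfidf(tokens, idf):
    # Raw term frequency times IDF; terms unseen in training are ignored.
    tf = Counter(tokens)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def normalize(vec):
    # Scale to unit L2 length (an empty vector stays empty).
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    # Both vectors are L2-normalized, so the dot product equals the cosine.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def build_schema(training):
    # training: {category: [document text, ...]}
    # Returns (idf, {category: normalized centroid vector}).
    tokenized = {c: [tokenize(d) for d in docs] for c, docs in training.items()}
    idf = idf_weights([toks for docs in tokenized.values() for toks in docs])
    schema = {}
    for category, docs in tokenized.items():
        centroid = {}
        for toks in docs:
            for t, w in normalize(tfidf(toks, idf)).items():
                centroid[t] = centroid.get(t, 0.0) + w
        schema[category] = normalize(centroid)
    return idf, schema


def classify(text, idf, schema, threshold=0.1):
    # Return every category whose similarity reaches the (hypothetical)
    # threshold, best match first; an empty list means "no category".
    vec = normalize(tfidf(tokenize(text), idf))
    scores = {c: cosine(vec, centroid) for c, centroid in schema.items()}
    hits = [c for c, s in scores.items() if s >= threshold]
    return sorted(hits, key=lambda c: -scores[c])


if __name__ == "__main__":
    training = {
        "sports": ["football match score goal", "tennis player wins match"],
        "finance": ["stock market shares fall", "bank interest rates rise"],
    }
    idf, schema = build_schema(training)
    print(classify("the football player scored a goal", idf, schema))
```

Running the script prints `['sports']`. Because the schema in this sketch is recomputed from the representative documents, modifying or enriching a user profile amounts to editing a category's document list and calling `build_schema` again.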
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Potamias, G. (2003). Adaptive Classification of Web Documents to Users Interests. In: Manolopoulos, Y., Evripidou, S., Kakas, A.C. (eds) Advances in Informatics. PCI 2001. Lecture Notes in Computer Science, vol 2563. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-38076-0_10
DOI: https://doi.org/10.1007/3-540-38076-0_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-07544-8
Online ISBN: 978-3-540-38076-4
eBook Packages: Springer Book Archive