Abstract:
In this paper we propose a Web warehouse system that gathers and manages online news in a semi-automatic fashion, serving as an intermediate information repository for a given user community. We describe its architecture and an ontology-based focused crawler that automatically collects relevant news documents. We further discuss the problem of efficiently managing the hit-frequency profile of all visited news stories and propose a randomized data structure, the Aging Bloom Filter (ABF), to cope with this problem. We demonstrate that the proposed system can save a good deal of Web traffic and online time when individual users search for and retrieve relevant online news.
Published in: Proceedings of the 2005 International Conference on Active Media Technology (AMT 2005)
Date of Conference: 19-21 May 2005
Date Added to IEEE Xplore: 12 September 2005
Print ISBN: 0-7803-9035-0