Abstract
Adaptive filtering of news is an area of information retrieval gaining substantial interest as services become more available on the Internet. This paper reports on a number of experiments involving a two-level clustering approach using a variety of techniques including threshold adaptation, topic vocabulary adaptation and both noun phrase and named entity adaptation. Our goal in this exploratory research is to empirically compare alternative configurations of our filtering approach that will allow us to better understand the relative value of the component subsystems.
Cite this article
Eichmann, D., Srinivasan, P. Adaptive Filtering of Newswire Stories using Two-Level Clustering. Information Retrieval 5, 209–237 (2002). https://doi.org/10.1023/A:1015750012676
DOI: https://doi.org/10.1023/A:1015750012676