Abstract
We introduce an information system for organization and retrieval of news articles from Web publications, incorporating a classification framework based on Support Vector Machines. We present the data model for storage and management of news data and the system architecture for news retrieval, classification and generation of topical collections. We also discuss the classification results obtained with a collection of news articles gathered from a set of online newspapers.
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Maria, N., Silva, M.J. (2001). Theme-Based Retrieval of Web News. In: Goos, G., Hartmanis, J., van Leeuwen, J., Suciu, D., Vossen, G. (eds) The World Wide Web and Databases. WebDB 2000. Lecture Notes in Computer Science, vol 1997. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45271-0_2
DOI: https://doi.org/10.1007/3-540-45271-0_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41826-9
Online ISBN: 978-3-540-45271-3
eBook Packages: Springer Book Archive