Abstract
The abundance of knowledge-rich information on the World Wide Web makes compiling an online e-textbook both possible and necessary. The authors of [7] proposed an approach that automatically generates an e-textbook by mining the ranked result lists returned by a search engine. However, its performance was degraded by Web pages that were relevant to a concept but did not actually discuss it. In this paper, we extend the work in [7] by applying a clustering step before the mining process. The clustering step post-processes the original results retrieved by the search engine and aims to reach a state in which every Web page assigned to a concept discusses exactly that concept.
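To make the post-processing idea concrete, the following is a minimal sketch and not the method of the paper, whose clustering algorithm and similarity measure are not given in this abstract. It assumes plain-text pages, bag-of-words cosine similarity, and a simple greedy "leader" clustering; the names `term_vector`, `cosine`, `cluster_pages`, and `refine`, and the threshold value, are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lower-cased bag-of-words term-frequency vector (assumed representation)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_pages(pages, threshold=0.3):
    """Greedy leader clustering over page indices.

    A page joins the first cluster whose leader (its first member) is at
    least `threshold` similar to it; otherwise it starts a new cluster.
    """
    vectors = [term_vector(p) for p in pages]
    clusters = []
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def refine(concept, pages, threshold=0.3):
    """Keep only the pages in the cluster whose pooled text best matches the concept."""
    clusters = cluster_pages(pages, threshold)
    if not clusters:
        return []
    query = term_vector(concept)

    def score(cluster):
        pooled = Counter()
        for i in cluster:
            pooled.update(term_vector(pages[i]))
        return cosine(query, pooled)

    best = max(clusters, key=score)
    return [pages[i] for i in best]
```

Calling `refine("binary search tree", pages)` on the result list retrieved for one concept would return the subset of pages that group together around that concept, which could then be passed to the mining stage in place of the raw list.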
References
1. Brin, S., Page, L.: The Anatomy of a Large-scale Hypertextual Web Search Engine. In: Proceedings of International Conference on World Wide Web (1998)
2. Chakrabarti, S.: Mining the Web: Discovering Knowledge from Hypertext Data. Morgan Kaufmann Publishers, San Francisco (2002)
3. Zamir, O., Etzioni, O.: Grouper: A Dynamic Clustering Interface to Web Search Results. Computer Networks 31(11-16), 1361–1374 (1999)
4. Zeng, H.-J., He, Q.-C., Chen, Z., Ma, W.-Y.: Learning To Cluster Web Search Results. In: Proceedings of the 27th annual international conference on research and development in information retrieval (SIGIR 2004), Sheffield, United Kingdom, pp. 210–217 (July 2004)
5. Ferragina, P., Gullí, A.: The Anatomy of a Hierarchical Clustering Engine for Web-page, News and Book Snippets. In: Perner, P. (ed.) ICDM 2004. LNCS (LNAI), vol. 3275, pp. 395–398. Springer, Heidelberg (2004)
6. Vivisimo, http://vivisimo.com/html/index
7. Chen, J., Li, Q., Wang, L., Jia, W.: Automatically Generating an e-Textbook on the Web. In: Liu, W., Shi, Y., Li, Q. (eds.) ICWL 2004. LNCS, vol. 3143, pp. 35–42. Springer, Heidelberg (2004)
8. Liu, B., Chin, C.-W., Ng, H.-T.: Mining Topic-specific Concepts and Definitions on the Web. In: Proceedings of International Conference on World Wide Web, pp. 251–260 (2003)
9. Salton, G., McGill, M.J.: Introduction to Modern Information Retrieval. McGraw Hill, New York (1983)
10. Zhang, K., Shasha, D.: Simple fast algorithms for the editing distance between trees and related problems. SIAM Journal on Computing 18(6), 1245–1262 (1989)
11. Wang, Y., DeWitt, D.J., Cai, J.-y.: X-Diff: An Effective Change Detection Algorithm for XML Documents. In: ICDE 2003, pp. 519–530 (2003)
12. Nierman, A., Jagadish, H.V.: Evaluating Structural Similarity in XML Documents. In: WebDB 2002, pp. 61–66 (2002)
13. Kanungo, T., Mount, D.M., Netanyahu, N.S., Piatko, C.D., Silverman, R., Wu, A.Y.: An Efficient k-Means Clustering Algorithm: Analysis and Implementation. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(7), 881–892 (2002)
14. de Castro Reis, D., Golgher, P.B., da Silva, A.S., Laender, A.H.F.: Automatic web news extraction using tree edit distance. In: WWW 2004, pp. 502–511 (2004)
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chen, J., Li, Q., Feng, L. (2005). Refining the Results of Automatic e-Textbook Construction by Clustering. In: Lau, R.W.H., Li, Q., Cheung, R., Liu, W. (eds) Advances in Web-Based Learning – ICWL 2005. ICWL 2005. Lecture Notes in Computer Science, vol 3583. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11528043_31
DOI: https://doi.org/10.1007/11528043_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27895-5
Online ISBN: 978-3-540-31716-6
eBook Packages: Computer Science, Computer Science (R0)