Abstract
The exponential growth of resources available on the Web has made efficient search tools increasingly important. This paper proposes an approach to chemical information discovery based on focused crawling. Focused crawlers built from different combinations of feature representations and classification algorithms were compared. Latent Semantic Indexing (LSI) and Mutual Information (MI) were used to extract features from documents, while Naive Bayes (NB) and Support Vector Machines (SVM) were used to compute content relevance scores. Among the combinations tested, LSI with SVM performed best.
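The general idea of a focused crawler driven by a content relevance score can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a plain multinomial Naive Bayes scorer on raw tokens (no LSI or MI feature extraction, and no SVM), and it assumes that each outgoing link inherits the relevance of the page it was found on. The names `NaiveBayesRelevance`, `crawl`, `TOY_WEB`, and the `fetch` callable are hypothetical and exist only for this example.

```python
import heapq
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


class NaiveBayesRelevance:
    """Two-class multinomial Naive Bayes with Laplace smoothing."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}
        self.vocab = set()

    def fit(self, docs, labels):
        for text, label in zip(docs, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def _log_joint(self, tokens, label):
        total_docs = sum(self.doc_counts.values())
        log_p = math.log(self.doc_counts[label] / total_docs)
        denom = sum(self.word_counts[label].values()) + len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # tokens unseen in training are ignored
                log_p += math.log((self.word_counts[label][tok] + 1) / denom)
        return log_p

    def score(self, text):
        """Return the posterior P(relevant | text), a value in [0, 1]."""
        tokens = tokenize(text)
        log_rel = self._log_joint(tokens, True)
        log_irr = self._log_joint(tokens, False)
        top = max(log_rel, log_irr)  # subtract the max for numerical stability
        e_rel, e_irr = math.exp(log_rel - top), math.exp(log_irr - top)
        return e_rel / (e_rel + e_irr)


def crawl(seed_urls, fetch, scorer, max_pages=10):
    """Best-first crawl; fetch(url) must return (page_text, outgoing_links)."""
    frontier = [(-1.0, url) for url in seed_urls]  # negated score: max-heap
    heapq.heapify(frontier)
    seen = set(seed_urls)
    visited = []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        text, links = fetch(url)
        relevance = scorer.score(text)
        visited.append((url, relevance))
        for link in links:
            if link not in seen:
                seen.add(link)
                # Assumption: a link inherits the relevance of its parent page.
                heapq.heappush(frontier, (-relevance, link))
    return visited


# Toy training data and a toy in-memory "web", both invented for illustration.
TRAIN_DOCS = [
    "organic synthesis of benzene compounds",
    "molecule reaction catalyst acid",
    "football match score league",
    "movie review actor cinema",
]
TRAIN_LABELS = [True, True, False, False]

TOY_WEB = {
    "seed": ("chemistry portal acid catalyst", ["chem", "sport"]),
    "chem": ("benzene molecule synthesis reaction", ["chem2"]),
    "sport": ("football league match", ["sport2"]),
    "chem2": ("organic compounds acid", []),
    "sport2": ("cinema actor review", []),
}

scorer = NaiveBayesRelevance().fit(TRAIN_DOCS, TRAIN_LABELS)
visited = crawl(["seed"], lambda url: TOY_WEB[url], scorer, max_pages=5)
for url, relevance in visited:
    print(f"{url}: {relevance:.2f}")
```

In this toy run the chemistry pages are fetched before the off-topic ones, because links found on high-scoring pages are placed ahead of the rest in the frontier. Swapping in a different feature representation or classifier only requires replacing the object that provides `score`.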
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Xia, Z., Guo, L., Liang, C., Li, X., Yang, Z. (2007). Focused Crawling for Retrieving Chemical Information. In: Corchado, E., Corchado, J.M., Abraham, A. (eds) Innovations in Hybrid Intelligent Systems. Advances in Soft Computing, vol 44. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74972-1_56
DOI: https://doi.org/10.1007/978-3-540-74972-1_56
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74971-4
Online ISBN: 978-3-540-74972-1
eBook Packages: Engineering, Engineering (R0)