Abstract
Efficient and accurate enterprise search over specific resources available on the web is a challenging and important problem. Domain-specific enterprise websites resemble one another in topic structure and textual content. Taking into account the semantic information carried by website content terms, a novel website feature vector modelling method for representing website topics is proposed on the basis of the vector space model. The feature vector elements integrate textual semantic information about topic content and structural information through different semantic terms and a weighting scheme, respectively. Comparative recognition results demonstrate that this feature analysis approach to website topics shows considerable potential for domain-specific enterprise web search.
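As a rough illustration of the general idea only, the sketch below builds a weighted term vector for a whole website in the vector space model and compares two sites by cosine similarity. The abstract does not state the authors' semantic terms or weighting scheme, so everything here is an assumption: the names `site_vector` and `cosine`, the whitespace tokenisation, and the link-depth decay used as a stand-in for structure information are hypothetical and are not the method of the paper.

```python
import math
from collections import Counter


def site_vector(pages, depth_decay=0.5):
    """Build one term vector for a website.

    pages: list of (depth, text) pairs, where depth is the link distance
    from the home page. ASSUMPTION: terms on deeper pages count less,
    by a factor of depth_decay per level. This is an illustrative
    structure weight, not the weighting scheme of the paper.
    """
    vec = Counter()
    for depth, text in pages:
        weight = depth_decay ** depth
        for term in text.lower().split():
            vec[term] += weight
    return dict(vec)


def cosine(u, v):
    """Cosine similarity between two sparse term vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy sites: (depth, page text)
site_a = site_vector([(0, "pump valve"), (1, "valve")])
site_b = site_vector([(0, "valve actuator"), (1, "pump")])
similarity = cosine(site_a, site_b)
```

A topic classifier could then compare a candidate site's vector against vectors of known domain-specific sites; how the paper actually performs recognition is not described in the abstract.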
This work was partially supported by the Natural Science Foundation of China (No. 60374057) and the Key Program of the Ministry of Education of China (No. 211CERS-8).
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dong, B., Liu, H., Hou, Z., Liu, X. (2006). Topic-Based Website Feature Analysis for Enterprise Search from the Web. In: Aberer, K., Peng, Z., Rundensteiner, E.A., Zhang, Y., Li, X. (eds) Web Information Systems – WISE 2006. WISE 2006. Lecture Notes in Computer Science, vol 4255. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11912873_11
DOI: https://doi.org/10.1007/11912873_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-48105-8
Online ISBN: 978-3-540-48107-2
eBook Packages: Computer Science, Computer Science (R0)