An Efficient Algorithm of Association Information Mining on Web Pages with Dynamic Scripts

Tan, Tao; Tan, Leting

doi:10.1007/978-3-642-24273-1_46

Part of the book series: Communications in Computer and Information Science (CCIS, volume 238)

Included in the following conference series:

International Conference on Web Information Systems and Mining

Abstract

Hyperlink analysis algorithms are widely used by public search engines. However, as websites built with dynamic scripts have become more common, these algorithms no longer search related pages efficiently, because such pages expose too little hyperlink information. Research on mining association information from web pages with dynamic scripts is progressing gradually. This paper proposes an improved search framework that is more efficient for pages with dynamic scripts. Then, by building state information tables that track the changes a page undergoes under the same URL, together with state transition chains for page loading, the paper presents an analysis algorithm based on state-interrelated matching of these pages. Finally, the paper describes the entire implementation process of the algorithm in detail and demonstrates its efficiency with experimental results.
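
The abstract names three ingredients: a per-URL state information table, a state transition chain, and matching between the states of different pages. The Python sketch below is only an illustration of how such structures might fit together; it is not the authors' algorithm. The names `PageState`, `StateTable`, `jaccard`, and `association`, the bag-of-words representation of a state, and the use of Jaccard similarity as the matching score are all assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class PageState:
    """One observed state of a page (hypothetical structure)."""
    state_id: int
    terms: frozenset  # set of lowercase words visible in this state


@dataclass
class StateTable:
    """States recorded for a single URL, plus its transition chain."""
    url: str
    states: dict = field(default_factory=dict)  # state_id -> PageState
    chain: list = field(default_factory=list)   # (from_id, event, to_id)

    def add_state(self, text):
        """Record a state and return its id, reusing an identical one."""
        terms = frozenset(text.lower().split())
        for state in self.states.values():
            if state.terms == terms:
                return state.state_id
        state_id = len(self.states)
        self.states[state_id] = PageState(state_id, terms)
        return state_id

    def add_transition(self, from_id, event, to_id):
        """Append one step (for example a script-triggered change)."""
        self.chain.append((from_id, event, to_id))


def jaccard(a, b):
    """Jaccard similarity of two term sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def association(table_a, table_b):
    """Assumed score: best similarity over all pairs of states."""
    return max(
        (jaccard(sa.terms, sb.terms)
         for sa in table_a.states.values()
         for sb in table_b.states.values()),
        default=0.0,
    )
```

Under these assumptions, two URLs whose initial states share no terms can still be associated if a later state of one page, reached through a scripted transition, overlaps with a state of the other. That is the kind of relationship a purely hyperlink-based analysis would miss.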

Author information

Authors and Affiliations

School of Computer, China West Normal University, Nanchong, 637002, China
Tao Tan & Leting Tan


Editor information

Editors and Affiliations

Department of Computer and Information Science, University of Macau, Taipa, Macau, China
Gong Zhiguo
School of Computer, Shanghai University, 200444, Shanghai, China
Xiangfeng Luo
School of Computer and Software, Taiyuan University of Technology, 030024, Taiyuan, China
Junjie Chen
Caritas Institute of Higher Education, 18 Chui Ling Road, Tseung Kwan, Hong Kong SAR, China
Fu Lee Wang
School of Computer and Information Engineering, Shanghai University of Electric Power, 200090, Shanghai, China
Jingsheng Lei

About this paper

Cite this paper

Tan, T., Tan, L. (2011). An Efficient Algorithm of Association Information Mining on Web Pages with Dynamic Scripts. In: Zhiguo, G., Luo, X., Chen, J., Wang, F.L., Lei, J. (eds) Emerging Research in Web Information Systems and Mining. WISM 2011. Communications in Computer and Information Science, vol 238. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24273-1_46

DOI: https://doi.org/10.1007/978-3-642-24273-1_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24272-4
Online ISBN: 978-3-642-24273-1
eBook Packages: Computer Science, Computer Science (R0)
