An Efficient Algorithm of Association Information Mining on Web Pages with Dynamic Scripts

  • Conference paper
Emerging Research in Web Information Systems and Mining (WISM 2011)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 238)

Abstract

Hyperlink analysis algorithms are widely used by public search engines. However, as websites built on dynamic scripts have spread, these algorithms no longer support efficient search for related pages, because such pages expose too little hyperlink information. Research on mining association information from web pages with dynamic scripts is progressing gradually. This paper proposes an improved search framework that is more efficient for pages with dynamic scripts. Then, by building state information tables that track the changes a page undergoes under the same URL, together with state transition chains for page loading, the paper presents an analysis algorithm based on state-interrelated matching of these pages. Finally, the paper describes the entire implementation process of the algorithm in detail and demonstrates its efficiency through experimental results.
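The abstract gives no implementation details, so the following is only an illustrative sketch of how a per-URL state information table and a state transition chain might be represented. Everything in it is an assumption made for illustration: the names `PageState`, `StateTable`, `chain_from`, and `state_match_score`, the example URLs and events, and the use of Jaccard similarity over extracted terms as a stand-in for the paper's state-interrelated matching. It should not be read as the authors' algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class PageState:
    """One observed state of a page (hypothetical structure)."""
    state_id: int
    terms: frozenset  # assumed: terms extracted from the rendered content


@dataclass
class StateTable:
    """Hypothetical state information table for a single URL."""
    url: str
    states: dict = field(default_factory=dict)       # state_id -> PageState
    transitions: list = field(default_factory=list)  # (from_id, event, to_id)

    def add_state(self, state_id, terms):
        self.states[state_id] = PageState(state_id, frozenset(terms))

    def add_transition(self, from_id, event, to_id):
        self.transitions.append((from_id, event, to_id))

    def chain_from(self, start_id):
        """Follow the first outgoing transition of each state; stop on a cycle."""
        chain, seen, current = [start_id], {start_id}, start_id
        while True:
            nxt = next((t for f, _, t in self.transitions if f == current), None)
            if nxt is None or nxt in seen:
                return chain
            chain.append(nxt)
            seen.add(nxt)
            current = nxt


def jaccard(a, b):
    """Jaccard similarity of two term sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def state_match_score(table_a, table_b):
    """Best pairwise state similarity between two URLs (assumed measure)."""
    return max(
        (jaccard(sa.terms, sb.terms)
         for sa in table_a.states.values()
         for sb in table_b.states.values()),
        default=0.0,
    )


# Hypothetical example: one URL with two script-driven states, one static URL.
news = StateTable("http://example.com/news")
news.add_state(0, {"news", "home"})
news.add_state(1, {"news", "sports", "scores"})
news.add_transition(0, "click:#sports", 1)

scores = StateTable("http://example.com/scores")
scores.add_state(0, {"sports", "scores", "live"})

print(news.chain_from(0))
print(state_match_score(news, scores))
```

The point of the sketch is that two URLs with no hyperlink between them can still be associated once each state reached through script events is compared, which is the gap in hyperlink analysis that the abstract describes.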

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tan, T., Tan, L. (2011). An Efficient Algorithm of Association Information Mining on Web Pages with Dynamic Scripts. In: Zhiguo, G., Luo, X., Chen, J., Wang, F.L., Lei, J. (eds) Emerging Research in Web Information Systems and Mining. WISM 2011. Communications in Computer and Information Science, vol 238. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24273-1_46

  • DOI: https://doi.org/10.1007/978-3-642-24273-1_46

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24272-4

  • Online ISBN: 978-3-642-24273-1

  • eBook Packages: Computer Science, Computer Science (R0)
