Exploring Content and Linkage Structures for Searching Relevant Web Pages

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4632)

Abstract

This work addresses the problem of Web searching for pages relevant to a query URL. Based on an approach that uses a deep linkage analysis among vicinity pages, we investigate the Web page content structures and propose two new algorithms that integrate content and linkage analysis for more effective page relationship discovery and relevance ranking. A prototypical Web searching system has recently been implemented and experiments on the system have shown that the new content and linkage based searching methods deliver improved performance and are effective in identifying semantically relevant Web pages.

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Davis, D., Jiang, E. (2007). Exploring Content and Linkage Structures for Searching Relevant Web Pages. In: Alhajj, R., Gao, H., Li, J., Li, X., Zaïane, O.R. (eds) Advanced Data Mining and Applications. ADMA 2007. Lecture Notes in Computer Science, vol 4632. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73871-8_3

  • DOI: https://doi.org/10.1007/978-3-540-73871-8_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73870-1

  • Online ISBN: 978-3-540-73871-8

  • eBook Packages: Computer Science, Computer Science (R0)
