Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5227)

Abstract

As the Internet grows exponentially, topic-specific web crawlers are becoming increasingly popular in web data mining. The question of how to order unvisited URLs has been studied in depth. We present the notion of a concept similarity context graph and propose a novel approach to topic-specific web crawling that computes the prediction score of each unvisited URL from the similarity of concepts in Formal Concept Analysis (FCA), while improving retrieval precision and recall. We first build a concept lattice from the visited pages, then extract from the lattice the core concepts that reflect the user's query topic, and finally construct the concept similarity context graph from the semantic similarities between the core concepts and the other concepts.
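The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so Jaccard overlap of term sets stands in for the semantic concept similarity, the rule for picking core concepts (non-empty extent, intent containing every query term) is an assumption, and all names (`formal_concepts`, `core_concepts`, `crawl_order`) and the example URLs are hypothetical.

```python
import heapq
from itertools import combinations


def formal_concepts(context):
    """Enumerate all formal concepts (extent, intent) of a small context.

    `context` maps each object (a visited page) to its set of attributes
    (terms). Brute force over object subsets, so only for toy inputs.
    """
    objects = sorted(context)
    all_attrs = set().union(*context.values())
    concepts = set()
    for r in range(len(objects) + 1):
        for subset in combinations(objects, r):
            # Intent: attributes shared by every object in the subset.
            intent = set(all_attrs)
            for obj in subset:
                intent &= context[obj]
            # Extent: every object that carries all attributes of the intent.
            extent = frozenset(o for o in objects if intent <= context[o])
            concepts.add((extent, frozenset(intent)))
    return concepts


def core_concepts(concepts, query):
    """ASSUMPTION: a core concept covers all query terms and has pages."""
    return [(ext, itt) for ext, itt in concepts if ext and query <= itt]


def jaccard(a, b):
    """Stand-in similarity (NOT the paper's measure): |a & b| / |a | b|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def crawl_order(frontier, cores):
    """Return unvisited URLs ordered by descending prediction score.

    `frontier` maps each unvisited URL to the terms seen around its link.
    The score is the best similarity to any core concept's intent.
    """
    heap = []
    for url, terms in frontier.items():
        score = max((jaccard(terms, itt) for _, itt in cores), default=0.0)
        heapq.heappush(heap, (-score, url))  # negate: heapq is a min-heap
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]


# Toy data: three visited pages and three unvisited links.
visited = {
    "p1": {"crawler", "web", "topic"},
    "p2": {"crawler", "web"},
    "p3": {"lattice", "concept"},
}
frontier = {
    "http://example.org/a": {"web", "crawler"},
    "http://example.org/b": {"concept", "lattice"},
    "http://example.org/c": {"topic", "news"},
}
cores = core_concepts(formal_concepts(visited), {"crawler"})
order = crawl_order(frontier, cores)
```

With this toy data the link whose terms match a core concept exactly is visited first, and the link with no overlap is visited last. A real crawler would replace the brute-force enumeration with a proper lattice construction algorithm and the Jaccard stand-in with the semantic similarity the paper defines.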

Editor information

De-Shuang Huang, Donald C. Wunsch II, Daniel S. Levine, Kang-Hyun Jo

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yang, Y., Du, Y., Sun, J., Hai, Y. (2008). A Topic-Specific Web Crawler with Concept Similarity Context Graph Based on FCA. In: Huang, DS., Wunsch, D.C., Levine, D.S., Jo, KH. (eds) Advanced Intelligent Computing Theories and Applications. With Aspects of Artificial Intelligence. ICIC 2008. Lecture Notes in Computer Science, vol 5227. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85984-0_101

  • DOI: https://doi.org/10.1007/978-3-540-85984-0_101

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85983-3

  • Online ISBN: 978-3-540-85984-0

  • eBook Packages: Computer Science, Computer Science (R0)
