SESQ: A Novel System for Building Domain Specific Web Search Engines

  • Conference paper
Frontiers of WWW Research and Development - APWeb 2006 (APWeb 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3841)

Abstract

The Web today is a huge, heterogeneous data source. The rapid growth of its data volume and its dynamic nature make it difficult for users to find information relevant to a specific domain. To address this problem, we have designed and implemented a novel system, called SESQ, for building domain-specific search engines. Using SESQ, the user first specifies the data schema of the domain and provides the seed for the data of the schema, and then writes extraction rules that indicate how to obtain instance data of the schema from relevant web pages. The system extracts the instance data for the schema from the web pages and finds new web sites and web pages relevant to the schema by crawling. SESQ provides a highly efficient data storage and index structure for the collected data, and an interactive query interface through which end users can pose structured queries over the data. In addition, the data can be further analyzed with analytical tools such as OLAP.
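
The workflow in the abstract (schema, seed, extraction rules, crawling, structured query) can be illustrated with a minimal sketch. This is not SESQ's actual interface: the abstract does not describe its rule language, storage, or index, so every name below (`Schema`, `RULES`, `PAGES`, `extract`, `crawl`, `query`) is hypothetical. The sketch assumes regular expressions as extraction rules, an in-memory dictionary in place of the real Web, and a plain list in place of the index structure.

```python
import re
from collections import deque
from dataclasses import dataclass


@dataclass
class Schema:
    """Hypothetical domain schema: a name plus the fields every record must have."""
    name: str
    fields: tuple


# Hypothetical stand-in for the Web: URL -> (page text, outgoing links).
PAGES = {
    "a.example/1": ("Title: Web Crawling; Year: 2005", ["a.example/2", "b.example/x"]),
    "a.example/2": ("Title: XML Warehousing; Year: 2001", []),
    "b.example/x": ("Welcome to our cooking blog", ["b.example/y"]),
    "b.example/y": ("Title: Pasta; Year: 1999", []),
}

# Assumed rule format: one regular expression per schema field.
RULES = {
    "title": re.compile(r"Title: ([^;]+)"),
    "year": re.compile(r"Year: (\d{4})"),
}


def extract(schema, rules, text):
    """Return one record if every schema field matches, otherwise None."""
    record = {}
    for name in schema.fields:
        match = rules[name].search(text)
        if match is None:
            return None
        record[name] = match.group(1).strip()
    return record


def crawl(schema, rules, seeds, pages):
    """Breadth-first crawl from the seeds; only pages that yield a record are expanded."""
    queue = deque(seeds)
    seen = set(seeds)
    records = []
    while queue:
        url = queue.popleft()
        text, links = pages.get(url, ("", []))
        record = extract(schema, rules, text)
        if record is None:
            continue  # treated as irrelevant to the schema: links are not followed
        records.append({"url": url, **record})
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return records


def query(records, **conditions):
    """Structured query: exact match on the given field values."""
    return [r for r in records if all(r.get(k) == v for k, v in conditions.items())]


PAPER = Schema("paper", ("title", "year"))
RECORDS = crawl(PAPER, RULES, ["a.example/1"], PAGES)
```

Calling `query(RECORDS, year="2001")` then filters the collected records by field value. Expanding only pages that produce a record is one simple way to keep a crawl on topic; the relevance test SESQ itself uses is not stated in the abstract.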



Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Guo, Q., Zhou, L., Guo, H., Zhang, J. (2006). SESQ: A Novel System for Building Domain Specific Web Search Engines. In: Zhou, X., Li, J., Shen, H.T., Kitsuregawa, M., Zhang, Y. (eds) Frontiers of WWW Research and Development - APWeb 2006. APWeb 2006. Lecture Notes in Computer Science, vol 3841. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11610113_128

  • DOI: https://doi.org/10.1007/11610113_128

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-31142-3

  • Online ISBN: 978-3-540-32437-9

  • eBook Packages: Computer Science, Computer Science (R0)
