SESQ: A Novel System for Building Domain Specific Web Search Engines

Guo, Qi; Zhou, Lizhu; Guo, Hang; Zhang, Jun

doi:10.1007/11610113_128

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3841)

Included in the following conference series:

Asia-Pacific Web Conference

Abstract

Nowadays the Web represents a huge heterogeneous data source. The rapid growth of data volume and the dynamic nature of the Web make it difficult for users to find relevant information for a specific domain. To meet this demand, we have designed and implemented a novel system, called SESQ, for building domain-specific search engines. Using SESQ, the user first specifies the data schema of the domain and supplies the seed for the data of the schema, then writes extraction rules that indicate how to obtain instance data of the schema from relevant web pages. The system extracts the instance data for the schema from the web pages and finds new web sites and web pages relevant to the schema by crawling. SESQ provides a highly efficient data storage and index structure for the collected data, and an interactive query interface through which end users can pose structured queries on the data. In addition, the data can be further analyzed with analytical tools such as OLAP.
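
The workflow described in the abstract (schema, seed, extraction rules, schema-driven crawling, structured query) can be illustrated with a minimal sketch. This is not SESQ's implementation or API: the `Schema` class, the regex-based `RULES`, the in-memory `PAGES` dictionary standing in for the Web, and the relevance test (a page is relevant only if every schema attribute can be extracted from it) are all assumptions made for illustration.

```python
import re
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Schema:
    """Hypothetical domain schema: a name plus its attribute names."""
    name: str
    attributes: tuple


# Hypothetical extraction rules: one regular expression per schema attribute.
RULES = {
    "title": re.compile(r"<h1>(.*?)</h1>"),
    "year": re.compile(r"Year:\s*(\d{4})"),
}

# Toy in-memory "Web" so the sketch runs without network access.
PAGES = {
    "http://example.org/a": '<h1>Paper A</h1> Year: 2004 <a href="http://example.org/b">next</a>',
    "http://example.org/b": '<h1>Paper B</h1> Year: 2006 <a href="http://example.org/c">misc</a>',
    "http://example.org/c": 'Unrelated page <a href="http://example.org/d">deep</a>',
    "http://example.org/d": "<h1>Paper D</h1> Year: 1999",
}

LINK = re.compile(r'href="([^"]+)"')


def extract(schema, rules, html):
    """Return one instance record, or None if any attribute is missing."""
    record = {}
    for attr in schema.attributes:
        match = rules[attr].search(html)
        if match is None:
            return None
        record[attr] = match.group(1)
    return record


def crawl(schema, rules, seeds, pages):
    """Breadth-first crawl from the seeds, following links only from
    pages that yielded a record (the assumed relevance test)."""
    seen = set(seeds)
    queue = deque(seeds)
    records = []
    while queue:
        url = queue.popleft()
        html = pages.get(url)
        if html is None:
            continue
        record = extract(schema, rules, html)
        if record is None:
            continue  # irrelevant page: do not expand its links
        records.append({"url": url, **record})
        for link in LINK.findall(html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return records


schema = Schema("publication", ("title", "year"))
records = crawl(schema, RULES, ["http://example.org/a"], PAGES)

# A structured query over the collected instance data.
recent = [r["title"] for r in records if int(r["year"]) >= 2005]
```

In this toy setup the crawl stops expanding at the unrelated page, so the page behind it is never visited. A real system would need a softer relevance measure, persistent storage and an index rather than a Python list.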

Author information

Authors and Affiliations

Tsinghua University, Beijing, China
Qi Guo, Lizhu Zhou, Hang Guo & Jun Zhang

Editor information

Editors and Affiliations

School of ITEE, The University of Queensland, Australia
Xiaofang Zhou
School of Computer Science and Technology, Heilongjiang University, China
Jianzhong Li
School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, QLD, Australia
Heng Tao Shen
Institute of Industrial Science, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, 153-8505, Tokyo, Japan
Masaru Kitsuregawa
Victoria University, Australia
Yanchun Zhang

About this paper

Cite this paper

Guo, Q., Zhou, L., Guo, H., Zhang, J. (2006). SESQ: A Novel System for Building Domain Specific Web Search Engines. In: Zhou, X., Li, J., Shen, H.T., Kitsuregawa, M., Zhang, Y. (eds) Frontiers of WWW Research and Development - APWeb 2006. APWeb 2006. Lecture Notes in Computer Science, vol 3841. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11610113_128

DOI: https://doi.org/10.1007/11610113_128
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31142-3
Online ISBN: 978-3-540-32437-9
eBook Packages: Computer Science, Computer Science (R0)
