Abstract
The Web is now a huge, heterogeneous data source. The rapid growth of data volume and the dynamic nature of the Web make it difficult for users to find relevant information for a specific domain. To meet this need, we have designed and implemented a novel system, called SESQ, for building domain-specific search engines. Using SESQ, the user first specifies the data schema of the domain and provides seed data for the schema, then writes extraction rules that indicate how to obtain instance data of the schema from relevant web pages. The system extracts the instance data for the schema from the web pages and finds new web sites and web pages relevant to the schema by crawling. SESQ provides a highly efficient data storage and index structure for the collected data, and an interactive query interface through which end users can pose structured queries on the data. In addition, the data can be further analyzed with analytical tools such as OLAP.
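The workflow described in the abstract (declare a schema, write extraction rules, extract instance data, discover further pages) can be illustrated with a minimal Python sketch. This is not SESQ's actual interface: the names `Schema`, `ExtractionRule`, `extract_instance`, and `discover_links`, the use of regular expressions as the rule language, and the sample page are all assumptions made for illustration only.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Schema:
    """Hypothetical domain schema: a name plus its attribute names."""
    name: str
    attributes: List[str]


@dataclass
class ExtractionRule:
    """Hypothetical rule: a regex whose first group captures one attribute."""
    attribute: str
    pattern: str

    def apply(self, html: str) -> Optional[str]:
        match = re.search(self.pattern, html, re.IGNORECASE | re.DOTALL)
        return match.group(1).strip() if match else None


def extract_instance(schema: Schema, rules: List[ExtractionRule],
                     html: str) -> Dict[str, Optional[str]]:
    """Return one record with a value (or None) for each schema attribute."""
    by_attribute = {rule.attribute: rule for rule in rules}
    return {
        attr: by_attribute[attr].apply(html) if attr in by_attribute else None
        for attr in schema.attributes
    }


def discover_links(html: str) -> List[str]:
    """Collect href targets that a crawler could visit next."""
    return re.findall(r'href="([^"]+)"', html)


# Assumed sample page standing in for a fetched seed URL.
PAGE = """
<html><body>
<h1 class="title">Schema Driven Crawling</h1>
<span class="year">2005</span>
<a href="http://example.org/papers/2">next</a>
</body></html>
"""

paper = Schema("Paper", ["title", "year", "venue"])
rules = [
    ExtractionRule("title", r'<h1 class="title">(.*?)</h1>'),
    ExtractionRule("year", r'<span class="year">(\d{4})</span>'),
]

record = extract_instance(paper, rules, PAGE)
links = discover_links(PAGE)
print(record)
print(links)
```

Attributes without a matching rule (here `venue`) are left as `None`, so the record always follows the declared schema. A real system would fetch pages over the network, filter discovered links for relevance to the schema, and store the records in an indexed structure for querying.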
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Guo, Q., Zhou, L., Guo, H., Zhang, J. (2006). SESQ: A Novel System for Building Domain Specific Web Search Engines. In: Zhou, X., Li, J., Shen, H.T., Kitsuregawa, M., Zhang, Y. (eds) Frontiers of WWW Research and Development - APWeb 2006. APWeb 2006. Lecture Notes in Computer Science, vol 3841. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11610113_128
DOI: https://doi.org/10.1007/11610113_128
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31142-3
Online ISBN: 978-3-540-32437-9
eBook Packages: Computer Science, Computer Science (R0)