Abstract
Web content is growing exponentially as Internet applications and services continue to expand. While people enjoy the convenience of the Internet, they face the problem of obtaining useful information from this vast content quickly and accurately. The immediate response to this problem is the Web search engine. We developed a vertical search engine for a specific domain, such as a university. The search engine consists of three components: a Crawler, an Indexer, and a Searcher. The crawler component is implemented with the Heritrix crawler, which is based on recursive crawling and archiving. In the indexer component, a reusable, extensible index construction and management subsystem is designed and implemented with the open-source package Lucene. In an experiment on the Chungbuk National University web sites, the system retrieved, on average over a typical keyword set, more than 400 times as many documents as Google or the university's own search engine.
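The three-component structure named in the abstract can be illustrated with a small sketch. It is not the authors' implementation and uses neither Heritrix nor Lucene: it is a standard-library Python toy in which the page set (`PAGES`), the `example.edu` URLs, and the function names `crawl`, `build_index`, and `search` are all hypothetical, chosen only to show how a domain-restricted crawler feeds an inverted index that a searcher then queries.

```python
import re
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler such as Heritrix would fetch these over HTTP.
PAGES = {
    "http://example.edu/": (
        "Welcome to the university home page",
        ["http://example.edu/admissions", "http://example.edu/library"],
    ),
    "http://example.edu/admissions": (
        "Admissions information for graduate students",
        ["http://example.edu/"],
    ),
    "http://example.edu/library": (
        "Library hours and graduate study rooms",
        ["http://other.org/"],
    ),
    "http://other.org/": ("Unrelated page outside the domain", []),
}


def crawl(seed, domain):
    """Crawler: breadth-first traversal limited to URLs under `domain`."""
    seen, queue, fetched = {seed}, deque([seed]), {}
    while queue:
        url = queue.popleft()
        text, links = PAGES.get(url, ("", []))
        fetched[url] = text
        for link in links:
            # The domain filter is what makes the crawl "vertical".
            if link.startswith(domain) and link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Indexer: map each term to the set of URLs that contain it."""
    index = defaultdict(set)
    for url, text in docs.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index, query):
    """Searcher: return the URLs that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)


if __name__ == "__main__":
    docs = crawl("http://example.edu/", "http://example.edu/")
    index = build_index(docs)
    print(search(index, "graduate"))
```

Restricting the crawl frontier to one domain keeps the index small and on topic, which is the basic trade-off a vertical search engine makes against a general-purpose one. Ranking, incremental re-crawling, and persistent index storage are left out of this sketch.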
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lee, HB., Nazareno, F., Jung, SH., Cho, WS. (2011). A Vertical Search Engine for School Information Based on Heritrix and Lucene. In: Lee, G., Howard, D., Ślęzak, D. (eds) Convergence and Hybrid Information Technology. ICHIT 2011. Lecture Notes in Computer Science, vol 6935. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24082-9_42
DOI: https://doi.org/10.1007/978-3-642-24082-9_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24081-2
Online ISBN: 978-3-642-24082-9
eBook Packages: Computer Science (R0)