
Hierarchical directory mapping for category-constrained meta-search

Journal of Intelligent Information Systems

Abstract

Hierarchical category directories, in which categories are recursively partitioned into sub-categories, are provided by many information sources, such as news sites, online stores and shopping websites. Such information sources categorize the instances in their databases and support category-constrained search, in which one usually navigates the category directory to select a category and then submits a query to find objects in the selected category whose descriptions match the query. As more and more online sources become available, it is challenging to build a meta-search system that provides a unified directory and a meta-search capability to search and access sources from different websites with a single query submission. One of the fundamental problems in building such a meta-search system is category mapping, which maps the selected category in the unified directory to categories provided by the information sources. In this paper, we develop an efficient algorithm for category mapping between hierarchical directories. Our algorithm is based on two techniques, consistency refinement and hierarchical substitution, both of which make extensive use of hierarchical structures. Experiments show that our approach substantially improves on previous approaches and can be used to implement automatic category mapping for meta-search systems that support category-constrained search.
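To make the problem setting concrete, the following is a minimal Python sketch of category-constrained meta-search over two sources, assuming a category mapping is already available. It is not the algorithm developed in the paper: the names `Category`, `Source` and `meta_search`, the two example stores, and the hand-written `mapping` table are all hypothetical, and the mapping stands in for what an automatic category-mapping method would have to produce.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Category:
    """A node in a hierarchical category directory (hypothetical structure)."""
    name: str
    children: List["Category"] = field(default_factory=list)

    def find(self, name: str):
        """Depth-first search for the node with the given name, or None."""
        if self.name == name:
            return self
        for child in self.children:
            hit = child.find(name)
            if hit is not None:
                return hit
        return None

    def subtree_names(self) -> set:
        """Names of this category and all of its descendants."""
        names = {self.name}
        for child in self.children:
            names |= child.subtree_names()
        return names


@dataclass
class Source:
    """An information source with its own directory and categorized items."""
    name: str
    directory: Category
    items: List[Tuple[str, str]]  # (category name, item description)

    def search(self, category: str, query: str) -> List[str]:
        """Category-constrained search: match the query inside one subtree."""
        node = self.directory.find(category)
        if node is None:
            return []
        allowed = node.subtree_names()
        return [desc for cat, desc in self.items
                if cat in allowed and query.lower() in desc.lower()]


def meta_search(unified_category: str, query: str, sources: List[Source],
                mapping: Dict[str, Dict[str, List[str]]]) -> List[Tuple[str, str]]:
    """Send one query to every source, restricted to the mapped categories."""
    results = []
    for source in sources:
        for category in mapping.get(unified_category, {}).get(source.name, []):
            results += [(source.name, d) for d in source.search(category, query)]
    return results


# Two hypothetical stores that name and arrange their categories differently.
store_a = Source(
    "store_a",
    Category("All", [Category("Computers",
                              [Category("Laptops"), Category("Tablets")])]),
    [("Laptops", "13-inch ultrabook"), ("Tablets", "10-inch tablet")],
)
store_b = Source(
    "store_b",
    Category("Everything", [Category("Electronics",
                                     [Category("Notebook PCs")])]),
    [("Notebook PCs", "15-inch gaming notebook"),
     ("Electronics", "13-inch monitor")],
)

# Hand-written mapping (an assumption): unified category -> source categories.
mapping = {"Laptops": {"store_a": ["Laptops"], "store_b": ["Notebook PCs"]}}

results = meta_search("Laptops", "inch", [store_a, store_b], mapping)
print(results)
```

In this sketch, a single query on the unified category "Laptops" reaches "Laptops" in one store and "Notebook PCs" in the other, while the monitor filed directly under "Electronics" is excluded. Building the `mapping` table automatically is the category-mapping problem that the abstract describes.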



Author information


Corresponding author

Correspondence to Chi-Hsiang Lin.


About this article

Cite this article

Tsay, JJ., Lin, CH. Hierarchical directory mapping for category-constrained meta-search. J Intell Inf Syst 42, 75–94 (2014). https://doi.org/10.1007/s10844-013-0256-5


  • DOI: https://doi.org/10.1007/s10844-013-0256-5
