Abstract
The growing amount of semi-structured and unstructured data on tourism Web sites with heterogeneous designs calls for information extraction (IE) mechanisms, for instance to build tourism portals. For building semantic eTourism environments, the acquisition of room prices is of particular interest. Room prices and related information often appear in tabular structures, which still challenge Web information extraction techniques. In this paper, we first identify various price table patterns, each characterized by the position of the features that determine a room price. We then describe an extended ontology model for tourism prices. Finally, we present TAINEX, a plug-in for the functional and structural analysis and data interpretation of price tables. TAINEX extends the existing prototype TourIE, a rule/ontology-based information extraction system for Web sites with heterogeneous designs.
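To make the notion of a price table pattern concrete, the following minimal sketch interprets one simple pattern, in which one price-determining feature sits in the header row and another in the first column. It is an illustration under stated assumptions, not the TAINEX implementation: the names `PriceFact` and `interpret_price_table`, the choice of room type and season as the two features, and the assumption that price cells are already plain numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PriceFact:
    """One interpreted price cell (hypothetical structure for illustration)."""
    room_type: str
    season: str
    amount: float


def interpret_price_table(grid: List[List[str]]) -> List[PriceFact]:
    """Interpret a table whose header row holds seasons and whose first
    column holds room types. Other patterns (for example, transposed
    tables or nested headers) are not handled by this sketch."""
    seasons = grid[0][1:]  # skip the empty top-left corner cell
    facts = []
    for row in grid[1:]:
        room_type = row[0]
        # Pair each price cell with the season in the same column.
        for season, cell in zip(seasons, row[1:]):
            facts.append(PriceFact(room_type, season, float(cell)))
    return facts


if __name__ == "__main__":
    example = [
        ["", "Low season", "High season"],
        ["Single room", "45", "60"],
        ["Double room", "70", "95"],
    ]
    for fact in interpret_price_table(example):
        print(fact)
```

In this sketch the position of each feature is fixed in advance; recognizing which pattern a given table follows is the harder step and is not attempted here.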
Copyright information
© 2010 Springer-Verlag/Wien
About this paper
Cite this paper
Buttinger, C., Feilmayr, C., Guttenbrunner, M., Parzer, S., Pröll, B. (2010). Extracting Room Prices from Web Tables — an Ontology-Aware Approach. In: Gretzel, U., Law, R., Fuchs, M. (eds) Information and Communication Technologies in Tourism 2010. Springer, Vienna. https://doi.org/10.1007/978-3-211-99407-8_19
DOI: https://doi.org/10.1007/978-3-211-99407-8_19
Publisher Name: Springer, Vienna
Print ISBN: 978-3-211-99406-1
Online ISBN: 978-3-211-99407-8
eBook Packages: Business and Economics; Business and Management (R0)