Information Extraction for Standardization of Tourism Products

Miranda, Nuno; Raminhos, Ricardo; Seabra, Pedro; Gonçalves, Teresa; Saias, José; Quaresma, Paulo

doi:10.1007/978-3-642-25274-7_46

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7023)

Included in the following conference series:

Conference of the Spanish Association for Artificial Intelligence

Abstract

Tourism product descriptions rely heavily on natural language. Whether a tourist selects an appropriate offer depends largely on how the offer is communicated. Because no human interaction is available when tourism products are presented online, the way they are presented, even when only textual information is used, is a key success factor for tourism web sites seeking to achieve a purchase. Given the large number of tourism offers and the highly dynamic nature of the sector, manual data management is neither reliable nor scalable. This paper presents a prototype for the automatic extraction of relevant knowledge from tourism-related natural language texts. The captured knowledge is represented in a normalized format, and new textual descriptions are produced according to the available marketing channels. At this stage, the prototype focuses on hotel descriptions and already uses real operational data retrieved from the KEY for Travel tourism platform.
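
The pipeline outlined in the abstract (extract knowledge from a hotel description, store it in a normalized form, then regenerate text per marketing channel) can be illustrated with a minimal Python sketch. This is not the authors' prototype, whose methods are not shown in this preview: the keyword table `AMENITY_TERMS`, the `HotelRecord` fields, the channel names `"sms"` and `"web"`, and the sample hotel are all hypothetical, and simple keyword and regular-expression matching stands in for whatever extraction technique the paper actually uses.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical vocabulary: surface phrases mapped to normalized amenity labels.
AMENITY_TERMS = {
    "swimming pool": "pool",
    "pool": "pool",
    "wi-fi": "wifi",
    "wifi": "wifi",
    "wireless internet": "wifi",
    "car park": "parking",
    "parking": "parking",
    "spa": "spa",
}


@dataclass
class HotelRecord:
    """Assumed normalized representation of one hotel description."""
    name: str
    stars: Optional[int] = None
    amenities: List[str] = field(default_factory=list)


def extract(name: str, text: str) -> HotelRecord:
    """Pull a star rating and known amenities out of free text."""
    lowered = text.lower()
    star_match = re.search(r"(\d)[- ]star", lowered)
    stars = int(star_match.group(1)) if star_match else None
    # Whole-word matching so that "spacious" does not count as "spa".
    found = {
        label
        for phrase, label in AMENITY_TERMS.items()
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered)
    }
    return HotelRecord(name=name, stars=stars, amenities=sorted(found))


def render(record: HotelRecord, channel: str) -> str:
    """Produce a new description from the normalized record for one channel."""
    amenities = ", ".join(record.amenities) or "no listed amenities"
    stars = f"{record.stars}-star " if record.stars else ""
    if channel == "sms":  # short form for a space-limited channel
        return f"{record.name}: {stars}hotel, {amenities}"
    return f"{record.name} is a {stars}hotel offering: {amenities}."


if __name__ == "__main__":
    description = (
        "A charming 4-star hotel with a swimming pool, "
        "free Wi-Fi and a spacious car park."
    )
    record = extract("Hotel Sol", description)
    print(render(record, "sms"))
    print(render(record, "web"))
```

Keeping extraction and rendering separate means the normalized record is the only thing the two steps share, so a new channel only needs a new branch in `render`.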

Author information

Authors and Affiliations

VIATECLA SA, Almada, Portugal
Nuno Miranda, Ricardo Raminhos & Pedro Seabra
Dep. Informática, Universidade de Évora, Évora, Portugal
Teresa Gonçalves, José Saias & Paulo Quaresma

Editor information

Editors and Affiliations

Computer Science School, University of the Basque Country, Pº Manuel de Lardizabal 1, 20018, Donostia-San Sebastian, Spain
Jose A. Lozano
Computing Systems Department, University of Castilla-La Mancha, Campus Universitario s/n, 02071, Albacete, Spain
José A. Gámez
Dep. Statistics, O.R. and Computation, University of La Laguna, 38271, La Laguna, S.C. Tenerife, Spain
José A. Moreno

About this paper

Cite this paper

Miranda, N., Raminhos, R., Seabra, P., Gonçalves, T., Saias, J., Quaresma, P. (2011). Information Extraction for Standardization of Tourism Products. In: Lozano, J.A., Gámez, J.A., Moreno, J.A. (eds) Advances in Artificial Intelligence. CAEPIA 2011. Lecture Notes in Computer Science, vol 7023. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25274-7_46

DOI: https://doi.org/10.1007/978-3-642-25274-7_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25273-0
Online ISBN: 978-3-642-25274-7
eBook Packages: Computer Science; Computer Science (R0)
