Abstract
We have recently defined RetroWeb, an approach for reverse engineering the informative content of semi-structured web sites. The approach describes a web site's informative content at the physical, logical and conceptual levels. At each level, a meta-model is instantiated using a set of reverse engineering rules. This paper focuses on the naming process used to instantiate the meta-models. We introduce an algorithm that improves the labeling by reducing the number of objects to name. The algorithm is based on an analysis of the dependencies that describe inclusion, exclusion and equality between sets of objects.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Essanaa, S.B., Lammari, N. (2004). Improving the Naming Process for Web Site Reverse Engineering. In: Meziane, F., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2004. Lecture Notes in Computer Science, vol 3136. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27779-8_32
DOI: https://doi.org/10.1007/978-3-540-27779-8_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22564-5
Online ISBN: 978-3-540-27779-8
eBook Packages: Springer Book Archive