Improving the Naming Process for Web Site Reverse Engineering

  • Conference paper
Natural Language Processing and Information Systems (NLDB 2004)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 3136))

Abstract

We have recently defined RetroWeb, an approach for reverse engineering the informative content of semi-structured web sites. This approach describes a web site's informative content at the physical, logical and conceptual levels. At each level, a meta-model is instantiated using a set of reverse engineering rules. This paper focuses on the naming process used to instantiate the meta-models. We introduce an algorithm that improves labeling by reducing the number of objects to name. The algorithm is based on an analysis of the dependencies that describe inclusion, exclusion and equality between sets of objects.
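
The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea, under stated assumptions. It classifies the dependency between two sets of objects as equality, inclusion or exclusion, and groups identical sets so that each group needs a single name. The function names (`classify`, `dependencies`, `groups_to_name`), the extra `overlap` label for sets that share some but not all objects, and the sample identifiers are hypothetical and are not taken from the paper or from RetroWeb.

```python
from __future__ import annotations

from itertools import combinations


def classify(a: frozenset, b: frozenset) -> str:
    """Return the dependency that holds between two sets of objects."""
    if a == b:
        return "equality"
    if a < b:
        return "included_in"   # a is strictly contained in b
    if b < a:
        return "includes"      # a strictly contains b
    if a.isdisjoint(b):
        return "exclusion"
    return "overlap"           # assumption: partial overlap, not named in the abstract


def dependencies(sets: dict[str, frozenset]) -> dict[tuple[str, str], str]:
    """Classify every unordered pair of identifiers, in sorted order."""
    return {
        (x, y): classify(sets[x], sets[y])
        for x, y in combinations(sorted(sets), 2)
    }


def groups_to_name(sets: dict[str, frozenset]) -> list[list[str]]:
    """Group identifiers whose object sets are equal; one name per group."""
    groups: dict[frozenset, list[str]] = {}
    for ident, objs in sets.items():
        groups.setdefault(objs, []).append(ident)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical input: four extracted blocks and the pages they occur on.
blocks = {
    "b1": frozenset({"p1", "p2", "p3"}),
    "b2": frozenset({"p1", "p2", "p3"}),
    "b3": frozenset({"p1", "p2"}),
    "b4": frozenset({"p4"}),
}

deps = dependencies(blocks)
groups = groups_to_name(blocks)
```

With this sample input, `b1` and `b2` are equal and fall into one group, so three names are needed instead of four. The inclusion and exclusion dependencies in `deps` could then guide how related names are chosen, although the abstract does not say how the authors use them.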

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Essanaa, S.B., Lammari, N. (2004). Improving the Naming Process for Web Site Reverse Engineering. In: Meziane, F., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2004. Lecture Notes in Computer Science, vol 3136. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27779-8_32

  • DOI: https://doi.org/10.1007/978-3-540-27779-8_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22564-5

  • Online ISBN: 978-3-540-27779-8

  • eBook Packages: Springer Book Archive
