A Tool-Supported Process for Reliable Classification of Web Pages

  • Conference paper
Advances in Software Engineering (ASEA 2009)

Abstract

Reliable classification of Web application user interfaces, with the aim of extracting specific data for each class of interface, is a fundamental task in migration, testing, and reverse engineering processes involving existing Web applications. A feasible and reliable approach is to exploit combinations of structural features of Web pages to discriminate the equivalence class of a page. This paper presents a technique based on an iterative process that allows classification rules, composed of structural features of Web pages, to be deduced for dynamically generated Web pages. The process is supported by a tool that partially automates its steps. To assess the feasibility and cost-effectiveness of the process, a case study addressing the problem of generating classification rules for a real Web application has been carried out.
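
As a rough illustration of what a classification rule composed of structural features might look like, the following Python sketch treats the set of HTML tags occurring in a page as its features and a rule as a combination of required and forbidden tags. This is not the authors' tool or rule format: the feature choice, the `RULES` table, the class names, and the functions `structural_features` and `classify` are all hypothetical and serve only to make the idea concrete.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts how often each HTML tag is opened in a page."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def structural_features(html):
    """Return the set of tags occurring in the page (an assumed feature choice)."""
    parser = TagCounter()
    parser.feed(html)
    return set(parser.counts)


# Hypothetical rules: a page matches a class when every required tag
# is present and no forbidden tag is present.
RULES = {
    "login_form": {"require": {"form", "input"}, "forbid": {"table"}},
    "result_list": {"require": {"table"}, "forbid": {"form"}},
}


def classify(html, rules=RULES):
    """Return the names of all classes whose rule matches the page."""
    features = structural_features(html)
    return [
        name
        for name, rule in rules.items()
        if rule["require"] <= features and not (rule["forbid"] & features)
    ]
```

In this sketch, a page that matches no rule, or more than one, would be a natural candidate for revising the rule set in a further iteration; how the paper's process actually refines its rules is not described in the abstract.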


Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Amalfitano, D., Fasolino, A.R., Tramontana, P. (2009). A Tool-Supported Process for Reliable Classification of Web Pages. In: Ślęzak, D., Kim, Th., Kiumi, A., Jiang, T., Verner, J., Abrahão, S. (eds) Advances in Software Engineering. ASEA 2009. Communications in Computer and Information Science, vol 59. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10619-4_41

  • DOI: https://doi.org/10.1007/978-3-642-10619-4_41

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-10618-7

  • Online ISBN: 978-3-642-10619-4

  • eBook Packages: Computer Science, Computer Science (R0)
