Web Content Management Systems Archivability

  • Conference paper
  • First Online:
Advances in Databases and Information Systems (ADBIS 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9282)

Abstract

Web archiving is the process of collecting and preserving web content in an archive for current and future generations. A key problem in web archiving is that not all websites can be archived correctly, owing to the different technologies, standards and implementation practices they use. Nevertheless, a common denominator of many current websites is that they are implemented using a Web Content Management System (WCMS). We evaluate the Website Archivability (WA) of the most prevalent WCMSs. We investigate the extent to which each WCMS meets the conditions for a safe transfer of its content to a web archive for preservation purposes, and thus identify its strengths and weaknesses. More importantly, we derive specific recommendations to improve the WA of each WCMS, aiming to advance the general practice of web data extraction and archiving.

Author information

Corresponding author

Correspondence to Vangelis Banos.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Banos, V., Manolopoulos, Y. (2015). Web Content Management Systems Archivability. In: Tadeusz, M., Valduriez, P., Bellatreche, L. (eds) Advances in Databases and Information Systems. ADBIS 2015. Lecture Notes in Computer Science, vol 9282. Springer, Cham. https://doi.org/10.1007/978-3-319-23135-8_14

  • DOI: https://doi.org/10.1007/978-3-319-23135-8_14

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23134-1

  • Online ISBN: 978-3-319-23135-8

  • eBook Packages: Computer Science, Computer Science (R0)