DOI: 10.1145/2464464.2464513

Sprint methods for web archive research

Published: 02 May 2013

Abstract

Web archives provide access to snapshots of the Web of the past and could be valuable for research. However, access to these archives is often limited, both in the data available and in the interfaces to that data. This paper explores new methods to overcome these limitations. It presents "sprint methods" for performing research using an archived collection of the Dutch news aggregator website Nu.nl, and for developing and adapting a search system and interface to this data. First, the work aims to contribute to research in the humanities and social sciences, in particular New Media research that employs digital methods to study the Web of the past. Second, it aims to contribute to Computer Science through the development of novel access tools for Web archives that facilitate research.

Published In

WebSci '13: Proceedings of the 5th Annual ACM Web Science Conference
May 2013
481 pages
ISBN: 9781450318891
DOI: 10.1145/2464464
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. digital methods
  2. information retrieval
  3. news analysis
  4. search interface
  5. temporal analysis
  6. web archives
  7. web collections
  8. web history

Qualifiers

  • Research-article

Conference

WebSci '13: Web Science 2013
May 2 - 4, 2013
Paris, France

Acceptance Rates

Overall Acceptance Rate 245 of 933 submissions, 26%

Article Metrics

  • Downloads (last 12 months): 5
  • Downloads (last 6 weeks): 0
Reflects downloads up to 07 Mar 2025.

Cited By

  • (2021) Big Data Science Over the Past Web. The Past Web, 271-282. DOI: 10.1007/978-3-030-63291-5_21. Online publication date: 1-Jul-2021
  • (2020) 10 Years of Research With and On Hackathons. Proceedings of the 2020 ACM Designing Interactive Systems Conference, 1073-1088. DOI: 10.1145/3357236.3395543. Online publication date: 3-Jul-2020
  • (2019) What's cached is prologue: Reviewing recent web archives research towards supporting scholarly use. Proceedings of the Association for Information Science and Technology 55:1, 327-336. DOI: 10.1002/pra2.2018.14505501036. Online publication date: Feb-2019
  • (2018) The colors of the national Web. International Journal on Digital Libraries 19:1, 95-106. DOI: 10.1007/s00799-016-0202-6. Online publication date: 1-Mar-2018
  • (2017) The evolution of web archiving. International Journal on Digital Libraries 18:3, 191-205. DOI: 10.1007/s00799-016-0171-9. Online publication date: 1-Sep-2017
  • (2014) Web Archive Search as Research: Methodological and Theoretical Implications. Alexandria: The Journal of National and International Library and Information Issues 25:1-2, 93-111. DOI: 10.7227/ALX.0022. Online publication date: 1-Aug-2014
  • (2014) Hard Content, Fab Front-End: Archiving Websites of Dutch Public Broadcasters. Alexandria: The Journal of National and International Library and Information Issues 25:1-2, 69-91. DOI: 10.7227/ALX.0021. Online publication date: Aug-2014
  • (2014) Adaptive search systems for web archive research. Proceedings of the 5th Information Interaction in Context Symposium, 354-356. DOI: 10.1145/2637002.2637063. Online publication date: 26-Aug-2014
