DOI: 10.1145/1555400.1555494
poster

From harvesting to cultivating: transformation of a web collecting system into a robust curation environment

Published: 15 June 2009

Abstract

Much has been written about the lifecycle of digital objects. This study is instead concerned with the lifecycle of collections and associated services. Online collection environments are built to fulfill specific collecting objectives and constraints. If a collection proves useful within its original hosting environment, it will often be necessary or desirable to move the collection to new environments, either to support new forms of use and re-aggregation or to extract resources from legacy data environments. Such a transformation can be extremely expensive, challenging and prone to error, especially if the collections include complex internal structures and services. When "services make the repository" [1], moving raw data from one location to another will often not be sufficient. Digital curators can pre-empt costly and problematic system migration efforts by integrating collections into environments specifically designed to support long-term preservation, scalability and interoperability [2]. We report on an integration of the content and functionality of a feature-rich collecting environment (ContextMiner) into a robust data curation environment (iRODS).
ContextMiner is a web-based service for building collections through the execution and management of "campaigns" (i.e., sets of associated queries and parameters to harvest content over time). As part of the VidArch project, we have been using the ContextMiner framework and services to harvest YouTube videos and associated contextual information on a variety of topics. In July 2008, we released a public beta of ContextMiner, allowing anyone to run similar crawls. There are now more than 100 users. The current implementation, based on a single MySQL database and associated code, has served its intended purposes very well, but it is not a scalable or sustainable basis for offering wide-scale collecting services in support of the diverse array of potential users and use cases.
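As a rough illustration of the "campaign" idea described above (a set of queries plus parameters, re-run over time), a campaign could be modeled as follows. This is a minimal sketch, not ContextMiner's actual data model: the `Campaign` class, its fields (`interval`, `max_results`), and the caller-supplied `search` function are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Optional


@dataclass
class Campaign:
    """Hypothetical campaign: a set of queries plus harvesting parameters."""
    name: str
    queries: List[str]
    interval: timedelta = timedelta(days=1)  # assumed: how often to re-run
    max_results: int = 100                   # assumed: per-query result cap
    last_run: Optional[datetime] = None
    harvested: Dict[str, List[str]] = field(default_factory=dict)

    def is_due(self, now: datetime) -> bool:
        """Due if the campaign never ran or its interval has elapsed."""
        return self.last_run is None or now - self.last_run >= self.interval

    def run(self, search: Callable[[str, int], List[str]], now: datetime) -> int:
        """Run each query through `search` and record item ids not seen before.

        `search(query, limit)` stands in for whatever source is harvested;
        it is supplied by the caller and is not a real ContextMiner API.
        """
        added = 0
        for query in self.queries:
            seen = self.harvested.setdefault(query, [])
            for item_id in search(query, self.max_results):
                if item_id not in seen:
                    seen.append(item_id)
                    added += 1
        self.last_run = now
        return added
```

Keeping the harvested ids per query means repeated runs only add items that are new since the last crawl, which matches the notion of harvesting "over time".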
iRODS (integrated Rule-Oriented Data System) is adaptive, policy-driven data grid middleware that addresses growth, evolution, openness, and closure, which are fundamental requirements for digital preservation [3]. iRODS currently scales to hundreds of millions of files, tens of thousands of users, and petabytes of data. It operates in a highly distributed environment with heterogeneous storage resources and allows for growth through federation. It supports evolution through the virtualization of the underlying technology and supports changing business requirements through customization of repository behaviors. It supports openness through a data-type-agnostic treatment of content. iRODS can be instrumented with policies that support the management of the lifecycle of digital assets and will serve as a unique platform to study repository integration. One key feature is the automation of policy enforcement across distributed data that have been organized into a shared collection. The coupling of other open repositories and iRODS can create greater efficiencies and new types of repository services.
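The idea of automated policy enforcement over a shared collection can be sketched in a few lines. The example below is an assumption-laden illustration in plain Python; it does not use the iRODS rule language or any iRODS API, and the names `DataObject`, `min_replicas`, `require_metadata`, and `enforce` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DataObject:
    """Hypothetical record for one item in a shared collection."""
    path: str
    replicas: List[str]  # names of storage resources holding a copy
    metadata: Dict[str, str] = field(default_factory=dict)


# A policy inspects one object and returns the actions needed to comply.
Policy = Callable[[DataObject], List[str]]


def min_replicas(n: int) -> Policy:
    """Policy: every object must exist on at least n distinct resources."""
    def check(obj: DataObject) -> List[str]:
        missing = n - len(set(obj.replicas))
        return [f"replicate {obj.path}"] * max(missing, 0)
    return check


def require_metadata(key: str) -> Policy:
    """Policy: every object must carry the given metadata key."""
    def check(obj: DataObject) -> List[str]:
        if key in obj.metadata:
            return []
        return [f"add metadata '{key}' to {obj.path}"]
    return check


def enforce(collection: List[DataObject], policies: List[Policy]) -> List[str]:
    """Apply every policy to every object, wherever the object is stored."""
    actions: List[str] = []
    for obj in collection:
        for policy in policies:
            actions.extend(policy(obj))
    return actions
```

The point of the sketch is that policies are expressed once, at the collection level, and applied uniformly to every object regardless of which storage resource holds it.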
We discuss various repository integration scenarios, their potential benefits, and implications for collection life cycles. The approaches co-locate metadata and content in varied ways and rely on efficiencies found in one repository only, or on the ability to combine policies in both spaces: (1) iRODS to ContextMiner data migration, (2) policy-based data management for ContextMiner collections, and (3) policy interchange between ContextMiner and iRODS collections.

References

[1] Aschenbrenner, A., et al. 2008. The Future of Repositories? Patterns for (Cross-)Repository Architectures. D-Lib Magazine 14, 11/12.
[2] Chavez, R., Crane, G., Sauer, A., Babeu, A., Packel, A., and Weaver, G. 2007. Services Make the Repository. Journal of Digital Information 8, 2.
[3] Thibodeau, K. 2008. Architectural Issues in Preservation. Sun Preservation and Archiving Special Interest Group meeting. (Baltimore, November 20, 2008).

Published In

JCDL '09: Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries
June 2009
502 pages
ISBN: 9781605583228
DOI: 10.1145/1555400

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tag

  1. interoperable repositories

Qualifiers

  • Poster

Conference

JCDL '09: Joint Conference on Digital Libraries
June 15-19, 2009
Austin, TX, USA

Acceptance Rates

Overall Acceptance Rate 415 of 1,482 submissions, 28%

Cited By

  • (2018) Using the Web While Offline. Handbook of Research on Contemporary Perspectives on Web-Based Systems, 108-124. DOI: 10.4018/978-1-5225-5384-7.ch006. Online publication date: 2018
  • (2014) Cases for the Web in Your Pocket (WiPo). International Journal of Information Technology and Web Engineering 9:3, 40-54. DOI: 10.4018/ijitwe.2014070103. Online publication date: Jul-2014
  • (2014) Implementing the WiPo Architecture. E-Commerce and Web Technologies, 1-12. DOI: 10.1007/978-3-319-10491-1_1. Online publication date: 2014
  • (2013) High quality information provisioning and data pricing. 2013 IEEE 29th International Conference on Data Engineering Workshops (ICDEW), 290-293. DOI: 10.1109/ICDEW.2013.6547466. Online publication date: Apr-2013
  • (2013) Towards the Web in Your Pocket: Curated Data as a Service. Advanced Methods for Computational Collective Intelligence, 25-34. DOI: 10.1007/978-3-642-34300-1_3. Online publication date: 2013
