Data Feed

Our dataset changes every day, for a number of reasons:

  • New articles are published.
  • Authors self-archive new OA copies to repositories for existing articles.
  • Articles become OA as embargoes expire.
  • Publisher-hosted "Bronze OA" articles (free-to-read but without an open license) sometimes revert from OA to toll-access.

For many use cases, especially in enterprise contexts, it's important to stay up to date with these changes. The snapshot is a poor fit for this, since it's only updated a few times a year. The API also works poorly here, since scrolling through the whole of DOI-space to poll for changes takes many months.

So, we built the Data Feed to address this issue. Subscribers to the feed get password-protected access to a server where we post daily and weekly changefiles reflecting changes to the database. These changefiles use the same data format as the API and snapshot, and contain a row for every record that has changed in any way since the previous file was generated. We provide both a web interface and a JSON endpoint for accessing changefiles.
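For illustration, here's a minimal sketch of how a subscriber might poll the JSON endpoint for new changefiles and download them. The endpoint URL, the api_key parameter, and the response fields (list, url, filename, last_modified) are assumptions made for the example; use the actual values provided with your subscription.

    import requests

    # Hypothetical endpoint and credentials -- replace these with the
    # details provided with your Data Feed subscription.
    FEED_URL = "https://api.example.org/feed/changefiles"
    API_KEY = "YOUR_API_KEY"

    def list_changefiles():
        """Return the changefile listing (assumed fields: 'url',
        'filename', 'last_modified')."""
        resp = requests.get(FEED_URL, params={"api_key": API_KEY}, timeout=30)
        resp.raise_for_status()
        return resp.json().get("list", [])

    def download_changefile(entry, dest_path):
        """Stream one changefile to disk."""
        with requests.get(entry["url"], params={"api_key": API_KEY},
                          stream=True, timeout=60) as r:
            r.raise_for_status()
            with open(dest_path, "wb") as f:
                for chunk in r.iter_content(chunk_size=1 << 20):
                    f.write(chunk)

    if __name__ == "__main__":
        for entry in list_changefiles():
            print(entry["last_modified"], entry["filename"])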

A Data Feed subscription also provides access to a current snapshot, updated daily. This means that, at any time, you can download a complete and current image of our database: all 120M rows.

You can use the snapshot and changefiles together to keep your copy up to date by following these steps:

  1. Download and import the current snapshot.
  2. Download all subsequent changefiles, starting with the most recent changefile whose update timestamp precedes that of the snapshot (this overlap ensures no changes are missed).
  3. Import each changefile by reading it line by line, overwriting or updating the existing record for that row's DOI (see the sketch after this list).
  4. Continue to import changefiles as above, as they are released.
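To make step 3 concrete, here is a minimal sketch that applies one changefile to a local SQLite table keyed by DOI. It assumes the changefile is gzipped JSON Lines (one JSON record per line, each with a "doi" field); the filename and table layout are made up for the example, so adapt the parsing and storage to your own database.

    import gzip
    import json
    import sqlite3

    def apply_changefile(path, db_path="unpaywall.db"):
        """Upsert every record in a changefile into a local SQLite copy,
        keyed by DOI. Assumes gzipped JSON Lines input."""
        conn = sqlite3.connect(db_path)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS articles (doi TEXT PRIMARY KEY, record TEXT)"
        )
        with gzip.open(path, "rt", encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                record = json.loads(line)
                # Overwrite any existing record for this DOI with the new one.
                conn.execute(
                    "INSERT OR REPLACE INTO articles (doi, record) VALUES (?, ?)",
                    (record["doi"], line),
                )
        conn.commit()
        conn.close()

    if __name__ == "__main__":
        apply_changefile("changed_dois_with_versions.jsonl.gz")  # hypothetical filename

Since the snapshot uses the same data format, importing it in step 1 works the same way, just with a much larger file; after that, applying each new changefile as it's released keeps your local copy current.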

The Data Feed is unique among our products in that it costs money. This is because it's more expensive to run than our other products: harvesting everything once is pretty cheap, but doing it in an ongoing way is not. In due time we'd like to have a transparent pricing structure for Data Feed subscriptions, but for now we're handling pricing on a more custom basis as we learn more about the market. As a nonprofit, our goal isn't to get rich off this, but rather to give everyone a fair deal while ensuring a sustainable model that keeps Unpaywall a working resource for the long haul.

If you're interested in the Data Feed, please drop us a line and let us know the size of your organization and your intended use of the data, and we'll get right back to you with a quote. We look forward to hearing from you!