Our dataset changes every day, for a number of reasons:
For many use cases, especially in enterprise contexts, it's important to stay up to date with these changes. The snapshot is a poor fit for this, since it's only updated a few times a year. The API is a poor fit too, since polling for changes means scrolling through the whole of DOI-space, which takes many months.
So, we built the Data Feed to address this issue. Subscribers to the feed get password-protected access to a server where we post daily and weekly changefiles reflecting changes to the database. These changefiles use the same data format as the API and snapshot, and contain a row for every record that has changed in any way since the previous file was generated. We provide both a web interface and a JSON endpoint for accessing changefiles.
A Data Feed subscription also provides access to a current snapshot, updated daily. This means that, with your subscription, at any time you can download a complete and current image of our database, all 120M rows.
You can use the snapshot and changefiles together to keep your copy up to date by following these steps:
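As a rough illustration of how a snapshot and a changefile might be combined, here is a minimal sketch. It rests on several assumptions that are not stated above: that both files are JSON Lines (one JSON record per line), that each record carries a `doi` field identifying it and an `updated` ISO 8601 timestamp, and that a changed record simply replaces the older one. The function name `apply_changefile` and the in-memory `store` dictionary are hypothetical; a real setup would more likely upsert into a database.

```python
import json
from typing import Dict, Iterable


def apply_changefile(store: Dict[str, dict], lines: Iterable[str]) -> int:
    """Upsert records into `store`, keyed by lowercased DOI.

    Assumes each non-blank line is one JSON record with a `doi` field and
    an `updated` ISO 8601 timestamp (both assumptions, see the text above).
    Returns the number of records that were inserted or replaced.
    """
    applied = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        doi = record["doi"].lower()  # treat DOIs as case-insensitive keys
        current = store.get(doi)
        # Skip a record that is older than the copy we already hold.
        # Same-format ISO 8601 strings compare correctly as plain text.
        if current is not None and current.get("updated", "") > record.get("updated", ""):
            continue
        store[doi] = record
        applied += 1
    return applied
```

The same function can load the initial snapshot (starting from an empty `store`) and then apply each later changefile in order. If the files are gzip-compressed, the lines could be read with `gzip.open(path, "rt", encoding="utf-8")` and passed straight in.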
The Data Feed is unique among our products in that it costs money. This is because it's more expensive to run than our other products; harvesting everything once is pretty cheap, but doing it in an ongoing way is not. In due time we'd like to have a transparent pricing structure for it, but for now we are handling pricing case by case as we learn more about the market. As a nonprofit, our goal isn't to get rich off this, but rather to give everyone a fair deal while ensuring a sustainable model that keeps Unpaywall a working resource for the long haul.
If you're interested in the Data Feed, please drop us a line and let us know the size of your organization and your intended use of the data, and we'll get right back to you with a quote. We look forward to hearing from you!