

2024 public data file now available, featuring new experimental formats

This year’s public data file is now available, featuring over 156 million metadata records deposited with Crossref through the end of April 2024 from over 19,000 members. A full breakdown of Crossref metadata statistics is available here.

Like last year, you can download all of these records in one go via Academic Torrents or directly from Amazon S3 via the “requester pays” method.

Download the file: The torrent download can be initiated here. Instructions for downloading via the “requester pays” method, along with other tips for using these files, can be found on the “Tips for working with Crossref public data files and Plus snapshots” page.

In January, Martin Eve announced that we had been experimenting with alternative file formats to make our public data files easier for broader audiences to use. This year’s file is published alongside tools that convert the public data file into two experimental formats, JSON Lines and SQLite (plus a bonus Rust version). You can read more about the thinking behind this work in Martin’s blog post, and we are keen to hear your thoughts on these alternatives.
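To illustrate roughly what a JSON Lines or SQLite conversion involves, here is a minimal sketch. It is not the published tooling. It assumes the public data file unpacks into a directory of `*.json.gz` files, each holding a single JSON object with an `items` list of work records. The function names and the one-table `works` schema are hypothetical.

```python
import gzip
import json
import sqlite3
from pathlib import Path
from typing import Iterator


def iter_records(snapshot_dir: str) -> Iterator[dict]:
    """Yield every work record from a directory of gzipped JSON files.

    Assumption: each *.json.gz file holds one JSON object with an "items" list.
    """
    for path in sorted(Path(snapshot_dir).glob("*.json.gz")):
        with gzip.open(path, "rt", encoding="utf-8") as fh:
            for item in json.load(fh).get("items", []):
                yield item


def to_jsonl(snapshot_dir: str, out_path: str) -> int:
    """Write one JSON record per line (JSON Lines); return the record count."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for record in iter_records(snapshot_dir):
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


def to_sqlite(snapshot_dir: str, db_path: str) -> int:
    """Store each record as JSON text keyed by its lower-cased DOI (hypothetical schema)."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS works (doi TEXT PRIMARY KEY, record TEXT NOT NULL)"
        )
        with conn:  # commit the whole load as one transaction
            conn.executemany(
                "INSERT OR REPLACE INTO works (doi, record) VALUES (?, ?)",
                (
                    (r["DOI"].lower(), json.dumps(r, ensure_ascii=False))
                    for r in iter_records(snapshot_dir)
                    if "DOI" in r  # skip anything without a DOI
                ),
            )
        return conn.execute("SELECT COUNT(*) FROM works").fetchone()[0]
    finally:
        conn.close()
```

Keeping each record as a JSON text column keeps the loader simple. SQLite’s JSON functions (for example `json_extract(record, '$.type')`) can then query individual fields without a full relational schema.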

Our annual public data file is meant for individuals and organizations interested in working with the entirety of our metadata corpus. Starting from a single file that contains the majority of our records should be much easier than harvesting everything from scratch through our API. And because Crossref metadata is always openly available, you can then use the API to keep your local copy up to date with new and updated records.
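Here is a minimal sketch of that incremental update loop, using only Python’s standard library. It makes several assumptions. It uses the REST API’s `from-index-date` filter and cursor-based paging (`cursor=*` on the first request, then the `next-cursor` value from each response). It also passes a `mailto` parameter to identify yourself and pauses between requests. `iter_updates`, `build_url`, and their parameters are hypothetical names. The `since` date would be one shortly before the snapshot’s cut-off.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Iterator

API_URL = "https://api.crossref.org/works"


def build_url(since: str, cursor: str, mailto: str, rows: int = 1000) -> str:
    """Build one page request for records indexed on or after `since` (YYYY-MM-DD)."""
    params = {
        "filter": f"from-index-date:{since}",  # assumed filter for syncing
        "rows": str(rows),
        "cursor": cursor,
        "mailto": mailto,  # identifies you to the API
    }
    return f"{API_URL}?{urllib.parse.urlencode(params)}"


def fetch_json(url: str) -> dict:
    """Fetch and decode one page of results over HTTPS."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return json.load(resp)


def iter_updates(
    since: str,
    mailto: str,
    fetch: Callable[[str], dict] = fetch_json,
    pause: float = 1.0,
) -> Iterator[dict]:
    """Yield every new or updated record, following cursors until a page is empty."""
    cursor = "*"  # "*" starts a new cursor-paged result set
    while True:
        message = fetch(build_url(since, cursor, mailto))["message"]
        items = message.get("items", [])
        if not items:
            return  # an empty page means the result set is exhausted
        yield from items
        cursor = message["next-cursor"]
        time.sleep(pause)  # be gentle with the API between pages
```

`fetch` is injectable so the paging logic can be exercised without network access. In practice you would probably also retry on HTTP 429 responses and write each record into your local copy, replacing any existing entry with the same DOI.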

If you’re curious about what you’ll get with the public data file, we’ve also published a sample version so that you can take a peek before committing to downloading the ~212 GB file. This sample includes a random selection of JSON files and is available exclusively via torrent here.

We hope you find this public data file useful. Should you have any questions about how to access or use the file, please see the tips below or share your questions on our community forum.

Tips for using the torrent and retrieving incremental updates

  • Use the public data file if you want all Crossref metadata records. Everyone is welcome to the metadata, but it will be much faster for you and much easier on our APIs to get so many records in one file. Here are some tips on how to work with the file.

  • Use the REST API to incrementally add new and updated records once you have the initial file. Here is how to get started (and avoid getting blocked in your enthusiasm to use all this great metadata!).

  • While bibliographic metadata is generally required, much of the metadata is optional, so records will vary in quality and completeness.
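Because so much metadata is optional, code that reads these records should expect any field to be missing or empty. The sketch below shows one defensive way to flatten a record. It assumes the field names used in Crossref’s JSON: `DOI`, `type`, `title`, `container-title`, `issued.date-parts`, and `author`. `summarize` and its helper functions are hypothetical.

```python
from typing import Optional


def first(values) -> Optional[str]:
    """Return the first entry of a list-valued field such as `title`, or None."""
    return values[0] if isinstance(values, list) and values else None


def issued_year(record: dict) -> Optional[int]:
    """Return the year from `issued.date-parts`, tolerating missing or null parts."""
    parts = (record.get("issued") or {}).get("date-parts") or [[]]
    year = parts[0][0] if parts[0] else None
    return year if isinstance(year, int) else None


def author_names(record: dict) -> list:
    """Collect display names; organizational authors may carry only `name`."""
    names = []
    for author in record.get("author") or []:
        if "family" in author:
            given = author.get("given")
            names.append(f"{given} {author['family']}" if given else author["family"])
        elif "name" in author:
            names.append(author["name"])
    return names


def summarize(record: dict) -> dict:
    """Flatten a record, using None or [] wherever a field is absent."""
    return {
        "doi": record.get("DOI"),
        "type": record.get("type"),
        "title": first(record.get("title")),
        "journal": first(record.get("container-title")),
        "year": issued_year(record),
        "authors": author_names(record),
    }
```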

Questions, comments, and feedback are welcome at support@crossref.org.


Page owner: Patrick Polischuk   |   Last updated 2024-May-14