

2022 public data file of more than 134 million metadata records now available

In 2020 we released our first public data file, and we’ve since made it an annual release in support of our commitment to the Principles of Open Scholarly Infrastructure (POSI). We’ve just posted the 2022 file, which, as in years past, can be downloaded via torrent.

We aim to publish these in the first quarter of each year, though as you may notice, we’re a little behind our intended schedule this time. The delay is because we wanted to include critical new metadata fields, such as resource URLs and titles with markup.

Crossref metadata is always openly available via our API. We recommend you use this method to incrementally add new and updated records once you’re up and running with an annual public data file. If you’re interested in more frequent and regular “full-file” downloads, consider subscribing to our Metadata Plus program. Plus subscribers have access to monthly snapshots in JSON and XML formats.
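The incremental-update step might look roughly like the following Python sketch, which uses only the standard library. It assumes the public `/works` endpoint at https://api.crossref.org, the `from-index-date` filter for picking up new and changed records, cursor-based deep paging (`cursor=*` to start, then the `next-cursor` value from each response), and the optional `mailto` parameter for identifying yourself. The function names are hypothetical, and the filter choice and one-second pause are assumptions; check the REST API documentation for current filter semantics and rate limits. Since the 2022 file covers records registered up to and including April 30, 2022, 2022-05-01 is a natural starting date.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Iterator, Optional

API_BASE = "https://api.crossref.org/works"


def build_works_url(since: str, cursor: str = "*", rows: int = 1000,
                    mailto: Optional[str] = None) -> str:
    """Build a /works query for records indexed on or after `since` (YYYY-MM-DD)."""
    params = {
        "filter": f"from-index-date:{since}",  # assumed filter for new/changed records
        "rows": str(rows),                      # page size; 1000 is the documented maximum
        "cursor": cursor,                       # "*" starts a new deep-paging cursor
    }
    if mailto:
        params["mailto"] = mailto  # optional: identifies you to Crossref
    return f"{API_BASE}?{urllib.parse.urlencode(params)}"


def fetch_updates(since: str, mailto: Optional[str] = None,
                  pause: float = 1.0) -> Iterator[dict]:
    """Yield every record indexed on or after `since`, one page at a time."""
    cursor = "*"
    while True:
        url = build_works_url(since, cursor=cursor, mailto=mailto)
        with urllib.request.urlopen(url, timeout=60) as resp:
            message = json.load(resp)["message"]
        items = message.get("items", [])
        if not items:  # an empty page means the cursor is exhausted
            return
        yield from items
        cursor = message["next-cursor"]  # token for the next page
        time.sleep(pause)  # arbitrary pause to keep the request rate modest
```

For example, iterating over `fetch_updates("2022-05-01", mailto="you@example.org")` would walk through everything indexed since the snapshot cut-off, and each record could then replace any local copy with the same `DOI`. Paging with a cursor rather than an offset avoids deep-offset limits on large result sets.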

Every year our metadata corpus grows. The 2020 file was 65 GB and held 112 million records; 2021 came in at 102 GB and 120 million records. This year the file weighs in at 160 GB and contains metadata for 134 million records, or all Crossref records registered up to and including April 30, 2022.

Tips for using the torrent and retrieving incremental updates

  • Use the torrent if you want all of these records. Everyone is welcome to the metadata, but it will be much faster for you and much easier on our APIs to get so many records in one file. Here are some tips on how to work with the file.

  • Use the REST API to incrementally add new and updated records once you’ve got the initial file. Here is how to get started (and avoid getting blocked in your enthusiasm to use all this great metadata!).

  • ‘Limited’ and ‘closed’ references are not included in the file or our open APIs. And while bibliographic metadata is generally required, much of the metadata is optional, so records vary in quality and completeness.
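A first pass over the torrent download could look something like the Python sketch below. It assumes the file unpacks to a directory of gzipped JSON files (`*.json.gz`), each holding a JSON object whose `items` key lists work records in the same shape the REST API returns; check that layout against your own copy before relying on it. `iter_records` and `count_by_type` are hypothetical helpers. Because optional fields vary from record to record, the code reads fields with `.get()` and a default rather than assuming they are present.

```python
import gzip
import json
from pathlib import Path
from typing import Dict, Iterator


def iter_records(data_dir: str) -> Iterator[dict]:
    """Yield records one at a time from a directory of gzipped JSON files.

    Assumes each *.json.gz file holds {"items": [...]}; adjust if your copy differs.
    """
    for path in sorted(Path(data_dir).glob("*.json.gz")):
        # Only one file is decompressed and parsed at a time.
        with gzip.open(path, "rt", encoding="utf-8") as fh:
            for record in json.load(fh).get("items", []):
                yield record


def count_by_type(data_dir: str) -> Dict[str, int]:
    """Tally records by their "type" field, tolerating records that lack it."""
    counts: Dict[str, int] = {}
    for record in iter_records(data_dir):
        kind = record.get("type", "unknown")
        counts[kind] = counts.get(kind, 0) + 1
    return counts
```

Streaming file by file keeps memory use proportional to a single file rather than to the whole corpus, which matters at this scale.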

Questions, comments, and feedback are welcome at support@crossref.org.


Page owner: Patrick Polischuk   |   Last updated 2022-May-13