Behind the scenes improvements to the REST API
UPDATE, 24 August 2021: All pools have been migrated to the new Elasticsearch-backed API, which already appears to be more stable and performant than the outgoing Solr API. Please report any issues via our Crossref issue repository in GitLab.
UPDATE, 9 August 2021: The cutovers for the polite and Plus pools have been delayed again. We’re still working to ensure acceptable performance and stability before serving responses from the new application and infrastructure. Each cutover has been pushed back by one more week: the polite pool is now scheduled for 17 August 2021 and the Plus pool for 24 August 2021.
UPDATE, 2 August 2021: The cutovers for the polite and Plus pools have been delayed. We’ve been mirroring traffic to the new polite pool and want to ensure acceptable performance and stability before serving responses from the new application and infrastructure. Each cutover has been pushed back by one week: the polite pool is now scheduled for 10 August 2021 and the Plus pool for 17 August 2021.
UPDATE, 13 July 2021: The first stage of the cutover is complete, so requests to the public pool are now being served by the new REST API. We took a slightly different approach to performing the cutover, so the “Documentation” and “Temporary domain” sections below have been updated.
Our REST API is the primary interface for anybody to fetch the metadata of content registered with us, and we’ve been working hard on a more robust REST API service that’s about to go live.
The REST API is free to use and it gets around 300 million requests each month (we encourage users to adhere to our etiquette guidelines to keep things running smoothly). It is used for bibliometric studies, by platforms like Dimensions, by organizations like the National Library of Sweden, and to support countless other efforts.
We also offer enhanced access to our APIs and other services with Metadata Plus, which we recommend for production services and anyone else who benefits from guaranteed uptime, a higher rate limit, and priority support from our helpful staff.
For a while now, we’ve been working to migrate the REST API from Solr to Elasticsearch and from our datacenter to a cloud platform in order to address issues of scalability and extensibility.
We’re pleased to announce that we’ll be cutting over to the Elasticsearch-backed version of the REST API over the next few weeks, beginning July 13. The cutover will happen one pool at a time: the public pool will be migrated first, followed by the polite pool on August 3 and the Plus pool on August 10 (see the ‘etiquette’ link above if you’re unfamiliar with our different pools). Please see the updates at the top of this post for changes to the original schedule.
We’ve thoroughly tested the functionality and performance of the new REST API, and we’d like to invite you to test it out before we move production traffic to the new service. Try out your favorite API queries at https://api.production.crossref.org/.
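As a rough illustration of the query parameters involved, here is a minimal Python sketch that builds (without sending) a request to the /works route. It assumes the permanent https://api.crossref.org base URL, the mailto parameter that the etiquette guidelines describe for the polite pool, and the Crossref-Plus-API-Token header used by Metadata Plus; the helper name build_works_request is hypothetical, and parameter details should be checked against the current documentation.

```python
from typing import Dict, Optional
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: the permanent public domain, which (per the updates above)
# serves every pool once the cutover is complete.
DEFAULT_BASE_URL = "https://api.crossref.org"


def build_works_request(
    query: str,
    rows: int = 20,
    filters: Optional[Dict[str, str]] = None,
    mailto: Optional[str] = None,
    plus_token: Optional[str] = None,
    base_url: str = DEFAULT_BASE_URL,
) -> Request:
    """Build, but do not send, a GET request to the /works route (hypothetical helper)."""
    params: Dict[str, object] = {"query": query, "rows": rows}
    if filters:
        # Filters are combined into one comma-separated list of name:value pairs.
        params["filter"] = ",".join(f"{name}:{value}" for name, value in filters.items())
    if mailto:
        # A contact address identifies the caller, which is what the polite pool expects.
        params["mailto"] = mailto
    headers: Dict[str, str] = {}
    if plus_token:
        # Assumption: Metadata Plus tokens are sent in this header as a Bearer value.
        headers["Crossref-Plus-API-Token"] = f"Bearer {plus_token}"
    url = f"{base_url.rstrip('/')}/works?{urlencode(params)}"
    return Request(url, headers=headers)
```

Passing the result to urllib.request.urlopen would send it. The base_url parameter lets the same code target a test deployment, but hard-coding a temporary domain into production software is best avoided.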
Feature parity, but note a few differences
One of our primary objectives was to maintain feature parity between the old and new services, avoiding breaking changes that might cause problems for existing services that integrate with the REST API. We implemented a regression test suite, which gave us the confidence to make such a foundational change. Over the course of the project, we also found it necessary, and a good opportunity, to make a few modifications. In each case, we analyzed usage and aimed to avoid breaking changes. We hope these changes improve the behavior and consistency of the REST API.
group-title filter uses exact matching. This filter previously worked but was undocumented and unsupported.
directory filter is deprecated. This was meant to be an experimental, unsupported filter, and the data has not met the standard we require.
affiliation facet returns counts of affiliation strings rather than counts of terms within affiliation fields (thus resolving this Github issue).
Cursors may be used to page through results from the /members, /funders, and /journals routes, in addition to /works.
While we suggest that everyone use cursors for pagination, we still support offset-based paging. We have introduced a limit of 80,000 for offset values on the /members, /funders, and /journals routes.
offset behavior is slightly changed: the offset limit now applies to the sum of rows and offset rather than to offset alone.
published field is now present in API responses.
/licenses route returns paged results.
submitted is no longer supported. This was never officially supported or documented.
/quality route has been removed. This was an undocumented, experimental feature.
Funder name in /works metadata is the name provided by the publisher.
relation fields correctly return an empty object.
isbn-type for a record will be returned. ISBNs for associated volumes will be omitted.
institution field is a list.
query uses different stop word defaults, though we expect querying to remain roughly the same.
API responses may feature slightly different scores, as they come from different backends.
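Cursor-based paging, recommended above for /works and now also available for /members, /funders, and /journals, can be sketched in Python as follows. The sketch assumes the documented pattern of starting with cursor=* and following the next-cursor value until a page comes back with no items; the HTTP call is hidden behind a hypothetical fetch_page function so the loop can be shown without network access. The offset check reflects one reading of the notes above, that the 80,000 limit applies to offset plus rows; treating the bound as inclusive is an assumption.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical: takes a cursor, performs e.g. GET /works?cursor=...&rows=...,
# and returns the decoded "message" object from the JSON response.
FetchPage = Callable[[str], Dict[str, Any]]

# Limit quoted above for /members, /funders, and /journals.
OFFSET_LIMIT = 80000


def iter_with_cursor(fetch_page: FetchPage, max_pages: Optional[int] = None) -> Iterator[Any]:
    """Yield every item, following next-cursor until an empty page is returned."""
    cursor = "*"  # "*" asks the API to start a new cursor
    pages = 0
    while max_pages is None or pages < max_pages:
        message = fetch_page(cursor)
        items = message.get("items", [])
        if not items:
            return  # an empty page means the result set is exhausted
        yield from items
        pages += 1
        cursor = message["next-cursor"]


def offset_within_limit(offset: int, rows: int, limit: int = OFFSET_LIMIT) -> bool:
    """Assumption: the limit is checked against offset + rows, inclusively."""
    return offset + rows <= limit
```

Because the loop only depends on fetch_page, the same code can be exercised against canned responses before pointing it at the live API.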
Some technical notes on the cutover
The above changes are documented in our new REST API documentation, which is now generated automatically via Swagger, resulting in more comprehensive coverage and more efficient feature development. During the cutover, the right documentation for you will depend on which pool you are using. The documentation for the new API can be found by visiting the API in a browser or by navigating to https://api.crossref.org/help, and the documentation for the old API remains at https://github.com/CrossRef/rest-api-doc. The GitHub-hosted documentation will be deprecated once the cutover is complete.
This may not be news, but it bears repeating since we just mentioned GitHub: we have moved our source code repositories, including all of our issue tracking, from GitHub to GitLab.
UPDATE: We ended up performing the public pool cutover via reverse proxies instead of redirects, so please disregard the note about temporary domains below. The api.crossref.org domain will remain the domain regardless of which pool you’re using or where we are in the cutover process.
Please note that the api.production.crossref.org domain is a temporary domain we are using during this cutover period. Traffic will be redirected to the new service one pool at a time via a 307 HTTP redirect. Once the cutover is complete, we will go back to using the api.crossref.org domain. Do not update any software, scripts, libraries, tools, etc. to use the temporary domain.
Differences in query results
Due to inherent differences in how Solr and Elasticsearch perform queries and rank results, you may see slightly different results when comparing the old and new services. If for whatever reason your workflow involves using multiple API pools (which we don’t recommend), you may see inconsistent results.
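If you want to gauge how much a particular query's results shift between the old and new services, comparing the sets of returned DOIs is more meaningful than comparing scores, which come from different backends. A minimal sketch, assuming each result item is a dict with a DOI key as in /works responses; doi_overlap is a hypothetical helper, and Jaccard similarity is just one reasonable choice of measure.

```python
from typing import Any, Dict, Iterable


def doi_overlap(old_items: Iterable[Dict[str, Any]], new_items: Iterable[Dict[str, Any]]) -> float:
    """Jaccard similarity of the DOI sets two services returned for the same query."""
    # DOIs are case-insensitive, so normalize before comparing.
    old = {item["DOI"].lower() for item in old_items}
    new = {item["DOI"].lower() for item in new_items}
    if not old and not new:
        return 1.0  # two empty result sets agree trivially
    return len(old & new) / len(old | new)
```

A value of 1.0 means both services returned the same records, possibly in a different order; lower values point to queries worth inspecting by hand.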
Cursors may break if your script is paging through results at the exact moment the cutover is performed; if that happens, retry your request once the release is complete. We will post the precise maintenance window to https://status.crossref.org/.
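A script that pages with cursors could handle a cutover-time failure by waiting and then restarting from cursor=*, on the assumption that a cursor issued before the cutover may not be usable afterwards. The sketch below uses a hypothetical CursorExpiredError that your own HTTP layer would raise when a cursor is rejected, plus a hypothetical fetch_page function; the linear back-off of one minute per attempt is an arbitrary choice.

```python
import time
from typing import Any, Callable, Dict, List


class CursorExpiredError(Exception):
    """Hypothetical error raised by your HTTP layer when the API rejects a cursor."""


def fetch_all_with_restart(
    fetch_page: Callable[[str], Dict[str, Any]],
    max_attempts: int = 3,
    wait_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Any]:
    """Page through all results, restarting from a fresh cursor if one breaks."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        items: List[Any] = []  # discard partial results from a broken run
        cursor = "*"
        try:
            while True:
                message = fetch_page(cursor)
                page = message.get("items", [])
                if not page:
                    return items  # empty page: all results collected
                items.extend(page)
                cursor = message["next-cursor"]
        except CursorExpiredError:
            if attempt == max_attempts:
                raise
            sleep(wait_seconds * attempt)  # wait longer before each restart
    raise RuntimeError("retry loop exited unexpectedly")
```

Injecting sleep keeps the back-off testable; a real script would leave the default time.sleep in place.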
Feature requests and bug reports should be filed in the Crossref issue repository in GitLab, both during this testing phase and once the new Elasticsearch-backed API is live in production.
While we hope the benefits of improved stability and extensibility are as exciting to you as they are to us, “feature parity” may not be the most thrilling message for our API users. In truth, one of the more exciting aspects of completing this migration is the end of the code freeze we instituted at the start of this effort. Now, we can work on new feature development and a continuous stream of bug fixes. We also improved the automatic test coverage as part of the work, meaning we can deliver features with greater confidence.
The first new feature we’ll be delivering via the REST API will be support for the “grants” record type, allowing for the retrieval of metadata for grants that have been registered with us, now numbering over 20,000 from 8 different funder members. This work is well underway and will be released once we are confident that the new REST API is stable in production. From there, we’ll continue to select the highest priority issues from our REST API backlog.
As always, should you have any questions about our REST API, check out the metadata retrieval section of our website, start a discussion on our community forum, file a GitLab issue as mentioned above, or contact us via email@example.com.