In 2022, we wrote a blog post, “Rethinking staff travel, meetings, and events”, outlining our new approach to staff travel, meetings, and events, with the goal of not going back to ‘normal’ after the pandemic. We said that in the future we would report on our efforts to balance in-person and virtual events, support work-life balance for staff, and track our carbon emissions. In December 2024, we wrote a follow-up post, “Summary of the environmental impact of Crossref”, which gave an overview of 2023 and provided the first report on our carbon emissions. That report only just made it into 2024, so we are happy to report on 2024 a little sooner in the year.
To date, about 100 Crossref members have made use of our co-access service for one or more of their books. The service was designed as a last-resort measure for cases where multiple parties - book publishers, aggregators, and other members - had rights to register book content. Unfortunately, it allowed members to register multiple DOIs for shared books and book chapters, violating our core tenet of one DOI per content item and resulting in duplicate DOIs. We should not have created a service that violated that tenet. Now that we can offer an alternative in the form of the multiple resolution service, it is time to switch co-access off. Among other benefits for publishers and authors, a single DOI for each item, regardless of where it is hosted, will result in more accurate citation counts and usage statistics. We’re retiring co-access at the end of 2026.
This month marks one year since the Dutch Research Council (NWO) introduced grant IDs, an important milestone in the journey toward more transparent and trackable research funding. We created over 1,600 Crossref grant IDs with associated metadata, and we are beginning to see them appear in publications. These early examples show the enormous potential of grant IDs; they also highlight that publishers could do more to improve the quality of funding metadata in their publications.
eLife recently won a Crossref Metadata Award for the completeness of its metadata, emerging as the clear leader among our medium-sized members. In this post, the eLife team answers our questions about how and why they produce such high-quality open metadata. For eLife, the work of creating and sharing excellent metadata aligns with its mission to foster open science and supports its preprint-centred publication model, but it also lays the groundwork for all kinds of exciting potential uses.
If you take a peek at our blog, you’ll notice that metadata and community are the most frequently used categories. This is not a coincidence – community is central to everything we do at Crossref. Our first-ever Metadata Sprint was a natural step in strengthening both. Cue fanfare! And what better way to celebrate 25 years of Crossref?
We designed the Crossref Metadata Sprint as a relatively short event where people could form teams and tackle small, well-scoped problems. What kind of problems? While we expected many to involve coding, teams also explored documenting, translating, and researching: anything that taps into our open, member-curated metadata. Our motivation behind this format was to create a space for networking, collaboration, and feedback, centred on co-creation using the scholarly metadata from our REST API, the Public Data File, and other sources.
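For anyone who hasn’t worked with these sources before, here is a minimal sketch of how one might query the public REST API with Python. It is our illustration rather than official sprint material; the requests library, the free-text query, and the email address are assumptions you would adapt.

```python
import requests

# Query the public Crossref REST API for works (no key required).
# Supplying a mailto parameter identifies your script and routes
# requests to the API's "polite" pool.
BASE = "https://api.crossref.org/works"
params = {
    "query": "metadata sprint",   # free-text query (illustrative)
    "rows": 5,                    # number of results to return
    "mailto": "you@example.org",  # replace with your own address
}

response = requests.get(BASE, params=params, timeout=30)
response.raise_for_status()

for item in response.json()["message"]["items"]:
    title = item.get("title", ["(no title)"])[0]
    print(item["DOI"], "-", title)
```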
What we learned in planning
The journey towards the event was filled with valuable lessons from our community. Our initial call received submissions from 71 people, which was exciting but presented the first challenge: we felt the event would work better with a smaller group. An additional challenge was the enthusiasm of people from different regions of the world who were eager to join but needed support to attend in person. It reminded us how global our community is, and how important it is to think about different ways of making participation possible, especially in future events.
We also wanted to make sure that participation wasn’t limited by technical background. The selection process included a preliminary review by several members of our team to bring in a mix of perspectives and reduce bias. The event welcomed participants of all expertise levels, including colleagues who had never worked with APIs before. We sought to provide common ground for everyone with several group calls, where we introduced our tools and collected participants’ questions and requests for tools and specific data, helping them prepare for the sprint.
At the Crossref Metadata Sprint
I recently stumbled upon the following quote from a recognized data scientist:
Numbers have an important story to tell. They rely on you to give them a clear and convincing voice. (Stephen Few)
It made me think that we can replace “numbers” with “metadata” and the idea still holds. Surrounded by the paleontological collections of the National Museum of Natural History in Madrid, on 8 April, 21 participants and 5 Crossref staff came together to work on twelve different projects. These ranged from improvements to our Public Data File formats and exploring metadata completeness, to tackling multilingual metadata challenges, understanding citation impact for retracted works, and connecting Retraction Watch metadata with metadata from other knowledge graphs.
The different teams that participated in the first Crossref Metadata Sprint.
The initial hours were the most energetic (but not chaotic!), as most of the participants had the chance to interact in person for the first time, ideas were exchanged, and pre-formed groups became more stable (although one of the advantages of the format is that teams don’t have to be rigid). Twelve coffee- and tea-powered projects started taking shape, a few of which are part of larger ideas under development. By the end of the second day, we saw:
Author changes between preprints and published articles.
Coverage of funding information by publisher.
Enriching citations with Crossref metadata.
Funding metadata completeness (see the sketch after this list).
Improvement to the Public Data File.
Interoperability between Crossref DOIs and hash-based identifiers.
University of Tetova’s metadata coverage.
Retraction Watch data mash-up.
Perspective on AI-driven multilingual metadata.
Public Data File in Google BigQuery.
Visibility of retractions across citations.
Visualising Crossref geographic member data.
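To give a flavour of what the funding-related projects involved, here is a hypothetical sketch (not any team’s actual code) that estimates what share of a member’s registered works carry funder metadata, using the REST API’s documented has-funder filter. The member ID and email address are placeholders to replace.

```python
import requests

def funder_coverage(member_id: str, mailto: str = "you@example.org") -> float:
    """Rough share of a member's registered works that include funder metadata."""
    base = f"https://api.crossref.org/members/{member_id}/works"

    def total(extra):
        # rows=0 asks the API for the match count only, not the records.
        params = {"rows": 0, "mailto": mailto, **extra}
        r = requests.get(base, params=params, timeout=30)
        r.raise_for_status()
        return r.json()["message"]["total-results"]

    all_works = total({})
    with_funder = total({"filter": "has-funder:true"})
    return with_funder / all_works if all_works else 0.0

# Placeholder member ID; real IDs can be looked up via https://api.crossref.org/members
print(f"{funder_coverage('78'):.1%} of works include funder metadata")
```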
Our team worked as part of some of these projects, providing insights and feedback to the participants. We ended the first day with a group dinner and came back re-energised for the second day, which started with everybody fully immersed in their tasks. As we approached the conclusion, the groups prepared some quick slides for a short presentation (which you can find here).
Our team and the participants left excited and looking forward to the next opportunity to collaborate. We certainly see the potential of recreating these spaces, and we’ll work on future editions in a different location. All of the project summaries and notes will remain available in our metadata sprint GitLab repo. Would you like to know more about any of these ideas? Let us know in the comments.
The first Crossref Metadata Sprint in a nutshell
Participants
None of this would’ve been possible without our enthusiastic participants. Huge thanks to everyone! Here is the full list of those who attended our inaugural Sprint: