When someone links their data online, or mentions research on a social media site, we capture that event and make it available for anyone to use in their own way. We provide the unprocessed data—you decide how to use it.
Before the expansion of the Internet, most discussion about scholarly content stayed within scholarly content, with articles citing each other. With the growth of online platforms for discussion, publication and social media, we have seen discussions extend into new, non-traditional venues.
Crossref Event Data captures this activity and acts as a hub for the storage and distribution of this data. An event may be a citation in a dataset or patent, a mention in a news article, Wikipedia page or on a blog, or discussion and comment on social media.
How Event Data works
Event Data monitors a range of sources, chosen for their importance in scholarly discussion. We make events available via an API for users to access and interpret. Our aim is to provide context to published works and connect diverse parts of the dialogue around research. Learn more about the sources from which we capture events.
The Event Data API provides raw data about events alongside context: how and where each event was collected. Users can process this data to suit their requirements.
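As a rough illustration of what accessing and interpreting the raw data can look like, the sketch below builds a query URL for events about one DOI and tallies a response by source. The base URL, the `obj-id`, `rows` and `mailto` parameter names, and the `message`, `events` and `source_id` response fields are assumptions about the public API, so check them against the Event Data user guide before relying on them. The DOI and the sample payload are invented for illustration.

```python
import json
from collections import Counter
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL for the public Event Data query API.
BASE_URL = "https://api.eventdata.crossref.org/v1/events"


def build_query_url(doi, mailto, rows=100):
    """Build a query URL for events whose object is the given DOI."""
    params = {"obj-id": doi, "rows": rows, "mailto": mailto}
    return BASE_URL + "?" + urlencode(params)


def count_by_source(payload):
    """Tally events per source in a decoded API response."""
    events = payload.get("message", {}).get("events", [])
    return Counter(event.get("source_id", "unknown") for event in events)


def fetch_events(doi, mailto, rows=100):
    """Fetch one page of events (needs network access; not called here)."""
    with urlopen(build_query_url(doi, mailto, rows), timeout=30) as response:
        return json.load(response)


# Hypothetical payload, shaped like the assumed response format.
sample = {
    "message": {
        "events": [
            {"source_id": "wikipedia", "obj_id": "https://doi.org/10.5555/12345678"},
            {"source_id": "reddit", "obj_id": "https://doi.org/10.5555/12345678"},
            {"source_id": "wikipedia", "obj_id": "https://doi.org/10.5555/12345678"},
        ]
    }
}
print(count_by_source(sample))
```

Keeping the URL builder and the tally separate from the network call means the interpretation step can be tried on saved responses without querying the API.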
What is Event Data for?
Event Data can be used for a number of different purposes:
Authors can find out where their work has been reused and commented on.
Readers can access more context around published research, including links to supporting documents and commentary that aren’t in a journal article.
Publishers and funders can assess the impact of published research beyond citations.
Service providers can enrich, analyze, interpret and report via their own tools.
Data intelligence and analysis organisations can access a broad range of sources with commentary relevant to research articles.
Anyone can contribute to Event Data by mentioning the DOI or URL of a Crossref-registered work in one of the monitored sources. We also welcome third parties who wish to send events or contribute to code that covers new sources. Learn more about contributing to or using Crossref Event Data.
Agreement and fees for Event Data
Event Data is a public API, giving access to raw data, and there are no fees. In the future we will introduce a service-based offering with additional features and benefits. Learn more about the Event Data terms.
What is an event?
In the broadest sense, an event is any time someone refers to a research article with a registered DOI anywhere online. Ideally we would capture all events, but there are limitations:
We can’t monitor the entire Internet, and instead check sites that are most likely to discuss academic content. There are still venues that could be relevant and that we do not cover yet.
Users online refer to academic content in different ways, sometimes using the DOI but more often using the URL or just the article name. We try to resolve mentions of a DOI or a publisher web page to the matching article, but this isn’t always possible. This means we may miss mentions of an article even from sources we are tracking.
At present we are not able to track events where no link is included and only the title or other part of the metadata is mentioned.
For Crossref Event Data, an event consists of three parts:
A subject: where was the research mentioned? (such as Wikipedia)
An object: which research was mentioned? (a Crossref or DataCite DOI)
A relationship: how was the research mentioned? (such as cites or discusses)
We determine the relationship from the source of the event. It indicates, in broad categories, how the subject and object are linked.
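The three-part structure can be pictured as a small record. The sketch below is illustrative only: the field names (`subj`, `obj`, `relation`), the `RELATION_BY_SOURCE` mapping, and the fallback to `discusses` for unknown sources are all hypothetical, and are not taken from the Event Data schema.

```python
from dataclasses import dataclass

# Hypothetical mapping from a source to the broad relationship it implies.
RELATION_BY_SOURCE = {
    "wikipedia": "references",
    "reddit": "discusses",
    "hypothesis": "annotates",
}


@dataclass(frozen=True)
class Event:
    subj: str      # where the research was mentioned
    obj: str       # which research was mentioned (a DOI)
    relation: str  # how the research was mentioned


def make_event(source, subj, obj):
    """Build an event, taking the relationship from the source."""
    # Falling back to "discusses" is an assumption made for this sketch.
    relation = RELATION_BY_SOURCE.get(source, "discusses")
    return Event(subj=subj, obj=obj, relation=relation)


event = make_event(
    "wikipedia",
    "https://en.wikipedia.org/wiki/Example",
    "https://doi.org/10.5555/12345678",
)
print(event)
```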
Software called agents collect events from various data sources. Most agents are written and operated by Crossref with some code written by our partners. Possible events are passed to the percolator software, which tries to match the event with an object DOI. This process is fully automated.
We perform periodic automated checks on the integrity of the data and update event types. The percolator also deduplicates events as part of this process.
To provide transparency, we keep an evidence record about how we matched the object to the subject. Learn more about transparency in Event Data, including links to the open source code and data.
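To make the matching and deduplication steps more concrete, here is a minimal sketch of the general idea. It is not the percolator's actual logic: the regular expression is a common heuristic for DOI-shaped strings, and the function names and sample observations are invented for illustration.

```python
import re

# Heuristic DOI shape: "10.", a 4-9 digit prefix, "/", then a suffix.
# This will miss some valid DOIs and is only meant as an illustration.
DOI_PATTERN = re.compile(r'10\.\d{4,9}/[^\s"<>]+')


def find_dois(text):
    """Return candidate DOIs in text, lowercased, without trailing punctuation."""
    candidates = []
    for match in DOI_PATTERN.finditer(text):
        doi = match.group(0).rstrip(".,;:)]}'").lower()
        candidates.append(doi)
    return candidates


def match_and_deduplicate(observations):
    """Turn (subject_url, text) pairs into unique (subject_url, doi) pairs.

    First-seen order is preserved; repeated pairs are dropped.
    """
    seen = set()
    matched = []
    for subject_url, text in observations:
        for doi in find_dois(text):
            key = (subject_url, doi)
            if key not in seen:
                seen.add(key)
                matched.append(key)
    return matched


observations = [
    ("https://example.org/post/1", "See https://doi.org/10.5555/12345678."),
    ("https://example.org/post/1", "Again: doi:10.5555/12345678, as cited."),
    ("https://example.org/post/2", "An article mentioned only by its title."),
]
print(match_and_deduplicate(observations))
```

The third observation yields nothing, which mirrors the limitation that mentions by title alone cannot be tracked.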
The following agents currently collect data:
Relationships, references, and links to DataCite registered content
Links to Crossref registered content
Recommendations of research publications
Annotations in Hypothes.is
Discussed in blogs and media
Discussed on Reddit
Discussed on sites linked to in subreddits
Discussed on Stack Exchange Network sites
References on Wikipedia pages
Discussed on Wordpress.com sites
We plan to increase the number of agents and sources, and welcome contact from anyone who can contribute. Patent Event Data was historically collected from The Lens. Events from Twitter were collected until February 2023. Note that all Twitter events have been removed from search results in accordance with our contract with Twitter; see the Community Forum for more information.
What Event Data is not
By providing Event Data, Crossref provides an open, transparent information source for the scholarly community and beyond. It is important to understand, however, that it may not be suitable for all potential users. Here are some of the limitations:
It is not a service that provides metrics or collated reports, or that offers data analysis.
Crossref does not build applications or website plugins for Event Data, for example for displaying results on publisher websites. We do, however, welcome third parties who wish to develop such platforms.
Event Data collection is fully automated and therefore may contain errors or be incomplete. We cannot provide any guarantees in this regard, and users must assess whether the quality of the data meets the needs of their particular use case. There may also be delays between an event occurring and it appearing in Event Data.
Events might be missed due to the limitations of the collection algorithms we use. There is also a small possibility that we link an event to the wrong object.
Event Data does not cover every source of academic discussion. In some cases this is because there is no public access to the data; in others it is because we have not had the capacity to build an agent.
While we hope the data is useful for many purposes, we encourage users to be responsible and exercise caution when making use of Event Data.
Page owner: Martyn Rittman | Last updated 2020-October-06