Crossref is focused on enriching metadata to highlight relationships among works, individuals, institutions, and actions. Metadata matching is the task of finding an identifier for an item based on a structured or unstructured “description” of it. It’s a key part of our enrichment process and one way to address gaps in the scholarly record while keeping barriers to membership and participation as low as possible, so that the research nexus stays inclusive.
Some examples of matching tasks include:
- Finding a DOI for a cited article based on a citation string.
- Finding the ROR ID for an organisation based on an affiliation string.
- Finding the ORCID iD for a researcher based on the person’s name and affiliation.
- Finding the grant DOI based on an award number and a funder name.
Metadata matching gives a more complete picture of the research nexus. It discovers missing relationships between entities throughout the scholarly record. In many cases, these relationships are already included in metadata records, but in others we can carry out automated matching to identify the related entity.
Read more about metadata matching in the blog series:
In April 2025, we launched the matching project, a major effort to rebuild Crossref’s metadata matching workflows using modern software development and data science practices. The goal is to create a dedicated, consolidated matching workflow that will eventually replace all existing production matching processes and host any future ones, with results made available through the REST API. This project covers six matching tasks: bibliographic reference matching, funder name matching, preprint matching, affiliation matching, grant matching, and title matching.
project phase | matching task | input | target identifier | status |
---|---|---|---|---|
1 | funder matching | funder organisation name | ROR ID | in production as part of the legacy CS system; matches available in the REST API |
2 | preprint matching | journal article metadata | preprint DOI | in production as part of the legacy CS system; matches not available in the REST API |
2 | affiliation matching | affiliation string | ROR ID | not in production |
2 | grant matching | funding metadata | grant DOI | not in production |
3 | reference matching | bibliographic reference | DOI | in production as part of the legacy CS system; matches available in the REST API |
3 | title matching | journal title | internal Crossref journal ID | in production as part of the legacy CS system; matches not available in the REST API |
Additional reading and resources
Funder name matching
Funder name matching is used to map funder names to funder organisation identifiers. For example, “London Health Sciences Centre” can be mapped to https://ror.org/037tz0e16. Currently, we match against the Funder Registry. In the new matching service, matching will be done against the ROR Registry.
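As a rough illustration of name-based matching, the sketch below looks up a funder name in a tiny in-memory registry after normalising case, accents, and punctuation. It is not Crossref's implementation: `REGISTRY`, `normalize`, and `match_funder` are hypothetical names, and the registry holds only the example from the text. A production matcher would also need aliases, fuzzy comparison, and a full registry.

```python
import re
import unicodedata
from typing import Dict, List, Optional

# Hypothetical in-memory registry: organisation ID -> known names.
# The single entry reuses the example given in the text above.
REGISTRY: Dict[str, List[str]] = {
    "https://ror.org/037tz0e16": ["London Health Sciences Centre"],
}


def normalize(name: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    cleaned = re.sub(r"[^a-z0-9]+", " ", without_accents.lower())
    return cleaned.strip()


def match_funder(
    name: str, registry: Dict[str, List[str]] = REGISTRY
) -> Optional[str]:
    """Return the ID whose normalised name equals the input, else None."""
    target = normalize(name)
    matches = {
        org_id
        for org_id, names in registry.items()
        if any(normalize(candidate) == target for candidate in names)
    }
    # Refuse to guess when the name is unknown or ambiguous.
    return matches.pop() if len(matches) == 1 else None
```

Returning `None` for unknown or ambiguous names is a deliberate choice in this sketch: a missing relationship is usually less harmful than a wrong one.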
Preprint matching
Preprint matching is used to discover relationships between preprints and journal articles by matching journal article metadata to preprint DOIs.
Background reading: Discovering relationships between preprints and journal articles
Recommended strategy: code
A ground truth evaluation dataset: dataset
A dataset with relationships between preprints and journal articles discovered by matching within Crossref data: dataset
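To make the idea of matching article metadata to a preprint more concrete, here is a minimal scoring sketch. It is an assumption for illustration only and not the recommended strategy linked above: the record shape (`title`, `authors` as family names), the function names, and the `title_weight` value are all hypothetical. It combines fuzzy title similarity with the overlap of author family names.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def title_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between two titles, in [0, 1]."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def author_overlap(a: List[str], b: List[str]) -> float:
    """Jaccard overlap of case-folded family names, in [0, 1]."""
    set_a = {name.casefold() for name in a}
    set_b = {name.casefold() for name in b}
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def match_score(article: Dict, preprint: Dict, title_weight: float = 0.7) -> float:
    """Weighted combination of title and author evidence (weights are arbitrary)."""
    title_part = title_similarity(article["title"], preprint["title"])
    author_part = author_overlap(article["authors"], preprint["authors"])
    return title_weight * title_part + (1.0 - title_weight) * author_part
```

In practice a candidate would be accepted only above a threshold tuned on a ground truth dataset, since titles often change between the preprint and the published version.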
Affiliation matching
Affiliation matching is used to map affiliation strings to organisation identifiers. For example, “Department of Molecular Medicine, Sapporo Medical University, Sapporo 060-8556, Japan” can be mapped to https://ror.org/01h7cca57.
Recommended strategy: code
A ground truth evaluation dataset: dataset
A dataset with relationships involving research organisations discovered by matching within Crossref data: dataset
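For readers who want to experiment with affiliation matching, the sketch below queries the public ROR API's affiliation search rather than the Crossref strategy linked above. The endpoint URL and the response fields used here (`items`, `chosen`, `organization.id`) reflect our understanding of the ROR API and should be checked against its documentation; the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumed endpoint of the ROR API affiliation search; verify before relying on it.
ROR_API = "https://api.ror.org/v2/organizations"


def build_affiliation_url(affiliation: str) -> str:
    """Build the affiliation-search URL for a raw affiliation string."""
    query = urllib.parse.urlencode({"affiliation": affiliation})
    return f"{ROR_API}?{query}"


def pick_chosen(response: dict) -> Optional[str]:
    """Return the ID of the candidate flagged as chosen, or None if there is none."""
    for item in response.get("items", []):
        if item.get("chosen"):
            return item["organization"]["id"]
    return None


def match_affiliation(affiliation: str) -> Optional[str]:
    """Query the API over the network and return the chosen ROR ID, if any."""
    url = build_affiliation_url(affiliation)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return pick_chosen(json.load(resp))
```

A call such as `match_affiliation("Department of Molecular Medicine, Sapporo Medical University, Sapporo 060-8556, Japan")` needs network access, and its result depends on the live service.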
Grant matching
Grant matching is used to discover funding relationships by matching funding information to grant DOIs.
Background reading:
A dataset with relationships between grants and research outputs discovered by matching within Crossref data: dataset
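One simple way to picture grant matching is a lookup keyed on the funder and a normalised award number. The sketch below is a hypothetical illustration, not Crossref's method: the record fields (`funder`, `award`, `doi`) and all function names are assumptions, and real award numbers vary far more than this normalisation handles.

```python
import re
from typing import Dict, List, Optional, Tuple

GrantIndex = Dict[Tuple[str, str], List[str]]


def normalize_award(award: str) -> str:
    """Uppercase and drop everything except letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", award.upper())


def _key(funder: str, award: str) -> Tuple[str, str]:
    """Build the lookup key from a funder name and an award number."""
    return (funder.casefold().strip(), normalize_award(award))


def build_index(grants: List[dict]) -> GrantIndex:
    """Index grant records (hypothetical shape) by funder and award number."""
    index: GrantIndex = {}
    for grant in grants:
        index.setdefault(_key(grant["funder"], grant["award"]), []).append(grant["doi"])
    return index


def match_grant(funder: str, award: str, index: GrantIndex) -> Optional[str]:
    """Return the grant DOI only when exactly one record matches."""
    dois = index.get(_key(funder, award), [])
    return dois[0] if len(dois) == 1 else None
```

Keying on the funder as well as the award number matters because different funders can issue identical award numbers.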
Citation matching
Citation matching is used to map bibliographic reference strings or metadata to DOIs. For example, “1. Boucher RC (2004) New concepts of the pathogenesis of cystic fibrosis lung disease. Eur Resp J 23: 146–158.” can be mapped to https://doi.org/10.1183/09031936.03.00057003.
Background reading:
Recommended strategy for unstructured citation matching: code
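As a starting point for experimenting with unstructured citation matching, the sketch below sends a citation string to the Crossref REST API's `query.bibliographic` parameter and accepts the top result only if its relevance score clears a threshold. This is a simplified illustration, not the recommended strategy linked above: the function names and the `min_score` threshold are hypothetical, and a robust matcher would also validate the candidate against the citation text.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

CROSSREF_WORKS = "https://api.crossref.org/works"


def build_query_url(citation: str, rows: int = 1) -> str:
    """Build a works query that searches bibliographic fields for the citation."""
    params = {"query.bibliographic": citation, "rows": rows}
    return f"{CROSSREF_WORKS}?{urllib.parse.urlencode(params)}"


def top_candidate(response: dict, min_score: float) -> Optional[str]:
    """Return the DOI URL of the first item if its score is high enough."""
    items = response.get("message", {}).get("items", [])
    if not items:
        return None
    best = items[0]
    if best.get("score", 0.0) < min_score:
        return None
    return f"https://doi.org/{best['DOI']}"


def match_citation(citation: str, min_score: float) -> Optional[str]:
    """Query the API over the network and apply the (hypothetical) threshold."""
    with urllib.request.urlopen(build_query_url(citation), timeout=30) as resp:
        return top_candidate(json.load(resp), min_score)
```

The right value for `min_score` is not fixed; it would need to be chosen by evaluating against a set of citations with known DOIs.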
Title matching
Title matching is used to map journal titles to Crossref’s internal journal records. We run title matching to map data in members’ deposits.