Metadata matching is the task of finding an identifier for an item based on a structured or unstructured “description” of it. Some examples include:
- finding a DOI for a cited article based on a citation string
- finding the ROR ID for an organisation based on an affiliation string
- finding the ORCID ID for a researcher based on the person’s name and affiliation
- finding the grant DOI based on an award number and a funder name
Whether done manually as part of the publishing workflow, semi-automatically as an auto-complete feature in forms, or fully automatically to fill metadata gaps in larger databases, metadata matching gives us a more complete picture of the research nexus by discovering missing relationships between entities across the scholarly record.
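As an illustration of the first example above, the sketch below looks up a DOI for a citation string using the `query.bibliographic` parameter of the public Crossref REST API `/works` endpoint. It is a minimal sketch, not the matching strategy developed in this project: the function names and the `min_score` threshold are hypothetical, and the relevance score returned by the API is not normalised, so any threshold would need to be tuned against evaluation data.

```python
import json
import urllib.parse
import urllib.request

CROSSREF_WORKS = "https://api.crossref.org/works"


def build_query_url(citation: str, rows: int = 1) -> str:
    """Build a /works search URL for a free-text citation string."""
    params = urllib.parse.urlencode(
        {"query.bibliographic": citation, "rows": rows}
    )
    return f"{CROSSREF_WORKS}?{params}"


def top_candidate(response: dict, min_score: float = 0.0):
    """Return the DOI of the first search hit, or None.

    `min_score` is a hypothetical cut-off used only for illustration.
    """
    items = response.get("message", {}).get("items", [])
    if not items:
        return None
    best = items[0]
    if best.get("score", 0.0) < min_score:
        return None
    return best.get("DOI")


def match_citation(citation: str, min_score: float = 0.0):
    """Query the API over the network and return the best candidate DOI."""
    with urllib.request.urlopen(build_query_url(citation), timeout=30) as resp:
        return top_candidate(json.load(resp), min_score)
```

Separating URL construction and candidate selection from the network call keeps the selection logic testable against saved responses.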
Read more about metadata matching in the blog series:
In April 2025, we launched the matching project, a major effort to rebuild Crossref’s metadata matching workflows using modern software development and data science practices. The goal is to create a dedicated, consolidated matching workflow that will eventually replace all existing and future production matching processes, with results made available through the REST API. This project covers six matching tasks: bibliographic reference matching, funder name matching, preprint matching, affiliation matching, grant matching, and title matching.
project phase | matching task | input | target identifier | status |
---|---|---|---|---|
1 | funder matching | funder organisation name | ROR ID | in production as part of the legacy CS system; matches available in the REST API |
2 | preprint matching | journal article metadata | preprint DOI | in production as part of the legacy CS system; matches not available in the REST API |
2 | affiliation matching | affiliation string | ROR ID | not in production |
2 | grant matching | funding metadata | grant DOI | not in production |
3 | reference matching | bibliographic reference | DOI | in production as part of the legacy CS system; matches available in the REST API |
3 | title matching | journal title | internal Crossref journal ID | in production as part of the legacy CS system; matches not available in the REST API |
Additional reading and resources
Funder name matching
More coming soon…
Preprint matching
Background reading: Discovering relationships between preprints and journal articles
Recommended strategy: code
A ground truth evaluation dataset: dataset
A dataset with relationships between preprints and journal articles discovered by matching within Crossref data: dataset
Affiliation matching
Recommended strategy: code
A ground truth evaluation dataset: dataset
A dataset with relationships involving research organisations discovered by matching within Crossref data: dataset
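For readers who want to experiment with affiliation matching, the sketch below shows one possible way to obtain a ROR ID for an affiliation string using the `affiliation` query parameter of the public ROR API. It is not the recommended strategy linked above. The function names are hypothetical, and the response shape (an `items` list whose entries carry a `chosen` flag and an `organization` record) is an assumption that should be checked against the current ROR API documentation.

```python
import json
import urllib.parse
import urllib.request

ROR_ORGANIZATIONS = "https://api.ror.org/v2/organizations"


def build_affiliation_url(affiliation: str) -> str:
    """Build a ROR affiliation-matching URL for a free-text affiliation."""
    params = urllib.parse.urlencode({"affiliation": affiliation})
    return f"{ROR_ORGANIZATIONS}?{params}"


def chosen_ror_id(response: dict):
    """Return the ROR ID of the candidate flagged as chosen, or None.

    Assumes each item looks like
    {"chosen": bool, "score": float, "organization": {"id": "..."}}.
    """
    for item in response.get("items", []):
        if item.get("chosen"):
            return item.get("organization", {}).get("id")
    return None


def match_affiliation(affiliation: str):
    """Query the API over the network and return the chosen ROR ID."""
    url = build_affiliation_url(affiliation)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return chosen_ror_id(json.load(resp))
```

Returning `None` when no candidate is flagged as chosen favours precision: an affiliation is left unmatched rather than linked to a low-confidence organisation.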
Grant matching
Background reading:
Citation matching
Background reading:
Recommended strategy for unstructured citation matching: code
Title matching
More coming soon…