Metadata matching

Crossref is focused on enriching metadata to highlight relationships among works, individuals, institutions, and actions. Metadata matching is the task of finding an identifier for an item based on a structured or unstructured "description" of it. It is a key part of our enrichment process and one way to address gaps in the scholarly record, while keeping barriers to membership and participation as low as possible so that the research nexus stays inclusive.

Some examples of matching tasks include:

  • Finding a DOI for a cited article based on a citation string.
  • Finding the ROR ID for an organisation based on an affiliation string.
  • Finding the ORCID ID for a researcher based on the person’s name and affiliation.
  • Finding the grant DOI based on an award number and a funder name.

Metadata matching gives a more complete picture of the research nexus by discovering missing relationships between entities throughout the scholarly record. In many cases, these relationships are already included in metadata records; in others, we can carry out automated matching to identify the related entity.

Read more about metadata matching in the blog series:

In April 2025, we launched the matching project, which is a major effort to rebuild Crossref’s metadata matching workflows using modern software development and data science practices. The goal is to create a dedicated, consolidated matching workflow that will eventually replace all existing and future production matching processes, with results made available through the REST API. This project covers six matching tasks: bibliographic reference matching, funder name matching, preprint matching, affiliation matching, grant matching, and title matching.

project phase | matching task | input | target identifier | status
--- | --- | --- | --- | ---
1 | funder matching | funder organisation name | ROR ID | in production as part of the legacy CS system; matches available in the REST API
2 | preprint matching | journal article metadata | preprint DOI | in production as part of the legacy CS system; matches not available in the REST API
2 | affiliation matching | affiliation string | ROR ID | not in production
2 | grant matching | funding metadata | grant DOI | not in production
3 | reference matching | bibliographic reference | DOI | in production as part of the legacy CS system; matches available in the REST API
3 | title matching | journal title | internal Crossref journal ID | in production as part of the legacy CS system; matches not available in the REST API

Additional reading and resources

Funder name matching

Funder name matching is used to map funder names to funder organisation identifiers. For example, “London Health Sciences Centre” can be mapped to https://ror.org/037tz0e16. Currently, we match against the Funder Registry. In the new matching service, matching will be done against the ROR Registry.
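The core of funder name matching can be sketched as normalising a name and comparing it with known registry names and aliases, tolerating small spelling differences. This is a minimal illustration under stated assumptions, not Crossref's implementation: the in-memory `REGISTRY` (holding only the example above), the functions `normalise` and `match_funder`, and the 0.85 cutoff are all hypothetical, and similarity is scored with Python's standard `difflib` rather than any method the matching service is known to use.

```python
import difflib
import re
from typing import Optional

# Hypothetical in-memory registry excerpt: ROR ID -> known names and aliases.
# Only the example pair from the text above is included.
REGISTRY = {
    "https://ror.org/037tz0e16": ["London Health Sciences Centre"],
}


def normalise(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    name = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(name.split())


# Index each normalised name or alias to its ROR ID.
NAME_INDEX = {
    normalise(alias): ror_id
    for ror_id, aliases in REGISTRY.items()
    for alias in aliases
}


def match_funder(funder_name: str, cutoff: float = 0.85) -> Optional[str]:
    """Return the ROR ID of the most similar known name, or None if no name
    reaches the similarity cutoff (difflib ratio, between 0 and 1)."""
    best = difflib.get_close_matches(
        normalise(funder_name), list(NAME_INDEX), n=1, cutoff=cutoff
    )
    return NAME_INDEX[best[0]] if best else None


print(match_funder("London Health Science Center"))  # https://ror.org/037tz0e16
print(match_funder("Unknown Foundation"))            # None
```

Fuzzy matching absorbs variants such as "Science Center" versus "Sciences Centre", but a low cutoff risks linking distinct funders with similar names, so in practice the threshold would need tuning against an evaluation dataset.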

Preprint matching

Preprint matching is used to discover relationships between preprints and journal articles by matching journal article metadata to preprint DOIs.
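One way to picture matching journal article metadata to preprints is to score each candidate preprint on title similarity and author overlap and accept the best candidate only above a threshold. The sketch below is a hypothetical illustration, not the recommended strategy linked below: the records in `PREPRINTS`, their placeholder IDs, the 0.7/0.3 weights, the year filter, and the 0.8 threshold are all assumptions chosen for readability.

```python
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical preprint records; the IDs are placeholders, not real DOIs.
PREPRINTS = [
    {"id": "preprint-A", "title": "Deep learning for protein folding: a survey",
     "authors": {"smith", "garcia"}, "year": 2021},
    {"id": "preprint-B", "title": "Soil microbiome responses to drought",
     "authors": {"okafor", "lee"}, "year": 2022},
]


def title_similarity(a: str, b: str) -> float:
    """Character-level similarity of the lowercased titles, between 0 and 1."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def author_overlap(a: set, b: set) -> float:
    """Jaccard overlap of two sets of author surnames, between 0 and 1."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_preprint(article: dict, threshold: float = 0.8) -> Optional[str]:
    """Return the ID of the best-scoring preprint, or None if no candidate
    reaches the threshold."""
    best_id, best_score = None, 0.0
    for preprint in PREPRINTS:
        # Assumption: a preprint is posted no later than the year the article appears.
        if preprint["year"] > article["year"]:
            continue
        # Illustrative weights: title evidence counts more than author evidence.
        score = (0.7 * title_similarity(article["title"], preprint["title"])
                 + 0.3 * author_overlap(article["authors"], preprint["authors"]))
        if score > best_score:
            best_id, best_score = preprint["id"], score
    return best_id if best_score >= threshold else None


article = {"title": "Deep Learning for Protein Folding: A Survey",
           "authors": {"smith", "garcia"}, "year": 2022}
print(match_preprint(article))  # preprint-A
```

With these weights, a candidate with no shared authors can score at most 0.7, so a title match alone never clears the 0.8 threshold; that is one way to trade recall for precision, since titles often change between preprint and publication.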

Background reading: Discovering relationships between preprints and journal articles

Recommended strategy: code

A ground truth evaluation dataset: dataset

A dataset with relationships between preprints and journal articles discovered by matching within Crossref data: dataset

Affiliation matching

Affiliation matching is used to map affiliation strings to organisation identifiers. For example, “Department of Molecular Medicine, Sapporo Medical University, Sapporo 060-8556, Japan” can be mapped to https://ror.org/01h7cca57.
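Affiliation strings usually combine a department, an organisation, and an address, so the organisation name has to be located within the string. A simple heuristic, sketched here, splits the string on commas and looks each segment up in a table of known organisation names. This is a hypothetical illustration rather than the recommended strategy linked below: `ORG_NAMES` holds only the example above, and `normalise` and `match_affiliation` are illustrative names.

```python
import re
from typing import Optional

# Hypothetical registry excerpt: normalised organisation name -> ROR ID.
# Only the example from the text above is included.
ORG_NAMES = {
    "sapporo medical university": "https://ror.org/01h7cca57",
}


def normalise(text: str) -> str:
    """Lowercase, replace punctuation other than commas with spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s,]", " ", text.lower())
    return " ".join(text.split())


def match_affiliation(affiliation: str) -> Optional[str]:
    """Split the affiliation on commas and return the ROR ID of the first
    segment that exactly matches a known organisation name, or None."""
    for segment in normalise(affiliation).split(","):
        ror_id = ORG_NAMES.get(segment.strip())
        if ror_id:
            return ror_id
    return None


print(match_affiliation(
    "Department of Molecular Medicine, Sapporo Medical University, Sapporo 060-8556, Japan"
))  # https://ror.org/01h7cca57
```

Exact segment lookup is precise but brittle: it misses affiliations without commas, abbreviations, and non-English name variants, and when several segments match it simply returns the first.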

Recommended strategy: code

A ground truth evaluation dataset: dataset

A dataset with relationships involving research organisations discovered by matching within Crossref data: dataset

Grant matching

Grant matching is used to discover funding relationships by matching funding information to grant DOIs.
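Funding information in a research output typically pairs a funder with an award number, so one simple form of grant matching is a lookup keyed on both, after normalising how the award number is written. The sketch below assumes the funder has already been identified (for example, by funder name matching); the `GRANTS` index, its funder IDs and DOIs, and the normalisation rules are hypothetical placeholders, not Crossref records or Crossref's method.

```python
import re
from typing import Optional

# Hypothetical index of registered grants, keyed by (funder ID, normalised award
# number). Funder IDs and grant DOIs are placeholders, not real records.
GRANTS = {
    ("funder-A", "AB123456"): "grant-doi-placeholder-1",
    ("funder-B", "XY/2024/001"): "grant-doi-placeholder-2",
}


def normalise_award(award: str) -> str:
    """Uppercase and drop whitespace and hyphens. Slashes are kept on the
    assumption that some funders use them as meaningful separators."""
    return re.sub(r"[\s-]", "", award.upper())


def match_grant(funder_id: str, award: str) -> Optional[str]:
    """Return the grant DOI for this funder and award number, or None.
    The funder is part of the key because award numbers need not be
    unique across funders."""
    return GRANTS.get((funder_id, normalise_award(award)))


print(match_grant("funder-A", "ab 123-456"))  # grant-doi-placeholder-1
print(match_grant("funder-B", "AB123456"))    # None: award exists only under funder-A
```

Which characters are safe to strip is a judgement call: removing too many separators can merge award numbers that a funder treats as distinct.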

Background reading:

A dataset with relationships between grants and research outputs discovered by matching within Crossref data: dataset

Citation matching

Citation matching is used to map bibliographic reference strings or metadata to DOIs. For example, "1. Boucher RC (2004) New concepts of the pathogenesis of cystic fibrosis lung disease. Eur Resp J 23: 146–158." can be mapped to https://doi.org/10.1183/09031936.03.00057003.
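A simplified way to score an unstructured reference against candidate records is to measure how many of each record's title, author, and year tokens appear in the reference string, then accept the best candidate above a threshold. This token-overlap sketch is an illustration under assumptions, not the recommended strategy linked below: the first record in `CANDIDATES` is built only from the fields in the example citation, the second is a placeholder, and the 0.7 threshold is arbitrary. In a real pipeline the candidates would come from a search step, for example a query against the Crossref REST API using its `query.bibliographic` parameter.

```python
import re
from typing import Optional

# Hypothetical candidate records. The first is built from the example citation
# above; the second is a placeholder, not a real DOI.
CANDIDATES = [
    {"doi": "10.1183/09031936.03.00057003",
     "title": "New concepts of the pathogenesis of cystic fibrosis lung disease",
     "author": "Boucher", "year": "2004"},
    {"doi": "placeholder-doi",
     "title": "Airway clearance techniques in cystic fibrosis",
     "author": "Doe", "year": "2010"},
]


def tokens(text: str) -> set:
    """Set of lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(reference: str, record: dict) -> float:
    """Fraction of the record's title, author and year tokens found in the reference."""
    record_tokens = tokens(f"{record['title']} {record['author']} {record['year']}")
    return len(record_tokens & tokens(reference)) / len(record_tokens)


def match_reference(reference: str, threshold: float = 0.7) -> Optional[str]:
    """Return the DOI of the best-scoring candidate, or None below the threshold."""
    best = max(CANDIDATES, key=lambda record: score(reference, record))
    return best["doi"] if score(reference, best) >= threshold else None


ref = ("1. Boucher RC (2004) New concepts of the pathogenesis of cystic "
       "fibrosis lung disease. Eur Resp J 23: 146–158.")
print(match_reference(ref))  # 10.1183/09031936.03.00057003
```

The score measures coverage of the record rather than of the reference, so extra material in the reference (numbering, journal abbreviation, volume, pages) does not lower it.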

Background reading:

Recommended strategy for unstructured citation matching: code

Title matching

Title matching is used to map journal titles to Crossref's internal journal records. We run title matching to map the journal titles in members' deposits to these records.
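Deposited journal titles vary in small ways (a leading "The", "&" versus "and", trailing punctuation, abbreviations), so title matching can be sketched as rule-based normalisation followed by a lookup over known title variants. This is a hypothetical illustration: the internal IDs and titles in `JOURNALS`, the normalisation rules, and the function names are placeholders, not Crossref data or Crossref's implementation.

```python
import re
from typing import Optional

# Hypothetical journal records: internal ID -> known title variants.
# IDs and titles are placeholders, not Crossref data.
JOURNALS = {
    "journal-0001": ["Journal of Example Studies", "J Example Stud"],
    "journal-0002": ["Annals of Science & Practice"],
}


def normalise_title(title: str) -> str:
    """Lowercase, spell out '&', replace punctuation with spaces, drop a leading 'the'."""
    title = title.lower().replace("&", " and ")
    title = re.sub(r"[^\w\s]", " ", title)
    words = title.split()
    if words and words[0] == "the":
        words = words[1:]
    return " ".join(words)


# Index every normalised variant to its journal ID.
TITLE_INDEX = {
    normalise_title(variant): journal_id
    for journal_id, variants in JOURNALS.items()
    for variant in variants
}


def match_title(deposited_title: str) -> Optional[str]:
    """Return the internal journal ID for a deposited title, or None."""
    return TITLE_INDEX.get(normalise_title(deposited_title))


print(match_title("The Journal of Example Studies."))  # journal-0001
print(match_title("J. Example Stud."))                  # journal-0001
```

Abbreviations are handled here only by listing them as explicit variants; rules that expand abbreviations automatically would raise recall but also the risk of mapping a title to the wrong journal.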

Page maintainer: Dominika Tkaczyk
Last updated: 2025-July-25