Metadata matching

Crossref is focused on enriching metadata to highlight relationships among works, individuals, institutions, and actions. Metadata matching is the task of finding an identifier for an item based on a structured or unstructured “description” of it. It is a key part of our enrichment process and one way to fill gaps in the scholarly record while keeping barriers to membership and participation as low as possible, so that the research nexus stays inclusive.

Some examples of matching tasks include:

Finding a DOI for a cited article based on a citation string.
Finding the ROR ID for an organisation based on an affiliation string.
Finding the ORCID ID for a researcher based on the person’s name and affiliation.
Finding the grant DOI based on an award number and a funder name.
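All of these tasks share one contract: given a description, return an identifier, or return nothing when no candidate is convincing. The minimal Python sketch below illustrates that contract, using the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure. The `match` function, its `threshold` default of 0.8 and the one-entry candidate dictionary are illustrative assumptions, not a description of how Crossref's matching works.

```python
from difflib import SequenceMatcher
from typing import Optional


def normalise(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't affect scores."""
    return " ".join(text.lower().split())


def match(description: str, candidates: dict[str, str], threshold: float = 0.8) -> Optional[str]:
    """Return the identifier whose candidate text is most similar to `description`,
    or None if even the best candidate scores below `threshold` (an assumed value)."""
    best_id, best_score = None, 0.0
    for identifier, candidate_text in candidates.items():
        score = SequenceMatcher(None, normalise(description), normalise(candidate_text)).ratio()
        if score > best_score:
            best_id, best_score = identifier, score
    return best_id if best_score >= threshold else None


# Illustrative candidate set; the single entry is the funder example from this page.
candidates = {"https://ror.org/037tz0e16": "London Health Sciences Centre"}
print(match("london  HEALTH sciences centre", candidates))  # -> https://ror.org/037tz0e16
print(match("Unrelated Foundation", candidates))  # -> None
```

Returning `None` below the threshold is the important design choice: a missing relationship is usually less harmful than a wrong one asserted in the metadata.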

Metadata matching gives a more complete picture of the research nexus by discovering missing relationships between entities throughout the scholarly record. In many cases these relationships are already included in metadata records; where they are not, we can carry out automated matching to identify the related entity.

Read more about metadata matching in the blog series:

In April 2025, we launched the matching project, which is a major effort to rebuild Crossref’s metadata matching workflows using modern software development and data science practices. The goal is to create a dedicated, consolidated matching workflow that will eventually replace all existing and future production matching processes, with results made available through the REST API. This project covers six matching tasks: bibliographic reference matching, funder name matching, preprint matching, affiliation matching, grant matching, and title matching.

project phase	matching task	input	target identifier	status
1	funder matching	funder organisation name	ROR ID	in production as part of the legacy CS system; matches available in the REST API
2	preprint matching	journal article metadata	preprint DOI	in production as part of the legacy CS system; matches not available in the REST API
2	affiliation matching	affiliation string	ROR ID	not in production
2	grant matching	funding metadata	grant DOI	not in production
3	reference matching	bibliographic reference	DOI	in production as part of the legacy CS system; matches available in the REST API
3	title matching	journal title	internal Crossref journal ID	in production as part of the legacy CS system; matches not available in the REST API

Additional reading and resources

Funder name matching

Funder name matching is used to map funder names to funder organisation identifiers. For example, “London Health Sciences Centre” can be mapped to https://ror.org/037tz0e16. Currently, we match against the Funder Registry; the new matching service will match against the ROR Registry instead.
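A naive way to picture funder name matching is exact lookup after normalisation. The sketch below assumes a toy in-memory registry whose only entry is the example above; `normalise_name`, `REGISTRY` and `match_funder` are hypothetical names, and real funder matching also has to cope with acronyms, translated names and parent/child organisations, which this sketch ignores.

```python
import re
import unicodedata
from typing import Optional


def normalise_name(name: str) -> str:
    """Strip accents, lowercase, replace punctuation with spaces,
    unify "center"/"centre" and collapse whitespace."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    text = re.sub(r"\bcenter\b", "centre", text)
    return " ".join(text.split())


# Toy registry keyed by normalised name (hypothetical). A real matcher would
# index every name and alias in the target registry, not a single entry.
REGISTRY = {
    normalise_name("London Health Sciences Centre"): "https://ror.org/037tz0e16",
}


def match_funder(name: str) -> Optional[str]:
    """Return the ROR ID for an exact match after normalisation, else None."""
    return REGISTRY.get(normalise_name(name))


print(match_funder("LONDON HEALTH SCIENCES CENTER."))  # -> https://ror.org/037tz0e16
print(match_funder("London Health"))  # -> None
```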

Preprint matching

Preprint matching is used to discover relationships between preprints and journal articles by matching journal article metadata to preprint DOIs.

Background reading: Discovering relationships between preprints and journal articles

Recommended strategy: code

A ground truth evaluation dataset: dataset

A dataset with relationships between preprints and journal articles discovered by matching within Crossref data: dataset
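To make the idea concrete, here is a toy sketch (not the recommended strategy linked above) that compares a journal article with candidate preprints using token Jaccard similarity on titles and on author surnames, and accepts a preprint only when both scores clear a threshold. The `Work` record, the thresholds and the 10.5555 DOIs are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Work:
    doi: str
    title: str
    author_surnames: list[str]


def tokens(text: str) -> set[str]:
    """Lowercased word tokens of a string."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Intersection size divided by union size (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_preprint(article: Work, preprints: list[Work],
                   min_title: float = 0.7, min_authors: float = 0.5) -> Optional[str]:
    """Return the DOI of the best preprint whose title AND author overlap
    both clear their (assumed) thresholds, or None if none qualifies."""
    article_authors = {s.lower() for s in article.author_surnames}
    best_doi, best_score = None, 0.0
    for preprint in preprints:
        title_score = jaccard(tokens(article.title), tokens(preprint.title))
        author_score = jaccard(article_authors, {s.lower() for s in preprint.author_surnames})
        if title_score < min_title or author_score < min_authors:
            continue
        score = (title_score + author_score) / 2
        if score > best_score:
            best_doi, best_score = preprint.doi, score
    return best_doi


# Hypothetical records; the 10.5555 DOIs are placeholders.
article = Work("10.5555/hypothetical-article", "Deep learning for protein folding", ["Smith", "Jones"])
preprints = [
    Work("10.5555/hypothetical-preprint-1", "Deep learning for protein structure folding", ["Smith", "Jones", "Lee"]),
    Work("10.5555/hypothetical-preprint-2", "Protein folding in yeast", ["Brown"]),
]
print(match_preprint(article, preprints))  # -> 10.5555/hypothetical-preprint-1
```

Requiring agreement on both title and authors reflects the fact that titles often change between preprint and publication, while author lists tend to overlap strongly.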

Affiliation matching

Affiliation matching is used to map affiliation strings to organisation identifiers. For example, “Department of Molecular Medicine, Sapporo Medical University, Sapporo 060-8556, Japan” can be mapped to https://ror.org/01h7cca57.

Recommended strategy: code

A ground truth evaluation dataset: dataset

A dataset with relationships involving research organisations discovered by matching within Crossref data: dataset
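As a simplified illustration of affiliation matching (not the recommended strategy linked above), the sketch below looks for a known organisation name as a whole-word substring of the normalised affiliation string and prefers the longest matching name, so that, for instance, a hospital named after its university would win over the university when both are listed. The one-entry `ORGANISATIONS` table uses the example above; everything else is a hypothetical assumption.

```python
import re
from typing import Optional


def normalise(text: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


# Toy lookup of organisation names to ROR IDs (only this entry comes from the page).
ORGANISATIONS = {"Sapporo Medical University": "https://ror.org/01h7cca57"}


def match_affiliation(affiliation: str) -> Optional[str]:
    """Return the ROR ID of the longest organisation name found as whole words
    inside the affiliation string, or None if no name is found."""
    padded = f" {normalise(affiliation)} "
    best_id, best_len = None, 0
    for name, ror_id in ORGANISATIONS.items():
        needle = f" {normalise(name)} "  # padding enforces whole-word matches
        if needle in padded and len(needle) > best_len:
            best_id, best_len = ror_id, len(needle)
    return best_id


print(match_affiliation(
    "Department of Molecular Medicine, Sapporo Medical University, Sapporo 060-8556, Japan"
))  # -> https://ror.org/01h7cca57
print(match_affiliation("Sapporo City Hospital"))  # -> None
```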

Grant matching

Grant matching is used to discover funding relationships by matching funding information to grant DOIs.

Background reading:

A dataset with relationships between grants and research outputs discovered by matching within Crossref data: dataset
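One simple way to think about grant matching is as a lookup keyed by funder and award number, after normalising both. The sketch below assumes hypothetical grant records; `normalise_award`, `GRANTS`, `match_grant`, the funder "Example Funding Agency" and the 10.5555 DOI are all illustrative. The funder is part of the key because award numbers are assigned by each funder and may collide across funders.

```python
import re
from typing import Optional


def normalise_award(award: str) -> str:
    """Keep only letters and digits, uppercased, so "efa 2021/001" == "EFA-2021-001"."""
    return re.sub(r"[^0-9A-Za-z]", "", award).upper()


def normalise_funder(name: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", name.lower()).split())


# Hypothetical grant records keyed by (normalised funder, normalised award number).
GRANTS = {
    (normalise_funder("Example Funding Agency"), normalise_award("EFA-2021-001")):
        "10.5555/hypothetical-grant-1",
}


def match_grant(funder_name: str, award_number: str) -> Optional[str]:
    """Return the grant DOI for an exact (funder, award) match, else None."""
    return GRANTS.get((normalise_funder(funder_name), normalise_award(award_number)))


print(match_grant("Example Funding Agency", "efa 2021/001"))  # -> 10.5555/hypothetical-grant-1
print(match_grant("Other Funder", "EFA-2021-001"))  # -> None
```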

Citation matching

Citation matching is used to map bibliographic reference strings or metadata to DOIs. For example, “1. Boucher RC (2004) New concepts of the pathogenesis of cystic fibrosis lung disease. Eur Resp J 23: 146–158.” can be mapped to https://doi.org/10.1183/09031936.03.00057003.

Background reading:

Recommended strategy for unstructured citation matching: code
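The toy sketch below (not the recommended strategy linked above) scores candidate records against an unstructured reference string: it discards candidates whose publication year does not appear in the string, then measures what fraction of each candidate's metadata tokens occur in the reference. It assumes the candidates have already been retrieved, for example from a search index; the candidate fields for the real DOI are taken from the citation above, while the decoy record, the 0.8 threshold and all function names are hypothetical.

```python
import re
from typing import Optional


def tokens(text: str) -> set[str]:
    """Lowercased word tokens of a string."""
    return set(re.findall(r"\w+", text.lower()))


# Candidate records (assumed already retrieved). The second one is a hypothetical decoy.
CANDIDATES = {
    "10.1183/09031936.03.00057003": {
        "author": "Boucher", "year": "2004",
        "title": "New concepts of the pathogenesis of cystic fibrosis lung disease",
        "volume": "23", "first_page": "146",
    },
    "10.5555/hypothetical-decoy": {
        "author": "Smith", "year": "2004",
        "title": "Cystic fibrosis lung disease in children",
        "volume": "12", "first_page": "301",
    },
}


def match_reference(reference: str, threshold: float = 0.8) -> Optional[str]:
    """Return the DOI whose metadata tokens are best covered by the reference
    string, or None if the best coverage is below `threshold`."""
    ref_tokens = tokens(reference)
    best_doi, best_score = None, 0.0
    for doi, meta in CANDIDATES.items():
        if meta["year"] not in ref_tokens:
            continue  # cheap validation: the publication year must appear in the string
        cand_tokens = tokens(" ".join(meta.values()))
        score = len(cand_tokens & ref_tokens) / len(cand_tokens)
        if score > best_score:
            best_doi, best_score = doi, score
    return best_doi if best_score >= threshold else None


reference = ("1. Boucher RC (2004) New concepts of the pathogenesis of cystic fibrosis "
             "lung disease. Eur Resp J 23: 146–158.")
print(match_reference(reference))  # -> 10.1183/09031936.03.00057003
```

Scoring by coverage of the candidate's tokens, rather than of the reference's, tolerates extra noise in the string such as list numbering or an abbreviated journal name.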

Title matching

Title matching is used to map journal titles to Crossref’s internal journal records. We run title matching on members’ deposits to link the journal titles they contain to existing journal records.
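A minimal sketch of title matching, assuming a hypothetical list of internal journal records (the `journal-000N` IDs are placeholders, not Crossref's real internal identifiers): titles are normalised (lowercase, “&” to “and”, punctuation removed, a leading “The” dropped) and indexed, and a deposited title is matched only when it maps to exactly one record, so ambiguous titles are left unmatched rather than guessed.

```python
import re
from collections import defaultdict
from typing import Optional


def normalise_title(title: str) -> str:
    """Lowercase, turn "&" into "and", drop punctuation and a leading "the"."""
    text = title.lower().replace("&", " and ")
    text = re.sub(r"[^\w\s]", " ", text)
    words = text.split()
    if words and words[0] == "the":
        words = words[1:]
    return " ".join(words)


# Hypothetical internal records: (internal journal ID, title).
RECORDS = [
    ("journal-0001", "European Respiratory Journal"),
    ("journal-0002", "Journal of Example Studies"),
    ("journal-0003", "The Journal of Example Studies"),  # collides with journal-0002
]

# Index every normalised title to the set of journal IDs that share it.
INDEX: dict[str, set[str]] = defaultdict(set)
for journal_id, journal_title in RECORDS:
    INDEX[normalise_title(journal_title)].add(journal_id)


def match_title(title: str) -> Optional[str]:
    """Return the internal ID if the normalised title is unambiguous, else None."""
    ids = INDEX.get(normalise_title(title), set())
    return next(iter(ids)) if len(ids) == 1 else None


print(match_title("The European Respiratory Journal"))  # -> journal-0001
print(match_title("Journal of Example Studies"))  # -> None (ambiguous)
```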
