Dominika Tkaczyk

What if I told you that bibliographic references can be structured?

Dominika Tkaczyk – 2019 July 08

In Linking, Citation, R&D, Reference Matching, Metadata Matching

Last year I spent several weeks studying how to automatically match unstructured references to DOIs (you can read about these experiments in my previous blog posts). But what about references that are not in the form of an unstructured string, but rather a structured collection of metadata fields? Are we matching them, and how? Let’s find out.

Reference matching: for real this time

Dominika Tkaczyk – 2018 December 18

In Linking, Citation, R&D, Reference Matching, Metadata Matching

In my previous blog post, Matchmaker, matchmaker, make me a match, I compared four approaches for reference matching. The comparison was done using a dataset composed of automatically-generated reference strings. Now it’s time for the matching algorithms to face the real enemy: the unstructured reference strings deposited with Crossref by some members. Are the matching algorithms ready for this challenge? Which algorithm will prove worthy of becoming the guardian of the mighty citation network? Buckle up and enjoy our second matching battle!

Matchmaker, matchmaker, make me a match

Dominika Tkaczyk – 2018 November 12

In Linking, Citation, R&D, Reference Matching, Metadata Matching

Matching (or resolving) bibliographic references to target records in a collection is a crucial task in the Crossref ecosystem. Automatic reference matching lets us discover citation relations in large document collections, calculate citation counts, H-indexes, impact factors, etc. At Crossref, we currently use a matching approach based on reference string parsing. Some time ago we realized there is a much simpler approach. And now it is finally battle time: which of the two approaches is better?

What does the sample say?

Dominika Tkaczyk – 2018 November 09

In Linking, Citation, R&D, Reference Matching

At Crossref Labs, we often come across interesting research questions and try to answer them by analyzing our data. Depending on the nature of the experiment, processing over 100M records might be time-consuming or even impossible. In those dark moments we turn to sampling and statistical tools. But what can we infer from only a sample of the data?

