Don’t take it from us: Funder metadata matters
Why the focus on funding information?
We are often asked who uses Crossref metadata and for what. One common use case is researchers in bibliometrics and scientometrics (among other fields) doing meta-analyses on the entire corpus of records. As we pass the 10-year mark for the Funder Registry and 5 years of funders joining Crossref as members to register their grants, it’s worth a look at some recent research that focuses specifically on funding information. After all, there is funding behind so much scholarly work that it seems obvious it would be routinely documented in the scholarly record. But it often isn’t, and that’s a problem. These sources make clear the need for accurate funding information and the problems that the lack of it creates.
First, a few notes for context on these sources and the issues they discuss:
- The percentage of records with funding information reached about 25% as of 2021. Not all registered items are the result of funded research, but the share that are is surely much higher than 25%, so there is considerable room for improvement. The authors cite publishers that omit funding information as well as those that include it routinely. Overall, society publishers are at the top of the list of those that do it well.
- Three of the four sources had problems, in some cases, confirming funding information from the metadata against the original sources. This initially surprised me, though less so once I thought about the strange nature of metadata workflows.
- The complexity of fully and correctly acknowledging multiple sources of funding in any given publication is a recurring theme.
- All of the sources mention the need for manual work in analyzing funding and publication information.
The first two papers are from the same 2022 issue of Quantitative Science Studies and are complementary.
Alexis-Michel Mugabushaka, Nees Jan van Eck, Ludo Waltman; Funding COVID-19 research: Insights from an exploratory analysis using open data infrastructures.
Quantitative Science Studies 2022; 3 (3): 560–582. doi: https://doi.org/10.1162/qss_a_00212
This first paper tackles the timely question of determining which funders have supported publications of COVID-19 research and compares coverage of funding data in Crossref to that in Scopus and Web of Science. Even with so much urgent attention focused on the pandemic, the authors found that only 17% of publications in the COVID-focused CORD-19 database have funding identified in their Crossref records.
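To make a coverage figure like this concrete, here is a minimal sketch of how such a share could be computed. It assumes work records shaped like those returned by the Crossref REST API, where funding information appears as a list under the `funder` key. The function name `funding_coverage` and the sample records are hypothetical and invented for illustration; this is not the method the authors used.

```python
from typing import Iterable, Mapping


def funding_coverage(records: Iterable[Mapping]) -> float:
    """Return the share of records that list at least one funder."""
    total = 0
    with_funder = 0
    for record in records:
        total += 1
        # A record counts as covered only if its "funder" list is non-empty.
        if record.get("funder"):
            with_funder += 1
    # Avoid dividing by zero when no records are supplied.
    return with_funder / total if total else 0.0


# Hypothetical sample records, invented for illustration only.
sample = [
    {"DOI": "10.5555/example-1", "funder": [{"name": "Example Funder"}]},
    {"DOI": "10.5555/example-2", "funder": []},
    {"DOI": "10.5555/example-3"},
    {"DOI": "10.5555/example-4"},
]

print(f"{funding_coverage(sample):.0%}")  # prints 25%
```

A record with an empty `funder` list is treated the same as one with no `funder` key at all, since neither tells a reader who paid for the work.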
We’re often asked about differences in the metadata (and citation counts) between Crossref and other sources such as Scopus. In this case, both proprietary sources studied have more funder coverage than Crossref.
If you are disappointed in these results or want to learn more, I encourage you to read the authors’ recommendations for improving funding data in Crossref or get in touch with us.
Bianca Kramer, Hans de Jonge; The availability and completeness of open funder metadata: Case study for publications funded by the Dutch Research Council. Quantitative Science Studies 2022; 3 (3): 583–599. doi: https://doi.org/10.1162/qss_a_00210
This next paper focuses on a set of outputs funded by the NWO (the Dutch Research Council). Since the funder is already known, the authors could look at multiple sources (Crossref and others) to see whether and where the NWO is correctly identified as the funder. This study also found better coverage in proprietary sources like Web of Science than in Crossref. Because not all outputs are the result of funded research, this paper provides a new and useful baseline for comparing percentages of coverage.
Discussions of research funding so often focus on the physical and life sciences, so it’s very good to see that 37% of works in this study are in the humanities and social sciences.
Borst, T., Mielck, J., Nannt, M., Riese, W. (2022). Extracting Funder Information from Scientific Papers - Experiences with Question Answering. In: , et al. Linking Theory and Practice of Digital Libraries. TPDL 2022. Lecture Notes in Computer Science, vol 13541. Springer, Cham. https://doi.org/10.1007/978-3-031-16802-4_24
Given the considerable effort required to conduct these analyses, it’s only logical to consider automating as much of the work as possible. This next paper focuses on automatic recognition of funders in economics papers in digital libraries.
An interesting complication described here is the inclusion of funding for open access fees in acknowledgments. While the authors conclude that automated text mining of funder information performs better than manual curation, they also state that manual indexing is still necessary “for a gold standard of reliable metadata.”
Habermann, T. (2022). Funder Metadata: Identifiers and Award Numbers. https://metadatagamechangers.com/blog/2022/2/2/funder-metadata-identifiers-and-award-numbers
Finally, this concise blog post looks at ROR IDs as well as funder names and acronyms. The author shows how acronyms contribute to the need for manual analysis. He also spends some time on award numbers, one of the three funding elements publishers can (and, as we’ve seen, should) include in their metadata and, unfortunately, another frequent reason for additional manual work.
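As a rough illustration of checking which funding elements a record actually carries, the sketch below inspects a single funder entry. It assumes the three elements are the funder name, a funder identifier, and award numbers, stored under the `name`, `DOI`, and `award` keys as in Crossref REST API work records. The helper names and the sample entry are hypothetical.

```python
from typing import Mapping


def funder_elements(funder: Mapping) -> dict:
    """Report which of the three funding elements a funder entry contains."""
    return {
        # Funder name as free text.
        "name": bool(str(funder.get("name") or "").strip()),
        # Funder identifier (assumed here to live under the "DOI" key).
        "identifier": bool(str(funder.get("DOI") or "").strip()),
        # At least one non-blank award number.
        "award": any(str(a).strip() for a in funder.get("award") or []),
    }


def missing_elements(funder: Mapping) -> list:
    """List the elements absent from a funder entry, in a fixed order."""
    return [key for key, present in funder_elements(funder).items() if not present]


# Hypothetical funder entry with a name and an award number but no identifier.
entry = {"name": "Example Funder", "award": ["EX-0001"]}

print(missing_elements(entry))  # prints ['identifier']
```

An entry like this one, with a name but no identifier, is the kind of case where acronyms and name variants push the matching work back onto a person.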
Though, collectively, this research paints a fairly dim picture of the current availability, completeness and accuracy of existing funding information in publication metadata, all is not lost. This is a good opportunity to point out the value and availability of grant records, since unique, persistent identifiers for grants (yes, DOIs for grants) paired with more and better funding metadata from publishers go a very long way to realizing the vision of the Research Nexus. And it certainly would make things a whole lot easier for the researchers who use this open metadata to analyze the scholarly record for the rest of us.