To address the growing scale and complexity of scholarly data, we’ve launched a new data science function at Crossref. In April, we were excited to welcome our first data scientists, Jason Portenoy and Alex Bédard-Vallée, to the team. With their arrival, the Data Science team is now fully up and running. In this blog post, we’re sharing our vision and what’s ahead for data science at Crossref.
If you are reading this blog on our website, you may have noticed that alongside each post we now list a Crossref DOI link, which was not the case a few months ago (though we have retroactively added DOIs to all older posts too). You can find the persistent link for this post right above this paragraph. Go on, click on it, we’ll wait.
If you take a peek at our blog, you’ll notice that metadata and community are the most frequently used categories. This is not a coincidence – community is central to everything we do at Crossref. Our first-ever Metadata Sprint was a natural step in strengthening both. Cue fanfare! And what better way to celebrate 25 years of Crossref?
We designed the Crossref Metadata Sprint as a relatively short event where people can form teams and tackle small, well-scoped problems. What kind of problems? While we expected many to involve coding, teams also explored documenting, translating, researching – anything that taps into our open, member-curated metadata. Our motivation behind this format was to create a space for networking, collaboration, and feedback, centered on co-creation using the scholarly metadata from our REST API, the Public Data File, and other sources.
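If you'd like to explore that metadata yourself before (or instead of) joining a sprint, here is a minimal sketch of fetching a single record from the REST API using only Python's standard library. The DOI shown is just a placeholder – substitute any DOI you're curious about.

```python
import json
import urllib.request

# Any registered DOI works here; this one is only a placeholder.
doi = "10.5555/12345678"
url = f"https://api.crossref.org/works/{doi}"

# The API returns JSON with the metadata record under the "message" key.
with urllib.request.urlopen(url) as resp:
    work = json.load(resp)["message"]

print((work.get("title") or ["(no title)"])[0])
print(work.get("publisher"))
```

For anything heavier than a handful of lookups, the Public Data File is the friendlier option, since it spares both you and the live API from bulk traffic.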
A Schematron report tells you if there’s a metadata quality issue with your records.
Schematron is a pattern-based XML validation language. We try to stop the deposit of metadata with obvious issues, but we can’t catch everything because publication practices are so varied. For example, most family names in our database that end with ‘jr’ are the result of a publisher including a suffix (Jr) in the family name field, but there are of course genuine surnames ending in ‘jr’.
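To make that concrete, here is a minimal sketch of a pattern-based check in the spirit of the suffix example, run with Python's lxml library. The rule and the simplified contributor record are illustrative only, not our actual production rules.

```python
# A minimal sketch of a Schematron pattern check, in the spirit of the
# suffix example above. The rule and record below are illustrative only,
# not Crossref's production rules. Requires lxml (pip install lxml).
from lxml import etree
from lxml.isoschematron import Schematron

# Flag surnames ending in "jr" (case-insensitive): often a suffix folded
# into the family name, though it can be a legitimate surname.
RULES = """
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern>
    <rule context="person_name/surname">
      <assert test="not(translate(substring(., string-length(.) - 1), 'JR', 'jr') = 'jr')">
        Surname ends in "jr"; a suffix may have been folded into the family name.
      </assert>
    </rule>
  </pattern>
</schema>
"""

# A simplified contributor record with the suffix folded into the surname.
RECORD = """
<person_name>
  <given_name>Sammy</given_name>
  <surname>Davis Jr</surname>
</person_name>
"""

schematron = Schematron(etree.fromstring(RULES), store_report=True)
record = etree.fromstring(RECORD)

if not schematron.validate(record):
    # The SVRL report lists every failed assertion with its message.
    print(etree.tostring(schematron.validation_report, pretty_print=True).decode())
```

A real check would also need to tolerate the genuine ‘jr’ surnames mentioned above, which is exactly why rules like this flag records for review rather than rejecting them outright.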
We run a weekly post-registration metadata quality check on all journal, book, and conference proceedings submissions. If we spot problems, we aggregate them into the Schematron report and email it to your technical contact. Identified errors may reduce overall metadata quality and negatively affect queries for your content.
What should I do with my Schematron report?
The report contains links (organized by title) to .xml files with the error details. The XML files can be downloaded and processed programmatically, or viewed in a web browser.
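If you go the programmatic route, a schema-agnostic first pass like the sketch below works with Python's standard library. The filename is a placeholder for a report file you've downloaded, and you should adjust the element handling once you've seen a real report's structure in your browser.

```python
# A first-pass sketch for exploring a downloaded report file. The filename
# is a placeholder, and no particular report structure is assumed: we just
# walk every element and print whatever text it carries.
import xml.etree.ElementTree as ET

tree = ET.parse("schematron_report.xml")  # a report file you downloaded
for elem in tree.iter():
    text = (elem.text or "").strip()
    if text:
        print(f"{elem.tag}: {text}")
```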