Subject codes, incomplete and unreliable, have got to go

Patrick Polischuk – 2024 March 13

In Metadata, APIs

Subject classifications have been available via the REST API for many years but have not been complete or reliable from the start and will soon be deprecated.

The subject metadata element was born out of a Labs experiment intended to enrich the metadata returned via Crossref Metadata Search with All Science Journal Classification (ASJC) codes from Scopus. This feature was developed when the REST API was still fairly new, and we now recognize that the initial implementation worked its way into the service prematurely.

While subject classifications in Crossref metadata could be very useful, the current implementation in the REST API is problematic for three primary reasons:

They are misleadingly exposed in the API as a property of the work, when in fact they are a property of the container (e.g. a journal or conference proceeding). Just because a journal’s broad topic category is “X” doesn’t mean that a particular article in the journal is about “X.”

Existing works may have outdated subjects. Originally, subject codes were not updated periodically. Subjects exposed in the /journals route are now updated once a day, but those exposed via the /works endpoint are indexed along with each work. When a new subject list is ingested, new DOIs get the new subjects while existing works keep whatever they were indexed with. We don’t have a mechanism for forcing updates when incorrect subject values are returned via the REST API, so this data can be stale and incorrect.

They are not applied to everything. This is because the Scopus list does not cover all the journals that Crossref has (conversely, the Scopus list contains some journals Crossref does not have), and does not contain other container types.

The Labs team investigated options for improving subject classification coverage but ultimately concluded that no available solution adequately addresses the coverage problem. For more, please see Esha Datta’s findings published at Force11’s Upstream: https://doi.org/10.54900/n6dnt-xpq48

Where does that leave us? Rather than continuing to supply unreliable and misleading subject category metadata, we will be deprecating this feature in the coming weeks. To minimize disruption and avoid breaking changes at this time, we will be removing this data from our index, so the subject element will simply be empty. We may remove the subject element in the future.
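For anyone consuming the API, the practical upshot is that code reading the subject element should tolerate an empty list now and, possibly, a missing key later. Here is a minimal Python sketch of that defensive handling, assuming a work record that has already been parsed from the JSON response into a dict. The helper name `extract_subjects` and the sample records are hypothetical illustrations, not part of the REST API or real API output.

```python
from typing import Any, List, Mapping


def extract_subjects(work: Mapping[str, Any]) -> List[str]:
    """Return the subject strings on a work record, or [] if there are none.

    A missing key, a null value, and an empty list are all treated the same
    way, so callers behave identically whether the element is populated,
    emptied, or removed entirely.
    """
    raw = work.get("subject")
    if not isinstance(raw, list):
        return []
    # Keep only non-blank strings; ignore anything unexpected.
    return [s for s in raw if isinstance(s, str) and s.strip()]


# Illustrative records only (assumed shapes, not copied from the API).
populated = {"DOI": "10.5555/example-1", "subject": ["General Medicine"]}
emptied = {"DOI": "10.5555/example-2", "subject": []}
removed = {"DOI": "10.5555/example-3"}

for record in (populated, emptied, removed):
    print(record["DOI"], extract_subjects(record))
```

Returning an empty list in every "no data" case means downstream code needs no special branch for the deprecation. Any logic that relied on subjects being present for filtering or grouping will still need to be reviewed separately.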

We know that the community’s desire for subject-based analysis of metadata is very strong, and we have supported efforts to establish a multidisciplinary taxonomy. In the meantime, inaccurate codes hinder these efforts rather than help them, because their presence in the metadata gives the false impression that they are correct.

We aim to deprecate the subject codes in April 2024.

Please let us know if you have any questions or concerns by leaving a comment below, which will start a thread in our community forum.

Frequently asked questions

Q. Will the subject field continue to be available and functional?
A. The subject metadata element will continue to be included in the JSON response but will not return any values.

Q. Will new subject codes be added in the future?
A. We do not have any current plans to add new subject codes in the future.

Q. I received a notification about this, but we don’t use subject codes. Do I need to do anything?
A. No, if you do not currently use the subject element, you do not need to do anything about this change.

Q. I noticed that wrong or inaccurate subject codes were assigned to my works. Is this a solution?
A. Yes. Until we can identify an accurate and sustainable system for assigning subject codes to Crossref metadata records, we want to stop assigning inaccurate subject codes and remove all existing assignments.

Page owner: Patrick Polischuk   |   Last updated 2024-March-13