In May, we updated you on the latest changes and improvements to the new version of iThenticate and let you know that a new similarity report and AI writing detection tool were on the horizon.
On Wednesday 1 November 2023, Turnitin (who produce iThenticate) will be releasing a brand new similarity report and a free preview of their AI writing detection tool in iThenticate v2. The AI writing detection tool will be enabled by default, and account administrators will be able to switch it off or on.
Turnitin will be running a webinar on their new similarity report and AI writing detection tool on Tuesday 28 November (EDIT 23/11/16: Monday 11 December 2023). More information on the webinar and how to register will be communicated by Turnitin in the coming weeks.
New similarity report
On Wednesday, all iThenticate v2 users will have access to the new version of the similarity report which will include:
a word count and the number of text blocks for each matched source
the ability to include or exclude overlapping sources from the overall similarity score
a clearer colour differentiation between the different sources
improved accessibility features
Enabling the new similarity report
The new similarity report will be enabled by default for all your journals. Account administrators wishing to switch off the new similarity report can do so by going to Settings, opening the General tab, and selecting the Disable option under the New Similarity Report Experience heading.
Classic view / new view
As this will be a significant change to your current experience, Turnitin have provided access for a period of time to the “classic view” and you will be able to toggle between the original interface and the new one by clicking on “Switch to the classic view” or “Switch to the new view” buttons at the top of your report.
The similarity score will continue to be available at the top right-hand corner of the similarity report.
Exclusions
By clicking on the Filters button you’ll be able to check and/or adjust your report’s section and repository exclusions.
Please note that the exclusions previously set up by account administrators should be unchanged by this release.
Sources / Match Groups view
The Sources view will be the default view and will list all sources. By using the on/off button next to “Show overlapping sources”, you’ll be able to include or exclude overlapping sources. This will be “off” by default.
The Match Groups view is completely new and may not suit everyone’s needs. It is divided into four categories, “Not Cited or Quoted”, “Missing Quotations”, “Missing Citation” and “Cited and Quoted”, and will highlight matches found in your text.
PDF report
You’ll also now find the PDF report in the top right-hand corner of the similarity report, by clicking on the “download” icon.
Submission details
“Submission Details” is now located under the “i” icon in the top right-hand corner of your report. This is where you will find the oid (or unique number) for your manuscript, which Turnitin will ask you to provide when you are reporting a technical issue.
Many of you have been concerned about the use of AI writing in the research papers you’ve received since the launch of ChatGPT last November and have been in touch to enquire about the availability of an AI writing detection tool for Crossref members.
You will also have read that Turnitin have developed an AI writing detection tool and have made it available to their education sector customers since April. Turnitin published an update in May, followed in June by a helpful video and further information on the false positive rates, based on the feedback they’ve received from the education community.
I am pleased to announce that Turnitin’s AI writing detection tool will be available as a free preview to iThenticate v2 users, via the new version of the similarity report, from Wednesday 1 November until the end of December 2023.
Enabling AI writing detection
Our preference was to have the new AI writing detection tool turned “off” by default; however, this hasn’t been possible. Account administrators can turn this feature off by going to Settings, selecting the Crossref Web tab and scrolling down to the AI Writing section at the very bottom of the page. When the feature is enabled, it is applied to all submissions.
Please note that AI Writing detection is only available in the new similarity report.
Integrations
There is currently no integration between manuscript tracking systems and the AI writing detection tool. However, the AI score will be available via the similarity report. If the AI writing detection tool has been set to “off” by the account administrator, there will be no score and the “AI Writing” heading will not be visible on the similarity report.
File requirements
Turnitin have published the following file requirements, which must be met for the tool to run a report:
Must be written in English
A minimum of 300 words
A maximum of 15,000 words
The file size must be less than 100 MB
Accepted file types are .docx, .pdf, .rtf and .txt
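If you would like to pre-check manuscripts before uploading them, the requirements above can be sketched as a small Python helper. This is only an illustrative sketch, not part of iThenticate: the function name `check_file_requirements` and its parameters are hypothetical, the language requirement is not checked, and how Turnitin counts words or measures file size may differ from the assumptions made here.

```python
from pathlib import Path

# Limits as listed in this post. How Turnitin counts words or measures
# file size internally is an assumption here, so treat results as a rough guide.
MIN_WORDS = 300
MAX_WORDS = 15_000
MAX_SIZE_BYTES = 100 * 1024 * 1024  # assumes 1 MB = 1024 * 1024 bytes
ACCEPTED_EXTENSIONS = {".docx", ".pdf", ".rtf", ".txt"}


def check_file_requirements(filename: str, word_count: int, size_bytes: int) -> list[str]:
    """Return the list of unmet requirements (empty if none were found).

    Hypothetical helper: the English-language requirement is not checked.
    """
    problems = []
    if Path(filename).suffix.lower() not in ACCEPTED_EXTENSIONS:
        problems.append("unsupported file type")
    if word_count < MIN_WORDS:
        problems.append("fewer than 300 words")
    if word_count > MAX_WORDS:
        problems.append("more than 15,000 words")
    if size_bytes >= MAX_SIZE_BYTES:
        problems.append("file size is not under 100 MB")
    return problems
```

For example, `check_file_requirements("paper.docx", 5000, 2_000_000)` returns an empty list, while a 100-word `.odt` file would be reported as both an unsupported file type and too short.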
If your file does not meet the above requirements, iThenticate v2 will display the following message:
Turnitin’s AI writing detection tool has been developed to detect GPT 3, 3.5, 4 and other variants. More information on this is available on their FAQs page.
Turnitin have provided the following guidance regarding the AI scores:
“Blue with a percentage between 0 and 100: The submission has processed successfully. The displayed percentage indicates the amount of qualifying text within the submission that Turnitin’s AI writing detection model determines was generated by AI. As noted previously, this percentage is not necessarily the percentage of the entire submission. If text within the submission was not considered long-form prose text, it will not be included.
Our testing has found that there is a higher incidence of false positives when the percentage is between 1 and 20. In order to reduce the likelihood of misinterpretation, the AI indicator will display an asterisk (*) for percentages between 1 and 20 to call attention to the fact that the score is less reliable.
To explore the results of the AI writing detection capabilities, select the indicator to open the AI writing report. The AI writing report opens in a new tab of the window used to launch the Similarity Report. If you have a pop-up blocker installed, ensure it allows Turnitin pop-ups.”
Please note that unlike the similarity report, the AI writing report will only provide a score and highlight the blocks of texts likely to have been written by an AI tool and will not list source matches.
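If you record AI scores in your own editorial workflow, the reliability band described in Turnitin’s guidance can be expressed as a simple check. The sketch below is an assumption-laden illustration: the function name `is_low_confidence_score` is hypothetical, and it assumes that “between 1 and 20” includes both 1 and 20, which the guidance does not state explicitly.

```python
def is_low_confidence_score(percentage: int) -> bool:
    """Return True when the score falls in the band flagged with an asterisk.

    Assumption: the band "between 1 and 20" is inclusive at both ends.
    """
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    return 1 <= percentage <= 20
```

A score in this band is one that Turnitin describes as less reliable, so it may be worth treating with extra caution before drawing any conclusions.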
We encourage you to test the writing detection tool as much as possible during the free preview period (1 November-31 December 2023).
Next
Paraphrase detection
Turnitin are planning to release a beta version of their new paraphrase detection tool at the end of this year/Q1, 2024. It will be initially available as a free preview for a short period of time. (EDIT 23/11/16: There is currently no timeline available for Turnitin’s paraphrase detection tool, which is having a knock-on effect on the availability of the AI writing and paraphrase detection bundle and associated fees previously mentioned in this post.)
AI and paraphrase detection bundle (EDIT 23/11/16: AI writing detection tool)
Once the free preview period ends, Turnitin would like to offer Crossref members an AI and paraphrase detection bundle (EDIT 23/11/16: are planning to make their AI writing detection tool available) from 2024 - this means that if you choose to subscribe to this new service, you will be charged an additional fee each time you upload a manuscript.
Fixes
Many of you have been waiting for fixes for the URL aggregation issues in the matched sources of the similarity report and for the doc-to-doc PDF report in iThenticate v2. Turnitin are planning to release fixes for these before the end of 2023.
Do get in touch via support@crossref.org if you have any questions about iThenticate v1 or v2, or start a discussion by commenting on this post below or in our Community Forum.