We missed an error that led to the resource resolution URLs of some 500,000+ records being incorrectly updated. We have reverted the incorrect resolution URLs affected by this problem, and we’re putting checks and process changes in place to ensure this does not happen again.
How we got here
Our technical support team was contacted in late June by Wiley about updating resolution URLs for their content. It’s a common request of our technical support team, one meant to make the URL update process more efficient, but this was a particularly large request. Shortly thereafter, Atypon, on behalf of Wiley, provided us with nearly 1,200 separate files to update the resolution URLs of ~9 million records. We manually spot-checked over 50 of these files because, prior to this issue, our technical support team did not have a mechanism to automatically check for errors. That labor-intensive review did not turn up any problems. That is, those 50 samples had none of the header errors that were found later.
Among the files we didn’t check, some had headers in which the owning member’s DOI prefix (fromPrefix) differed from the acquiring member’s DOI prefix (toPrefix). In a URL update request, the two prefixes should always be the same.
And still other files included requests to update records with DOIs that had never even been registered. Here are some examples:
In the example above, these fictional DOIs are both under prefix 10.5555. Thus, the result of this request will ONLY be that the resolution URLs of DOI 10.5555/doi1 and 10.5555/doi2 are updated in the metadata.
In this second example, these fictional DOIs are both under prefix 10.5555, but because the toPrefix in the header differs from the fromPrefix, the result of this request will be that the resolution URLs of 10.5555/doi1 and 10.5555/doi2 are updated in the metadata AND the owning prefix of both records will be transferred from prefix 10.5555 to prefix 10.9876.
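The two header cases described above can be sketched as a small check. This is only an illustration: the `key=value;key=value` header layout and the function names `parse_header` and `describe_effect` are assumptions made for the sketch, not Crossref’s actual file format or tooling. Only the field names fromPrefix and toPrefix come from the description above.

```python
def parse_header(header: str) -> dict[str, str]:
    """Parse a hypothetical 'key=value;key=value' header into a dict."""
    fields: dict[str, str] = {}
    for part in header.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key.strip()] = value.strip()
    return fields


def describe_effect(header: str) -> str:
    """Say what a request with this header would do to the listed DOIs."""
    fields = parse_header(header)
    from_prefix = fields.get("fromPrefix")
    to_prefix = fields.get("toPrefix")
    if from_prefix is None or to_prefix is None:
        return "invalid: missing prefix"
    if from_prefix == to_prefix:
        # Matching prefixes: only the resolution URLs change.
        return "url-update-only"
    # Differing prefixes: URLs change AND ownership moves to toPrefix.
    return f"url-update-and-transfer:{from_prefix}->{to_prefix}"
```

With the fictional prefixes used above, `describe_effect("fromPrefix=10.5555;toPrefix=10.5555")` reports a URL update only, while `describe_effect("fromPrefix=10.5555;toPrefix=10.9876")` reports a URL update plus an ownership transfer.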
We kicked off the URL update request on 30 June and all legitimate DOIs whose files were free of errors were updated by 7 July (yes, it takes about a week to update the resolution URLs for ~9 million records).
On 9 July, Peter Strickland of the International Union of Crystallography, one of 22 members affected by this mistake, contacted us to ask why much of their content was resolving to incorrect URLs and why our search interface showed Wiley as the owner of their content. Peter was rightly concerned. We were, too. Our technical support team quickly escalated this issue because, frankly, this is not the first time our finicky URL update process has caused unwanted metadata updates, albeit not quite at this volume.
How we investigated the problem
We rallied our internal team. Our investigation initially estimated that ~600,000 DOIs had been erroneously included in the 1,200 files and updated. To be as cautious as we could, we later extended that estimate to cover other conditions, raising it to over 1 million DOIs. In the end, we determined that the incorrect files attempted updates of 1,228,041 DOIs. Due to the errors in the files (i.e., erroneous headers and non-registered DOIs), we only actually updated and then reverted 520,512 DOIs. The other 700,000+ DOIs were never updated (because of errors in the original files provided to us) or simply had never been registered with us.
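As a quick sanity check on those figures, the arithmetic can be worked through in a few lines. The variable names are illustrative; the two input numbers are the ones quoted above.

```python
# Figures quoted in the paragraph above.
attempted = 1_228_041             # DOIs the incorrect files tried to update
updated_then_reverted = 520_512   # DOIs actually updated, then reverted

# DOIs that were never updated or had never been registered.
never_updated = attempted - updated_then_reverted
share_updated = updated_then_reverted / attempted

print(f"never updated: {never_updated:,}")    # never updated: 707,529
print(f"share updated: {share_updated:.1%}")  # share updated: 42.4%
```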
Prior to this mistake, Crossref had never reverted a member’s metadata update. To be clear, and as I said above, we have had other URL update mistakes like this one over the years; they were just smaller in scale. We knew there were holes in our process that needed to be plugged. And we knew we needed a better solution for members to manage these updates themselves without our manual intervention. So, while there were mistakes made in the files supplied to us, this was our error and we’re fixing it; more on that below.
For this situation, we quickly realized that reverting the metadata update was the best option for us, although we did not have an existing process in place to execute that reversion. That’s because we only keep the current version of each metadata record. We couldn’t back out of the change; we couldn’t simply restore these records to the metadata registered with us as of late June, because we no longer had an easily accessible, central record of those previous resolution URLs. What we did have was a record of all the previous submissions made against each DOI, so our technical team focused their efforts there.
How we fixed all those records
We had two errors to correct: the ownership transfers (those records that had inadvertent and mismatched from/to prefixes) and the incorrect resolution URLs. We reverted all of the ownership transfers on 9 July and then double and triple checked that ownership during the week of 12 July to ensure we didn’t miss anything.
The resolution reversion was more complicated. We invested in creating a patch to identify the records that had been updated by our team, and then extract the last legitimate resolution URL registered with us by the owning member in order to revert the metadata for each record. In order to provide confidence that this mistake was contained, we also built a check into the patch to ensure that those DOIs that did have their ownership temporarily transferred were not updated during the few days that ownership was incorrect. That check helped us determine that none of the 520,512 DOIs were incorrectly updated beyond this mistaken URL update request.
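The two steps described above (finding the last legitimate URL, and checking that nothing else touched a record while ownership was wrong) might look roughly like the sketch below. It is not the patch itself: the `Submission` record, its fields, and both function names are hypothetical, and it assumes each DOI’s submission history is available as a list and that submissions from the mistaken request can be flagged.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Submission:
    """Hypothetical record of one submission made against a DOI."""
    doi: str
    url: str
    submitter_prefix: str
    submitted_at: datetime
    from_mistaken_request: bool = False


def last_legitimate_url(history: list[Submission], owner_prefix: str) -> Optional[str]:
    """Return the most recent URL registered by the owning member,
    ignoring anything that came from the mistaken URL update request."""
    candidates = [
        s for s in history
        if s.submitter_prefix == owner_prefix and not s.from_mistaken_request
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda s: s.submitted_at).url


def touched_while_transferred(
    history: list[Submission], start: datetime, end: datetime
) -> bool:
    """True if any submission other than the mistaken request landed
    during the window in which ownership was incorrect."""
    return any(
        not s.from_mistaken_request and start <= s.submitted_at <= end
        for s in history
    )
```

A record would be reverted to `last_legitimate_url(...)` only when that value is not `None`, and any record for which `touched_while_transferred(...)` is true would need a closer manual look.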
The technical team built and tested this patch. The tests turned up gaps in the patch, so we refined it during the week of 12 July 2021. We kicked off the reversion of these records on Monday, 19 July at 20:05 UTC and the patch completed all reversions on Thursday, 22 July at 20:14 UTC.
In the end, we successfully reverted all of the resolution URLs for the 520,512 DOIs we identified, provided daily updates and apologies to the 22 affected members, worked some longer hours together, and persevered.
We don’t want this to ever happen again. Like, never. We clearly need to make changes to our internal processes to prevent this in the future.
Here’s what’s ahead:
We are building a checker that we can run URL update files through to automate our checks. This means we will be able to check every single file in a large batch, rather than relying on manual and labor-intensive spot-checking;
As noted above, one compounding issue in this mistake was the mismatched from/to prefixes in the file headers. Our technical support team uses the same file headers to transfer ownership/stewardship of a record or set of records between members AND to update resolution URLs. These two tasks are almost never legitimately completed in the same file. That is, there is usually a lag between ownership transfers and resolution URL updates (most members will request an ownership transfer and then a month or two later update their URLs). Because of this, simply decoupling these two tasks (feel free to follow our work at this link) would help eliminate a glaring risk, so we’re working on that too;
Lastly, we’re researching ways we can streamline resource resolution URL updates. You can also monitor our progress on this one. No promises or specifics yet, but we’re eager to reduce toil on our technical support team, avoid problems like this one, and provide members with safe and straightforward ways to update their metadata.
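To illustrate the kind of batch check described in the first item above, here is a minimal sketch. It is not the checker being built: `UpdateFile`, `check_file` and `check_batch` are hypothetical names, and it assumes each file has already been parsed into its two header prefixes and a list of DOIs, and that the set of registered DOIs can be looked up. It covers only the two error types mentioned in this post: mismatched header prefixes and DOIs that were never registered.

```python
from dataclasses import dataclass, field


@dataclass
class UpdateFile:
    """Hypothetical parsed form of one URL update file."""
    name: str
    from_prefix: str
    to_prefix: str
    dois: list[str] = field(default_factory=list)


def check_file(f: UpdateFile, registered: set[str]) -> list[str]:
    """Return a list of problems found in one file (empty if none)."""
    problems = []
    if f.from_prefix != f.to_prefix:
        problems.append(f"header prefixes differ: {f.from_prefix} -> {f.to_prefix}")
    for doi in f.dois:
        # DOIs are case-insensitive, so compare in lower case.
        if doi.lower() not in registered:
            problems.append(f"unregistered DOI: {doi}")
    return problems


def check_batch(files: list[UpdateFile], registered: set[str]) -> dict[str, list[str]]:
    """Check every file in the batch; report only files with problems."""
    known = {d.lower() for d in registered}
    report = {}
    for f in files:
        problems = check_file(f, known)
        if problems:
            report[f.name] = problems
    return report
```

Because every file is checked, a batch of 1,200 files would be processed only if `check_batch(...)` returned an empty report, rather than relying on a sample of 50.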
Thanks for the support of the whole Crossref team and our community - and for reading this far! Never a dull moment…