ISR part three: Where does Crossref have the most impact on helping the community to assess the trustworthiness of the scholarly record?

8 minute read.

Rachael Lammey – 2022 October 17

In Research Integrity, Trustworthiness, Product

Ans: metadata and services are all underpinned by POSI.

Leading into a blog post with a question always makes my brain jump ahead to answer that question with the simplest answer possible. I was a nightmare English Literature student. ‘Was Macbeth purely a villain?’ ‘No’. *leaves exam*

Just like not giving one-word answers to exam questions, playing our role in the integrity of the scholarly record and helping our members enhance theirs takes thought, explanation, transparency, and work.

Some of the elements Amanda outlines in the previous posts in this series (Part 1, Part 2) really resonated from a product perspective:

We must be cautious that our best practices for demonstrating legitimacy and identifying deceptive behaviour do not raise already-high barriers for emerging publications or organizations that present themselves in ways that some may not recognize as professional standards. Disruption is different from deception. Crossref has an opportunity to think about how to identify deceptive actions and pair that with our efforts to bring more people on board and support their full participation in our ecosystem.

We don’t have the means or desire to be the arbiter of research quality (whatever that means). However, we operate neutrally, at the center of scholarly communications, and we can help develop a shared consensus or framework. Our metadata elements and tools can be positioned to signal or detect trustworthiness. An important distinction is that we can play a role in assessing legitimacy (activities of the actors) but not in quality (calibre of the content itself).

Crossref has lots of plans (and lots to do) to improve our role in ISR

Rather than a long list of things we want to do in terms of tools, services, and functionality, it feels more manageable to break this work into three key areas.

1. Collecting better information in better ways

We think many elements of the metadata our members record with us help expose important information about the research, e.g., authors, publication dates, and abstracts. We also help our members assess submissions for originality via our Similarity Check service, and the ongoing migration to iThenticate V2 aims to better support this aspect of the publication process.

Beyond this, as Amanda points out, ‘once members start registering their content, their metadata speaks about their practices’. Seeing who published a work along with the metadata they provide: validated ORCID IDs to identify the authors, reference lists and links to related research and data, and important updates to the work via Crossmark, all contribute to showing not just the ‘what’ but the ‘how’, so that the community can use that information to support their decision-making.

I always want to stress that this work is not just an ‘ask’ for our members. We are moving in the same direction as we improve the things we do to support organizations in registering their records with us, answering their questions, working with partner organizations like PKP, consulting with our community on pain points, and thinking about how we can better enhance and facilitate their work. We’ve been fortunate that our community has taken the time to engage in discussions with Turnitin on iThenticate improvements, do user testing sessions as we build simple user interfaces to record grants, lead calls and conversations on improving grant metadata and supporting the uptake of ROR and data citation, and provide thoughtful feedback on our recent preprint on CRE metadata. This all helps us to explain, structure, and prioritize our product work.

There are also some closely related R&D-led projects that are already informing our thinking:

A more responsive version of participation reports so that it’s easier for members to identify gaps in their metadata and compare against others.
Making it easier to get metadata back in a format where members can easily redeposit it.
Better matching to help us and our members augment the metadata they send us to add value to the work we all do.

We said in the previous blog posts that we’ll pose questions about what kinds of metadata give what kind of levels of trustworthiness, and have previously highlighted the following activities:

Reporting corrections and retractions through Crossmark metadata. We know that our members are collecting this information, but often it isn’t making it through metadata workflows to us. We’re part of the NISO CREC (Communication of Retractions, Removals, and Expressions of Concern) working group with many of our members and metadata users, as this feels like something critical to address.
Assessing originality using Similarity Check. On average, we’re seeing 320 new Similarity Check subscribers each year, with over 10 million checks being done each year by our members.
Establishing provenance and stakeholders through ORCID and ROR. At the time of writing, we have over 30,000 ROR IDs in Crossref, and this is growing steadily across different record types. ROR is keen to support adoption and so are we.
Acknowledging funding and other support through the use of the Open Funder Registry and registering grants metadata. This has improved in quality and completeness since we launched the Funder Registry in 2014 and with more comprehensive support for grants in more recent years. But we still have work to do, as this paper by Kramer and de Jonge points out: The availability and completeness of open funder metadata.
Citing data for transparency and reproducibility, including linking to related research data. Scholix, MDC and STM Research Data groups.
Demonstrating open peer review by registering peer review reports. Members have already recorded over 300,000 peer reviews with Crossref, opening up this information on their processes.
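To make the list above a little more concrete, here is a minimal sketch of how someone might check which of these signals are present in a set of work records. It assumes records shaped roughly like the `message` object returned by the Crossref REST API; the field names used (`author`, `ORCID`, `affiliation`, `id-type`, `funder`, `reference`, `update-to`, `type`) are assumptions about that shape and should be checked against the API documentation. The sample records and the `SIGNALS` table are invented for illustration and are not how Crossref's own reports are built.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def has_orcid(rec: Record) -> bool:
    """True if at least one author carries an ORCID iD."""
    return any(bool(a.get("ORCID")) for a in rec.get("author", []))


def has_ror(rec: Record) -> bool:
    """True if any author affiliation carries an identifier typed as ROR."""
    for author in rec.get("author", []):
        for aff in author.get("affiliation", []):
            for ident in aff.get("id", []):
                if ident.get("id-type") == "ROR":
                    return True
    return False


# Hypothetical mapping from a signal name to a check on a single record.
SIGNALS: Dict[str, Callable[[Record], bool]] = {
    "orcid": has_orcid,
    "ror": has_ror,
    "funding": lambda r: bool(r.get("funder")),
    "references": lambda r: bool(r.get("reference")),
    "crossmark_update": lambda r: bool(r.get("update-to")),
    "peer_review": lambda r: r.get("type") == "peer-review",
}


def signals_present(rec: Record) -> Dict[str, bool]:
    """Report, for one record, which signals are present."""
    return {name: check(rec) for name, check in SIGNALS.items()}


def coverage(records: List[Record]) -> Dict[str, float]:
    """Percentage of records in which each signal is present."""
    total = len(records)
    result: Dict[str, float] = {}
    for name, check in SIGNALS.items():
        hits = sum(1 for r in records if check(r))
        result[name] = 100.0 * hits / total if total else 0.0
    return result


# Made-up sample records, not real Crossref metadata.
sample: List[Record] = [
    {
        "DOI": "10.5555/example-1",
        "type": "journal-article",
        "author": [{
            "family": "Doe",
            "ORCID": "https://orcid.org/0000-0000-0000-0000",
            "affiliation": [{
                "name": "Example University",
                "id": [{"id": "https://ror.org/00example", "id-type": "ROR"}],
            }],
        }],
        "funder": [{"name": "Example Funder"}],
        "reference": [{"key": "ref1"}],
    },
    {
        "DOI": "10.5555/example-2",
        "type": "journal-article",
        "author": [{"family": "Roe", "affiliation": []}],
        "update-to": [{"DOI": "10.5555/example-1", "type": "correction"}],
    },
]

if __name__ == "__main__":
    for name, pct in coverage(sample).items():
        print(f"{name}: {pct:.0f}%")
```

A check like this only says whether a field is populated, not whether its contents are accurate, so it speaks to the ‘how’ of a member's practices rather than to the quality of the work itself.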

In your organization, what weight do you give these? We know that some of our members register some of these things in more volume than others. Is that due to their perceived value, technical limitations, or ‘we’re working on it, give us time’? Do you think of them in the context of the integrity of the record, or are we off the mark? Are there other things we haven’t mentioned in this blog that we could capture, report on, and highlight?

2. Disseminating this information and supporting its downstream use

We want to make it as easy as possible for everyone to access and use the metadata our members register with us. Especially as some of the biggest metadata users are our members and, more selfishly, us! But there’s no point collecting metadata to support ISR if it’s unwieldy and difficult to access and use.

We’re working on a project, described in the mid-year community update by a number of my colleagues, to break down internal metadata silos and model our metadata in a more flexible way. This will lend itself to better information collection and exchange, and will support the Research Nexus through a relationships API that lets anyone see all of the relationships Crossref can see between a given work and, well, anything else related to it (citations, links to preprints, and links to data, to name but a few).

Part of that work will involve supplementing the metadata our members register with high-quality, curated data from selected sources, making it clear where those assertions have come from.

We want our API to perform consistently and well, to contain all the metadata our members register, handle it appropriately, and be able to keep the information in it up-to-date.

Our API will underpin the reports we provide our members (among other things) so that we can provide simple interfaces for organizations to check how they’re doing along with more functional requests. Do their DOIs resolve? Are they submitting metadata updates when they publish a correction? How much will they be billed in a given quarter? We have a lot of internal reporting and need to build more, and if we want to use these, chances are many others do too, so we should open those up.
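As an illustration of the first of those questions, here is a rough sketch of how a ‘does this DOI resolve?’ check might look from the outside. It is not Crossref's reporting code. It assumes the DOI proxy's handle lookup endpoint at `https://doi.org/api/handles/` and its `responseCode` field (with `1` meaning the handle was found); both should be confirmed against the DOI proxy documentation. The function names and the injectable `fetch` parameter are hypothetical, chosen so that the logic can be tried without network access.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from typing import Any, Callable, Dict


def handle_api_url(doi: str) -> str:
    """Build the (assumed) handle lookup URL for a DOI, stripping common prefixes."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return "https://doi.org/api/handles/" + urllib.parse.quote(doi, safe="/")


def fetch_json(url: str) -> Dict[str, Any]:
    """Fetch a URL and parse the body as JSON, including JSON error bodies."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        # Assumption: unknown handles come back as an HTTP error with a JSON body.
        try:
            return json.load(err)
        except ValueError:
            return {"responseCode": None}


def doi_resolves(
    doi: str,
    fetch: Callable[[str], Dict[str, Any]] = fetch_json,
) -> bool:
    """True if the lookup reports the handle as found (responseCode == 1)."""
    return fetch(handle_api_url(doi)).get("responseCode") == 1


if __name__ == "__main__":
    # Offline demonstration with a stand-in fetcher instead of a live request.
    print(doi_resolves("doi:10.5555/12345678", fetch=lambda url: {"responseCode": 1}))
```

A found handle only shows that the DOI is registered. Whether the landing page it points to is still live is a separate check.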

3. Trying to live up to POSI to underpin this work

When I see a new project, initiative, tool, or service in the research ecosystem, the first thing I want to do is find out about the organization itself so that I can base some decisions on that. Lateral reading in action.

At Crossref, we want to show who we are beyond just our tools, services, and products and be transparent about our values. That’s why we have adopted the Principles of Open Scholarly Infrastructure or POSI for short. Now we need to meet these principles and we’re working towards that. POSI proposes three areas that an Open Infrastructure organization like Crossref can address to garner the trust of the broader scholarly community: accountability (governance), funding (sustainability), and protection of community interests (insurance). POSI also proposes a set of concrete commitments that an organization can make to build community trust in each area.

So POSI isn’t just opening code and metadata, it’s telling our community how we handle membership, governance, product development, technical and financial stability and security, holding our hands up when we’ve got something wrong, and actively looking to improve upon the things we do.

Are you still reading? If so, you’ve done better than many of my examiners, I’m sure. So stay with us as we work together to ensure we bring quality, transparency, and integrity to the work we all do.

The next part in this series will report back on the feedback and discussions and potentially propose some new or adjusted priorities. Join us at the Frankfurt Book Fair this week (hall 4.2, booth M5) or comment on this post below.
