Dominika Tkaczyk

Head of Strategic Initiatives


Dominika joined Crossref’s R&D group in the Tech team in August 2018. Her research interests focus on machine learning and natural language processing, in particular their applications to the automated analysis of scientific literature and research outputs. Previously, she worked on a number of projects, including extracting machine-readable metadata from scholarly documents, predicting people’s demographic features based on their internet browsing history, and developing new metrics for assessing the effectiveness of worldwide air traffic. Dominika’s career started in Poland, where she was a researcher and a data scientist at the University of Warsaw. She received a PhD in Computer Science from the Polish Academy of Sciences in 2016. In 2017, Dominika was awarded a Marie Sklodowska-Curie EDGE Fellowship and moved to Ireland to work as a postdoctoral researcher at Trinity College Dublin. When she is not busy training yet another random forest or neural network, you can find her at the nearest Doctor Who convention or rock/metal concert.

Dominika Tkaczyk's Latest Blog Posts

The anatomy of metadata matching

Dominika Tkaczyk, Thursday, Jun 27, 2024

In Metadata, Linking, Metadata Matching

In our previous blog post about metadata matching, we discussed what it is and why we need it (tl;dr: to discover more relationships within the scholarly record). Here, we will describe some basic matching-related terminology and the components of a matching process. We will also pose some typical product questions to consider when developing or integrating matching solutions. Basic terminology: Metadata matching is a high-level concept, with many different problems falling into this category.

Metadata matching 101: what is it and why do we need it?

Dominika Tkaczyk, Thursday, May 16, 2024

In Metadata, Linking, Metadata Matching

At Crossref and ROR, we develop and run processes that match metadata at scale, creating relationships between millions of entities in the scholarly record. Over the last few years, we’ve spent a lot of time diving into details about metadata matching strategies, evaluation, and integration. It is quite possibly our favourite thing to talk and write about! But sometimes it is good to step back and look at the problem from a wider perspective.

Discovering relationships between preprints and journal articles

Dominika Tkaczyk, Thursday, Dec 7, 2023

In Preprints, Linking

In the scholarly communications environment, the evolution of a journal article can be traced by the relationships it has with its preprints. Those preprint–journal article relationships are an important component of the research nexus. Some of those relationships are provided by Crossref members (including publishers, universities, research groups, funders, etc.) when they deposit metadata with Crossref, but we know that a significant number of them are missing. To fill this gap, we developed a new automated strategy for discovering relationships between preprints and journal articles and applied it to all the preprints in the Crossref database. We made the resulting dataset, containing both publisher-asserted and automatically discovered relationships, publicly available for anyone to analyse.

The more the merrier, or how more registered grants means more relationships with outputs

Dominika Tkaczyk, Wednesday, Feb 22, 2023

In Grants, Research Funders

One of the main motivators for funders registering grants with Crossref is to simplify the process of research reporting with more automatic matching of research outputs to specific awards. In March 2022, we developed a simple approach for linking grants to research outputs and analysed how many such relationships could be established. In January 2023, we repeated this analysis to see how the situation had changed in ten months. Interested? Read on!

Follow the money, or how to link grants to research outputs

Dominika Tkaczyk, Tuesday, Mar 22, 2022

In Grants, Linking, Crossref Labs

The ecosystem of scholarly metadata is filled with relationships between items of various types: a person authored a paper, a paper cites a book, a funder funded research. Those relationships are absolutely essential: an item without them is missing the most basic context about its structure, origin, and impact. No wonder that finding and exposing such relationships is considered very important by virtually all parties involved. Probably the most famous instance of this problem is finding citation links between research outputs. Lately, another instance has been drawing more and more attention: linking research outputs with grants used as their funding source. How can this be done and how many such links can we observe?
