And the DOI is …

Tony Hammond – 2008 December 22

Once structured metadata is added to a file then retrieving a given metadata element is usually a doddle. For example, for PDFs with embedded XMP one can use Phil Harvey’s excellent Exiftool utility.

Exiftool is a Perl library and application, which I’ve blogged about here earlier, and is available as a ‘.zip’ file for Windows (no Perl required) or a ‘.dmg’ for MacOS. Note that Phil maintains this actively and has done so over the last five years. (And when I say actively I mean just that. I once made the mistake of printing out the change file.)

If Perl’s not your thing, then there’s a Ruby wrapper gem (MiniExiftool) to access the Exiftool command in trouper OO fashion. Here’s an example Ruby one-liner to get the DOI from a PDF (broken here to meet column width restriction):

% ruby -rubygems -e 'require "mini_exiftool";
     puts MiniExiftool.new("test.pdf")["doi"]'
10.1038/nphoton.2008.200

Of course, that could also have been run against an image, audio or video file with an XMP packet.

(Makes one wonder vaguely about the feasibility of having a Swiss Army knife type of utility that could read any file to get the DOI using the embedded XMP, RDFa, RDF, HTML headers, COinS, etc. Possibly even, as a last resort, falling back to scanning the raw text, if any.)
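To make that musing a little more concrete, here is a rough Ruby sketch of what the fallback chain might look like for plain text input. It is only an illustration: the method names (find_doi, doi_from_html_meta, first_doi) are hypothetical, the regular expression is a loose approximation of DOI syntax rather than any official grammar, and the HTML header convention it looks for (a meta tag whose name mentions "doi" or "identifier") is an assumption. The XMP, RDFa and COinS cases are left out.

```ruby
# Hypothetical sketch: try progressively cruder strategies to find a DOI.
# DOI_PATTERN is a loose approximation of DOI syntax, not an official grammar.
DOI_PATTERN = %r{10\.\d{4,9}/[^\s"'<>]+}

# Return the first DOI-shaped string in the text, minus trailing punctuation.
def first_doi(text)
  match = text[DOI_PATTERN]
  match && match.sub(/[.,;:)\]]+\z/, "")
end

# Strategy 1 (assumed convention): an HTML <meta> header whose name mentions
# "doi" or "identifier", e.g. <meta name="citation_doi" content="10.1234/abc">.
def doi_from_html_meta(text)
  text.scan(/<meta\s+[^>]*>/i).each do |tag|
    next unless tag =~ /name\s*=\s*["'][^"']*(doi|identifier)[^"']*["']/i
    content = tag[/content\s*=\s*["']([^"']*)["']/i, 1]
    doi = content && first_doi(content)
    return doi if doi
  end
  nil
end

# Strategy 2 (last resort): scan the raw text for anything DOI-shaped.
def doi_from_raw_text(text)
  first_doi(text)
end

# Try each strategy in turn; nil means nothing DOI-like was found.
def find_doi(text)
  doi_from_html_meta(text) || doi_from_raw_text(text)
end
```

Something like find_doi(File.read("test.html")) would then try the HTML headers first and only scan the body text if they yield nothing. A real tool would presumably put the XMP lookup shown earlier at the front of the chain.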

Page owner: Tony Hammond | Last updated 2008-December-22