Metadata - For the Record
Interesting post here from Gunar Penikis of Adobe entitled “Permanent Metadata” (Oct. ’04). He talks about the issues of embedding metadata in media and comes up with this:
“It may be the case that metadata in the file evolves to become a “cache of convenience” with the authoritative information living on a web service. The web service model is designed to provide the authentication and permissions needed. The link between the two provided by unique IDs. In fact, unique IDs are already created by Adobe applications and stored in the XMP - that is what the XMP Media Management properties are all about.”
An intriguing idea. Of course, Gunar’s (and Adobe’s) preoccupations with metadata revolve mainly around document workflow whereas, at least as things stand currently, scholarly publishers are mainly concerned with the dissemination of media in final form. Hence some differences in thinking:
- As just noted, Adobe are more interested in workflow than in work. Scholarly articles are rich in descriptive metadata about the work itself and have a well-developed citation model. Academic interest is in the intellectual content rather than the vehicle used to carry and preserve that content - the file format.
- Unique IDs
- Workflow IDs are UUIDs which identify specific instances and expressions, but do not identify the abstract work. A UUID is unique, but there is no central registry for such identifiers, hence they cannot be “looked up”. Crossref publishers should be concerned with closely associating the DOI for the underlying work with a given media file. That’s the identifier that this community is actively promoting.
- Because of the focus on workflow, the XMP specification recommends that XMP packets be “writeable”: that is, that they be marked as such and include padding whitespace so that updates can be accommodated without changing the packet size. Publishers distributing final form documents are more likely to want to distribute “read-only” metadata which is authoritative and which describes the work, rather than the document format and workflow. Of course, this should not preclude additional sources of metadata which may be added “by reference” rather than “by value”. That is, a pointer to a web page (or service) may be sufficient to relate additional publisher terms and user annotations, rather than embedding them directly in the file, for various reasons: a) file integrity, b) limiting growth of file size, c) term authority, d) dynamic production (in forward time), and e) multiple sources.
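
To make the writeable versus read-only distinction concrete, here is a minimal Python sketch that wraps a small RDF payload in an XMP packet. It is an illustration under stated assumptions, not a publisher recipe: the helper name `build_xmp_packet` is hypothetical, the DOI `10.5555/12345678` is a placeholder, and using `dc:identifier` to carry the DOI is just one plausible choice that the post itself does not prescribe. The packet also carries a workflow-style `xmpMM:InstanceID` UUID, so that the two kinds of identifier can be seen side by side.

```python
import uuid
from typing import Optional
from xml.sax.saxutils import escape

# Fixed id string carried by the XMP packet header.
XPACKET_ID = "W5M0MpCehiHzreSzNTczkc9d"


def build_xmp_packet(doi: str, writable: bool = False,
                     padding_bytes: int = 2048,
                     instance_id: Optional[str] = None) -> str:
    """Return an XMP packet naming the work by DOI (hypothetical helper)."""
    # Workflow-style instance ID: unique, but with no registry to resolve it.
    if instance_id is None:
        instance_id = "uuid:" + str(uuid.uuid4())
    # Work-level identifier: the DOI written as a URL that can be looked up.
    doi_url = "https://doi.org/" + doi

    payload = (
        '<x:xmpmeta xmlns:x="adobe:ns:meta/">\n'
        ' <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">\n'
        '  <rdf:Description rdf:about=""\n'
        '    xmlns:dc="http://purl.org/dc/elements/1.1/"\n'
        '    xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/">\n'
        f'   <dc:identifier>{escape(doi_url)}</dc:identifier>\n'
        f'   <xmpMM:InstanceID>{escape(instance_id)}</xmpMM:InstanceID>\n'
        '  </rdf:Description>\n'
        ' </rdf:RDF>\n'
        '</x:xmpmeta>\n'
    )

    # Only a writeable packet is padded: 100-character lines of whitespace
    # leave room for in-place updates that keep the packet size unchanged.
    padding = (" " * 99 + "\n") * (padding_bytes // 100) if writable else ""

    header = f'<?xpacket begin="\ufeff" id="{XPACKET_ID}"?>\n'
    # The trailer marks the packet as writeable ("w") or read-only ("r").
    trailer = '<?xpacket end="w"?>' if writable else '<?xpacket end="r"?>'
    return header + payload + padding + trailer


# A final-form document would carry the read-only variant.
read_only_packet = build_xmp_packet("10.5555/12345678")
```

The read-only variant drops the padding entirely, which suits metadata meant to be authoritative at the time of distribution. Anything that changes later can then be reached “by reference” through the DOI rather than written back into the file.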