Metadata in PDF: 2. Use Cases

Tony Hammond – 2007 August 01

In Metadata

Well, this is likely to be a fairly brief post as I’m not aware of many use cases of metadata in PDFs from scholarly publishers. Certainly, I can say for Nature that we haven’t done much in this direction yet, although we are now beginning to look into it.

I’ll discuss a couple of cases found in the wild but invite comment as to others’ practices. Let me start though with the CNRI handle plugin demo for Acrobat which I blogged about here.

Handle Acrobat Reader Plugin

Tony Hammond – 2007 July 31

In Metadata

Just announced on the handle-info list is a new plugin from CNRI for Acrobat Reader - see here. The announcement says: “It is intended to demonstrate the utility of embedding a identifying handle in a PDF document. … A set of demonstration documents, each with an embedded identifying handle, is packaged with the plug-in to show potential uses. To make productive use of this technology, a given industry or community of

XMP: First Hacks

Tony Hammond – 2007 July 27

In Metadata

(<b>Update - 2007.07.28:</b> I meant to reference in this entry Pierre Lindenbaum’s post back in May <a href="http://plindenbaum.blogspot.com/2007/05/is-there-any-xmp-in-scientific-pdf-no.html">Is there any XMP in scientific pdf ? (No)</a>, which btw also references Roderic Page’s post on <a href="http://iphylo.blogspot.com/2007/05/xmp.html">XMP</a>, but forgot to add in the links in my haste to scoot off. Well, truth is we still can’t answer Pierre in the affirmative but at least we can take the first steps towards rectifying this.)

I’ve been revisiting Adobe’s <a href="http://www.adobe.com/products/xmp/">XMP</a> just recently. (I blogged <a href="/blog/xmp-capabilities-extended//">here</a> about the new <a href="http://www.adobe.com/devnet/xmp/">XMP Toolkit 4.1</a> back in March.)

I wanted to share some of my early experiences. First off, after a couple of previous attempts which got pushed aside due to other projects, I managed to compile the libraries and the sample apps that ship with the C++ SDK under Xcode on the Mac. I also needed to compile <a href="https://libexpat.github.io/">Expat</a> first, as it doesn’t ship with the distribution.

OK, so far, so good. What this basically leaves one with is a couple of XMP dump utilities (<i>DumpMainXMP</i> and <i>DumpScannedXMP</i>) and two others (<i>XMPCoreCoverage</i> and <i>XMPFilesCoverage</i>), which is a good start anyway for exploring. And it turns out that our PDFs already have some workflow metadata in them. This is encouraging because the SDK allows apps to read and update existing XMP packets in files, though not to write new packets into files (as far as I understand).
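For anyone who wants to poke at their own PDFs without building the SDK first, scanning for a packet is easy to approximate, since an XMP packet is wrapped in `<?xpacket begin=… ?>` and `<?xpacket end=… ?>` processing instructions. Below is a minimal Python sketch, assuming the metadata stream is stored uncompressed and unencrypted in the file. The function name `scan_xmp_packets` and the sample bytes are my own inventions for illustration, and I am not claiming this is how the SDK's dump utilities work internally.

```python
import re

# Header and trailer processing instructions of the XMP packet wrapper.
# The begin attribute may hold a byte-order mark, so it is matched loosely.
PACKET_RE = re.compile(
    rb"<\?xpacket\s+begin=.*?\?>.*?<\?xpacket\s+end=[\"']([rw])[\"']\s*\?>",
    re.DOTALL,
)


def scan_xmp_packets(data: bytes):
    """Return (offset, writable, packet_bytes) for each packet found in data."""
    packets = []
    for match in PACKET_RE.finditer(data):
        writable = match.group(1) == b"w"  # end="w" marks an updatable packet
        packets.append((match.start(), writable, match.group(0)))
    return packets


# Hypothetical stand-in for the bytes of a PDF carrying one packet.
SAMPLE = (
    b"%PDF-1.4\n1 0 obj\n<< /Type /Metadata /Subtype /XML >>\nstream\n"
    b'<?xpacket begin="\xef\xbb\xbf" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
    b'<x:xmpmeta xmlns:x="adobe:ns:meta/"></x:xmpmeta>\n'
    b'<?xpacket end="w"?>\nendstream\nendobj\n'
)

for offset, writable, packet in scan_xmp_packets(SAMPLE):
    print(offset, writable, len(packet))
```

For a real file you would pass in `open(path, "rb").read()` instead of `SAMPLE`. A scan like this finds nothing if the metadata stream has been compressed, which is one reason to treat it as an exploratory hack only.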

I thought I would take this opportunity anyway to:

  1. See what XMP metadata terms we might consider adding
  2. Try and add these to existing XMP packets

Ugly details are presented below, but by updating the XMP packet metadata in one of our PDFs (<i>Nature 445, 37 (2007), C.J. Hogan</i>) we can teach Acrobat Reader to read - see the “before” (<a href="https://web.archive.org/web/20130815224916/http://nurture.nature.com.turing.library.northwestern.edu/">PDF here</a>) and “after” (<a href="https://web.archive.org/web/20130815224916/http://nurture.nature.com.turing.library.northwestern.edu/">PDF here</a>) screenshots in the figure.

<img src="/wp/blog/images/acrobats.png" alt="acrobats.png" width="583" height="466" />

Of course, this is really about much more than getting Adobe apps to read/write metadata. It’s about using XMP as a standard platform for embedding metadata in digital assets for <i>third-party apps</i> to read/write. If we can put ID3 tags into our podcasts then why not XMP packets into other media?
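To give a flavour of what a third-party app updating an existing packet might look like, here is a rough Python sketch. It relies on the whitespace padding that writable XMP packets normally carry: new RDF is slotted in before `</rdf:RDF>` and the padding shrinks by the same number of bytes, so the packet length (and hence any byte offsets in the enclosing file) stays unchanged. The function name `add_to_packet`, the sample packet and the DOI value are all hypothetical, and this is plain byte surgery rather than the SDK's own update routine.

```python
def add_to_packet(packet: bytes, new_rdf: bytes) -> bytes:
    """Insert new_rdf before </rdf:RDF>, keeping the packet length constant."""
    close = packet.rfind(b"</rdf:RDF>")
    trailer = packet.rfind(b"<?xpacket end=")
    if close < 0 or trailer < 0 or close > trailer:
        raise ValueError("not a recognisable XMP packet")
    if not packet[trailer:].startswith((b'<?xpacket end="w"', b"<?xpacket end='w'")):
        raise ValueError("packet is not marked writable")
    # Padding is the whitespace between the last closing tag and the trailer.
    body_end = packet.rfind(b">", 0, trailer) + 1
    padding = packet[body_end:trailer]
    if padding.strip():
        raise ValueError("unexpected content where padding should be")
    if len(new_rdf) > len(padding):
        raise ValueError("not enough padding for an in-place update")
    shrunk = padding[: len(padding) - len(new_rdf)]
    return packet[:close] + new_rdf + packet[close:body_end] + shrunk + packet[trailer:]


# Hypothetical packet with some workflow metadata and 2 KB of padding.
PACKET = (
    b'<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
    b'<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    b'<rdf:Description rdf:about="" xmlns:xmp="http://ns.adobe.com/xap/1.0/">'
    b"<xmp:CreatorTool>ExampleWorkflow</xmp:CreatorTool>"
    b"</rdf:Description>"
    b"</rdf:RDF></x:xmpmeta>\n"
    + b" " * 2048
    + b'\n<?xpacket end="w"?>'
)

# Hypothetical new term: a Dublin Core identifier holding a made-up DOI.
NEW_RDF = (
    b'<rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">'
    b"<dc:identifier>doi:10.1234/example</dc:identifier>"
    b"</rdf:Description>"
)

UPDATED = add_to_packet(PACKET, NEW_RDF)
print(len(PACKET) == len(UPDATED))
```

Keeping the length fixed is the whole point of the design: the updated packet can be written back over the old one at the same offset without touching anything else in the file.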

IBM Article on PRISM

Tony Hammond – 2007 July 10

In Metadata

Nice entry-level article on PRISM here by Uche Ogbuji of Fourthought Inc. on IBM’s developerWorks.

RSC’s Project Prospect v1.1

We updated our Project Prospect articles today to release v1.1, with a pile of look & feel improvements to the HTML views and links. The most interesting technical addition is the launch of our enhanced RSS feeds, where we have updated our existing feeds for enhanced articles. These now include ontology terms and primary compounds both visually (as text terms and 2D images) and within the RDF - using the OBO in OWL representation and the info:inchi specification mentioned here by Tony only a few weeks ago.

The enhanced entries will soon become more common as we concentrate our enhancements on our Advance Articles, but the current example below from our Photochemical and Photobiological Sciences feed is lovely. RDF code after the jump - just as beautiful to the parents…

ProspectRSS.jpg

XMP Capabilities Extended

Tony Hammond – 2007 March 22

In Metadata

This post on Adobe’s Creative Solutions PR blog may be worth a gander: “This new update, the Adobe XMP 4.1, provides new libraries for developers to read, write and update XMP in popular image, document and video file formats including: JPEG, PSD, TIFF, AVI, WAV, MPEG, MP3, MOV, INDD, PS, EPS and PNG. In addition, the rewritten XMP 4.1 libraries have been optimized into two major components, the XMP Core and the XMP Files.

Use of PRISM in RSS

Tony Hammond – 2007 January 23

In Metadata

Was rooting around for some information and stumbled across this page which may be of interest:

http://googlereader.blogspot.com/2006/08/namespaced-extensions-in-feeds.html

Namespaced Extensions in Feeds
Thursday, August 03, 2006, posted by Mihai Parparita

“I wrote a small MapReduce program to go over our BigTable and get the top 50 namespaces based on the number of feeds that use them.”

Seems quite an impressive percentage for PRISM.
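The count described in that quote is simple to mimic on a small scale. Here is a single-process Python sketch (a stand-in, not the MapReduce program from the Google Reader post) that tallies how many feeds declare each namespace. The names `namespaces_in` and `top_namespaces` and the three sample feeds are made up for illustration, and the PRISM namespace URI shown is an assumption on my part. Note that it counts declared namespaces, which is not quite the same as namespaces actually used by elements in the feed.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def namespaces_in(feed_xml: bytes) -> set:
    """Return the set of namespace URIs declared anywhere in one feed."""
    events = ET.iterparse(io.BytesIO(feed_xml), events=("start-ns",))
    # Each start-ns event carries a (prefix, uri) pair.
    return {uri for _event, (_prefix, uri) in events}


def top_namespaces(feeds, n=50):
    """Rank namespaces by the number of feeds declaring them (each feed counts once)."""
    counts = Counter()
    for feed_xml in feeds:
        counts.update(namespaces_in(feed_xml))
    return counts.most_common(n)


# Hypothetical feeds: two declare Dublin Core, one of those also declares PRISM.
FEEDS = [
    b'<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel/></rss>',
    b'<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" '
    b'xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/"><channel/></rss>',
    b'<rss version="2.0"><channel/></rss>',
]

for uri, feed_count in top_namespaces(FEEDS):
    print(feed_count, uri)
```

Dividing each count by `len(FEEDS)` would give the percentage of feeds per namespace, which is the figure that looks impressive for PRISM.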

AdsML

Tony Hammond – 2006 October 03

In Metadata

A new version of the AdsML Framework, 2.0 Release 8, is now available from the AdsML Consortium for download at http://www.adsml.org/2006/announcements/adsml-framework-2-0-release-8-issued/.

Below is an extract from the “Vision” document which outlines the broad goals of AdsML.