Citation Typing Ontology

I was happy to read David Shotton’s recent Learned Publishing article, Semantic Publishing: The Coming Revolution in scientific journal publishing, and see that he and his team have drafted a Citation Typing Ontology.*

Anybody who has seen me speak at conferences knows that I like to proselytize about the concept of the “typed link”, a notion that hypertext pioneer Randy Trigg discussed extensively in his 1983 Ph.D. thesis. Basically, Trigg points out something that should be fairly obvious: a citation (i.e. “a link”) is not always a “vote” in favor of the thing being cited.
In fact, there are all sorts of reasons that an author might want to cite something. They might be elaborating on the item cited, they might be critiquing it, or they might even be trying to refute it. (For an exhaustive and entertaining survey of the use and abuse of citations in the humanities, Anthony Grafton’s The Footnote: A Curious History is a rich source of examples.)
Unfortunately, the naive assumption that a citation is tantamount to a vote of confidence has become enshrined in everything from the way in which we measure scholarly reputation, to the way in which we fund universities and the way in which search engines rank their results. The distorting effect of this assumption is profound. If nothing else, it leads to a perverse situation in which people will often discuss books, articles, and blog postings that they disagree with without actually citing the relevant content, just so that they can avoid inadvertently conferring “whuffie” on the item being discussed. This can’t be right.
Having said that, there has been a half-hearted attempt to introduce a gross level of link typology with the “nofollow” link attribute, an initiative started by Google to try to address the increasing problem of “spamdexing”. But this is a pretty ham-fisted form of link typing, particularly in the way it is implemented by the Wikipedia, where Crossref DOI links to formally published scholarly literature have a “nofollow” attribute attached to them but, inexplicably, items with a PMID are not so hobbled (view the HTML source of this page, for example). Essentially, this means that the Wikipedia is a black hole of reputation. That is, it absorbs reputation (through links to the Wikipedia), but it doesn’t let reputation back out again. Hell, I feel dirty for even linking to it here ;-).
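
For anyone who wants to check which outbound links on a page carry “nofollow”, here is a minimal sketch in Python using only the standard library. The sample markup and its URLs are made up for illustration (they are not copied from any real page), and the helper name audit_links is my own.

```python
from html.parser import HTMLParser


class LinkRelAudit(HTMLParser):
    """Collect (href, is_nofollow) for every <a href=...> in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if href is None:
            return
        # rel holds space-separated tokens; compare them case-insensitively.
        rel_tokens = (attr.get("rel") or "").lower().split()
        self.links.append((href, "nofollow" in rel_tokens))


def audit_links(html):
    """Return a list of (href, is_nofollow) pairs found in the markup."""
    parser = LinkRelAudit()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical markup, invented for this example.
SAMPLE = """
<p>
  <a rel="nofollow" href="http://dx.doi.org/10.5555/12345678">DOI link</a>
  <a href="http://example.org/pubmed/0000000">PMID link</a>
</p>
"""

for href, nofollow in audit_links(SAMPLE):
    print("nofollow" if nofollow else "followed", href)
```

Feeding it the saved source of a real page would show at a glance which kinds of link are being “hobbled” and which are not.
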
Anyway, scholarly publishers should certainly read Shotton’s article because it is full of good, practical ideas about what can be done with today’s technology in order to help us move beyond the “digital incunabula” that the industry is currently churning out. The sample semantic article that Shotton’s team created is inspirational and I particularly encourage people to look at the source file for the ontology-enhanced bibliography, which reveals just how much more useful metadata can be associated with the humble citation.
And now I wonder whether CiteULike, Connotea, 2Collab or Zotero will consider adding support for the Citation Typing Ontology into their respective services?
* Disclosure:
a) I am on the editorial board of Learned Publishing
b) Crossref has consulted with David Shotton on the subject of semantically enhancing journal articles

Researcher Identification Primer

Geoffrey Bilder – 2009 March 11

In ORCID

Discussions around “contributor IDs” (aka “Author ID”, “Researcher ID”, etc.) seem to be becoming quite popular. In the interview that I pointed to in my last post, I mentioned that Crossref has been talking with a group of researchers who were very interested in creating some sort of authenticated contributor ID as a mechanism for controlling who gets trusted access to sensitive genome-wide aggregate genotype data.

Well, I’m delighted to say that said group of researchers (at the GEN2PHEN project) have created a “Researcher Identification Primer” website in which they outline the many use-cases and issues around creating a mechanism for unambiguously identifying and/or authenticating researchers. This looks like a great resource and I expect it will serve as a useful focus for further discussion around the issue.

An interview about “Author IDs”

Geoffrey Bilder – 2009 February 19

In Identifiers

Over the past few months there seems to have been a sharp upturn in general interest around implementing an “author identifier” system for the scholarly community. This, in turn, has meant that more people have been getting in touch with us about our nascent “Contributor ID” project. The other day, after seeing my comments in the above thread, Martin Fenner asked if he could interview me about the issue of author identifiers for his blog on Nature Networks, Gobbledygook. I agreed, and he has now posted the interview.

Real PRISM in the RSS Wilds

Tony Hammond – 2009 February 19

In RSS

Alf Eaton just posted a real nice analysis of ticTOCs RSS feeds. Good to see that almost half of the feeds (46%) are now in RDF and that fully a third (34%) are using PRISM metadata to disclose bibliographic fields.

The one downside from a Crossref point of view is that these feeds are still using the old PRISM version (1.2) and not the new version (2.0) which was released a year ago and blogged here. That version supports the elements prism:doi for the bare DOI, as well as prism:url for the DOI proxy server URL.
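
As a rough illustration of what a consumer gains from those two elements, here is a small Python sketch that pulls prism:doi and prism:url out of an RSS 1.0 (RDF) item. The feed fragment is invented for the example, the DOI is a dummy, and the PRISM 2.0 namespace URI is the one I understand the spec to use, so check it against the spec before relying on it. The function name doi_fields is my own.

```python
import xml.etree.ElementTree as ET

# Namespace URIs; the PRISM 2.0 one is an assumption to verify against the spec.
NS = {
    "rss": "http://purl.org/rss/1.0/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "prism": "http://prismstandard.org/namespaces/basic/2.0/",
}

# Hypothetical feed fragment, made up for illustration.
SAMPLE_FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:prism="http://prismstandard.org/namespaces/basic/2.0/">
  <item rdf:about="http://example.org/articles/1">
    <title>An example article</title>
    <link>http://example.org/articles/1</link>
    <prism:doi>10.5555/12345678</prism:doi>
    <prism:url>http://dx.doi.org/10.5555/12345678</prism:url>
  </item>
</rdf:RDF>
"""


def doi_fields(feed_xml):
    """Return one {'doi': ..., 'url': ...} dict per RSS item."""
    root = ET.fromstring(feed_xml)
    results = []
    for item in root.findall("rss:item", NS):
        doi = item.findtext("prism:doi", default=None, namespaces=NS)
        url = None
        url_el = item.find("prism:url", NS)
        if url_el is not None:
            # Accept either element text or an rdf:resource attribute.
            url = (url_el.text or "").strip() or url_el.get(
                "{%s}resource" % NS["rdf"]
            )
        results.append({"doi": doi, "url": url})
    return results


print(doi_fields(SAMPLE_FEED))
```

A feed still on PRISM 1.2 would simply come back with None in both fields, since its elements live in a different namespace.
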

DOIs in an iPhone application

Geoffrey Bilder – 2009 February 12

In Linking

Very cool to see Alexander Griekspoor releasing an iPhone version of his award-winning Papers application. A while ago Alex integrated DOI metadata lookup into the Mac version of Papers and now I can get a silly thrill from seeing Crossref DOIs integrated in an iPhone app. Alex has just posted a preview video of the iPhone application and it includes a cameo appearance by a DOI. Yay.

CURIE Syntax 1.0

Tony Hammond – 2009 January 19

In Identifiers

The W3C has recently (Jan. 16) released CURIE Syntax 1.0 as a Candidate Recommendation and is inviting implementations.

(Note that I made a fuller post here on CURIEs and erroneously mistook the Editor’s Draft (Oct. 23, ’08) for a Candidate Recommendation. Well, at least it’s got there now.)
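
For anyone new to CURIEs, the core idea is simply that a short prefix stands in for a long IRI, and the full IRI is recovered by concatenating the prefix’s IRI with the reference. Here is a minimal sketch in Python. It handles only the prefixed form and the bracketed “safe CURIE” form, not the default-prefix cases the spec also allows, and the “doi” prefix mapping is my own hypothetical choice rather than anything registered.

```python
def expand_curie(curie, prefixes):
    """Expand 'prefix:reference' or '[prefix:reference]' into a full IRI."""
    text = curie.strip()
    # A safe CURIE is wrapped in square brackets; unwrap it first.
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1]
    prefix, sep, reference = text.partition(":")
    if not sep:
        raise ValueError(f"not a prefixed CURIE: {curie!r}")
    if prefix not in prefixes:
        raise KeyError(f"unknown prefix: {prefix!r}")
    # Expansion is plain concatenation of the prefix IRI and the reference.
    return prefixes[prefix] + reference


# Illustrative prefix map; the 'doi' entry is a hypothetical mapping.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "doi": "http://dx.doi.org/",
}

print(expand_curie("[dc:title]", PREFIXES))
print(expand_curie("doi:10.5555/12345678", PREFIXES))
```

Only the first colon is treated as the separator, so references that themselves contain colons survive intact.
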

Standard InChI Defined

Tony Hammond – 2009 January 17

In Identifiers, InChI

IUPAC has just released the final version (1.02) of its InChI software, which generates Standard InChIs and Standard InChIKeys. (InChI is the IUPAC International Chemical Identifier.)

The Standard InChI “removes options for properties such as tautomerism and stereoconfiguration”, so that a molecule will always generate the same stable identifier - a unique InChI - which facilitates “interoperability/compatibility between large databases/web searching and information exchange”. Note also that any “shortcomings in Standard InChI may be addressed using non-Standard InChI (currently obtainable using InChI version 1.02beta)”.
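
To make the interoperability point concrete, here is a small Python sketch. It relies on my understanding that Standard InChI strings begin with “InChI=1S/” while non-standard version 1 strings begin with “InChI=1/”. The two “databases” and their records are invented for illustration, and the function names are my own.

```python
STANDARD_PREFIX = "InChI=1S/"     # version 1 with the Standard flag
NONSTANDARD_PREFIX = "InChI=1/"   # version 1 generated with custom options


def classify_inchi(identifier):
    """Classify a string as 'standard', 'non-standard' or 'not an InChI'."""
    if identifier.startswith(STANDARD_PREFIX):
        return "standard"
    if identifier.startswith(NONSTANDARD_PREFIX):
        return "non-standard"
    return "not an InChI"


def merge_by_standard_inchi(*databases):
    """Merge records keyed on Standard InChI, skipping every other key."""
    merged = {}
    for db in databases:
        for inchi, record in db.items():
            if classify_inchi(inchi) != "standard":
                continue
            merged.setdefault(inchi, {}).update(record)
    return merged


# Hypothetical records from two independent sources.
DB_A = {"InChI=1S/H2O/h1H2": {"name": "water"}}
DB_B = {
    "InChI=1S/H2O/h1H2": {"formula": "H2O"},
    "InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3": {"name": "ethanol"},
}

print(merge_by_standard_inchi(DB_A, DB_B))
```

Because both sources produce the same string for the same molecule, the identifier can serve directly as a join key. The non-standard entry is left out because there is no guarantee another database generated it with the same options.
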

XMP Library for Flash

Tony Hammond – 2009 January 16

In XMP

Update about new XMP Library from Adobe Labs:

“The new Adobe XMP Library for ActionScript is now available for download on Adobe Labs. Adobe Extensible Metadata Platform (XMP) is a labeling technology that allows you to embed data about a file, known as metadata, into the file itself. XMP is an open technology based on RDF and RDF/XML. With this new library you can read existing XMP metadata from Flash based file formats via the Adobe Flash Player.”

Poorboy Metadata Hack

Tony Hammond – 2009 January 06

In Metadata

I was playing around recently and ran across this little metadata hack. At first, I thought somebody was doing something new. But no, nothing so forward apparently. (Heh! 🙂)

I was attempting to grab the response headers from an HTTP request on an article page and was using by default the Perl LWP library. For some reason I was getting metadata elements being spewed out as response headers, at least from some of the sites I tested. With some further investigation I tracked this back to LWP itself, which parses the HTML head and generates HTTP pseudo-headers from the meta elements it finds there, using an X-Meta- style header. (This can be viewed either as a feature of LWP or as a bug, as this article bemoans.)
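
To show the general idea, here is a rough emulation in Python. It is not LWP’s code: it handles only `<meta name=... content=...>` elements inside the head, and the way it builds the header name (capitalize the first letter, turn colons into hyphens) is my approximation of the behaviour I saw. The sample page is made up.

```python
from html.parser import HTMLParser


class MetaToHeaders(HTMLParser):
    """Turn <meta name=... content=...> in the head into X-Meta-* pairs."""

    def __init__(self):
        super().__init__()
        self.headers = []
        self._done = False  # set once we have left the document head

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        if tag == "body":
            self._done = True
            return
        if tag != "meta":
            return
        attr = dict(attrs)
        name, content = attr.get("name"), attr.get("content")
        if name and content is not None:
            # Approximate naming rule: capitalize first letter, ':' -> '-'.
            key = "X-Meta-" + (name[:1].upper() + name[1:]).replace(":", "-")
            self.headers.append((key, content))

    def handle_endtag(self, tag):
        if tag == "head":
            self._done = True


def pseudo_headers(html):
    """Return the list of (header-name, value) pairs for a page."""
    parser = MetaToHeaders()
    parser.feed(html)
    parser.close()
    return parser.headers


# Hypothetical article page, invented for this example.
SAMPLE_PAGE = """<html><head>
<title>Example</title>
<meta name="dc.identifier" content="doi:10.5555/12345678">
<meta name="citation_doi" content="10.5555/12345678">
</head><body><meta name="ignored" content="x"></body></html>"""

for key, value in pseudo_headers(SAMPLE_PAGE):
    print(f"{key}: {value}")
```

So any page that embeds its DOI in a meta element ends up exposing it to a header-only client for free, which is what makes the hack appealing.
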

And the DOI is …

Tony Hammond – 2008 December 22

In Metadata

Once structured metadata is added to a file then retrieving a given metadata element is usually a doddle. For example, for PDFs with embedded XMP one can use Phil Harvey’s excellent Exiftool utility.

Exiftool is a Perl library and application (which I’ve blogged about here earlier) that is available as a ‘.zip’ file for Windows (no Perl required) or a ‘.dmg’ for MacOS. Note that Phil maintains this actively and has done so over the last five years. (And when I say actively I mean just that. I once made the mistake of printing out the change file.)