Response to Good and Wilkinson 2006
I recommend that anyone interested in the Semantic Web for Life Sciences read "The Life Sciences Semantic Web is Full of Creeps!" by Good and Wilkinson, published in Briefings in Bioinformatics in 2006. It presents a careful and thorough analysis of the problems that face our community.
I would like to respond, with respect, to a few statements made in the paper.
"URLs offer only a limited solution to the naming problem in that they are intended to be used to identify documents rather than discrete conceptual units, and offer no presumption of stability over time." There are two claims here:
1. Regarding limits on intended use, the question is: who is doing the intending? It was perhaps true in the initial design of the Web that http: URIs (then called URLs) were intended to be limited in this way, but nothing in the http: URI scheme specification really required this limitation. With the advent of XML namespaces and RDF, http: URIs started to be used independently of the HTTP protocol and of any association with documents. The use of http: URIs to "denote" was beginning to take hold with Fielding's work on web architecture and the early development of RDF, and was sanctioned for arbitrary http: URIs by the W3C Technical Architecture Group with its httpRange-14 resolution. Protocol-independent use of http: is probably not as widely known as many would like, and the reference to the protocol at the beginning of each http: URI is annoying and misleading. Even so, the practice of using http: URIs to denote things (with documents as a special case) is well established from a standards, tools, protocol, and practice perspective.
2. Regarding stability, this is mostly a function of how well the service of providing useful documentation ("metadata") for the URI is supported. This has little to do with the syntax of the URI, which URI scheme is used, or what protocol (HTTP, LSID, handle, ...) is used. Rather, it relates to the level of ongoing service that institutions and other organizations provide - whether the documentation can be found when it is needed. This level of service can, and does (as evidenced by the number of "dead" LSIDs), vary quite independently of syntax and protocol.
"... an important advance of LSIDs over URLs is that they can be used to identify entities other than documents." - see above. This may have been true in 1999, but since then http: URIs have been embraced as names for arbitrary things.
"... the required resolution behavior is to return the identified document (i.e. the entire ontology), and this creates significant problems for large ontologies." This has never been a requirement, and anyone working with large ontologies on the semantic web (such as GO or any of the NCBI resources) puts the documentation explaining each URI's denotation in a separate file. Scope limits can be set for # URIs by having only one fragment identifier in the file, or for non-# URIs by using 303 See Other redirection.
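To make the two scoping options concrete, here is a minimal sketch of how a client ends up at a small per-term document rather than the whole ontology. Everything in it is an assumption for illustration: the function name document_for, the example.org URIs, and the see_other table, which merely stands in for whatever 303 See Other configuration a real server would have.

```python
from urllib.parse import urldefrag


def document_for(uri, see_other):
    """Return the URL of the document a client would retrieve for `uri`.

    `see_other` is a hypothetical table (term URI -> documentation URL)
    standing in for a server's 303 See Other configuration.
    """
    base, fragment = urldefrag(uri)
    if fragment:
        # Hash URI: the fragment is stripped before the request is made,
        # so the client retrieves whatever lives at the part before '#'.
        # One term per file keeps that document small.
        return base
    # Non-hash URI: the server answers 303 See Other, pointing at a
    # separate document about the term. With no entry, the URI is
    # simply retrieved as-is.
    return see_other.get(uri, uri)


# Hypothetical example data, not real ontology URIs.
see_other = {"http://example.org/term/0001": "http://example.org/doc/0001"}

print(document_for("http://example.org/terms/0002#this", see_other))
print(document_for("http://example.org/term/0001", see_other))
```

In both cases the client retrieves one small document about one term; nothing obliges the server to return the entire ontology.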
"... the BioPathways consortium 'wraps' large portions of the NCBI data sets..." - view.ncbi.nlm.nih.org is promising, if that's what you intend. We should encourage NCBI to provide a statement of intent regarding persistence and future work with the Semantic Web community.
"As such, the benefits offered by a location-independent naming system are not compelling." You are right that many do not see the need for a mechanism that, like the LSID protocol, explicitly states the option for the client to bypass the domain naming system protocol. Here I just want to point out that the need is clear and occasionally recognized (e.g. ARK), but, again, solutions are enabled not by changes to syntax but by the existence of a deployed override mechanism. For http: URIs such a mechanism could be very simple, and it should not be too difficult to standardize so that different applications can share override rule sets. On the other hand, as you point out, the demand for an override mechanism is forestalled by concerted efforts to arrange durable DNS resolution and server behavior.
More important than technical details here is the psychological assurance that would be provided by knowing that overrides are permitted by the architecture and supported by applications - even if such a mechanism were never used. This social factor may underlie much of the resistance to using http: URIs.
Redirection to mirrors is also important for performance, a need that will be felt sooner than the need for failover. Either need can be met through simple client software modifications or, if necessary, by configuring a proxy server or content cache.
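As a rough illustration of how simple such a client modification could be, the sketch below rewrites a URI by prefix before it is dereferenced. It is not a proposed standard: the function name apply_overrides, the (old prefix, new prefix) rule format, and the example.org and example.net hosts are all assumptions made up for this example.

```python
def apply_overrides(uri, rules):
    """Rewrite `uri` for retrieval using the first matching prefix rule.

    `rules` is a hypothetical shared rule set: an ordered list of
    (old_prefix, new_prefix) pairs. The URI still serves as the name;
    only the location consulted for its documentation changes.
    """
    for old_prefix, new_prefix in rules:
        if uri.startswith(old_prefix):
            return new_prefix + uri[len(old_prefix):]
    # No rule applies: dereference the URI normally, through DNS.
    return uri


# Hypothetical rule set pointing a defunct or slow host at a mirror.
rules = [("http://old.example.org/", "http://mirror.example.net/old/")]

print(apply_overrides("http://old.example.org/id/42", rules))
print(apply_overrides("http://other.example.org/id/42", rules))
```

The same rule list could serve failover (the original host is gone) or performance (a nearby mirror is faster), and it could equally be applied in a proxy rather than in the client.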
I do not in any way want to criticize the overall quality of the paper, which I have found to be very cogent and helpful.
- Jonathan
