URI documentation protocol


How should one go about finding documentation that will explain what a URI is meant to name (or "denote"), in cases where this is needed and it is not already at hand? HTTP was designed to get you a web page, not to get information about naming, so the following protocol is not as elegant as one designed with naming in mind would be.

(URI documentation generally consists of a short description of what is to be named by the URI, but may also contain information about the status of the documentation itself, such as authorship or its progress along a review track.)

If you're doing bulk processing of URI documentation, you may be better off doing bulk downloads or SPARQL queries on an appropriate SPARQL endpoint, as large numbers of probes will be inefficient and will load servers, usually unnecessarily.

The following protocol is designed to be forgiving enough to grandfather many URIs already in use on the Semantic Web, such as those in the RDF Schema vocabulary and Dublin Core, while strict enough to support our URI requirements. It coincides with HTTP in the absence of overrides, responses containing a Location: header, and 200 responses.

Draft protocol

Note that this protocol is not "authoritative": when URIs are used in communication (i.e. to denote), an authoritative source of correct URI documentation is inherently impossible to establish using any technical protocol.

1. If there is a documentation source override rule that applies to the URI, apply it. For a documentation URI mapping rule, proceed to step 6. For a replacement URI mapping rule, go to step 2.

2. If the URI is not an http: URI, documentation access is not specified by this protocol.

3. If the URI contains a fragment identifier (#), strip the # and following characters to obtain a documentation URI. Go to step 6 and be prepared to find the documentation you want mixed in with documentation for a lot of other URIs.

4. If there is no #, do a HEAD request with Accept: application/rdf+xml, application/xhtml+xml, text/html. (The higher-priority request for RDF is necessary in order to encourage content negotiation in the direction of a 303 (step 5b), and request for HTML is necessary in case a server responds with a 200 (step 5c).)

5. Determine a documentation URI using one or more of the following methods:

5a. If a response has a response header of the form Link: <...>; rel="meta", take the link target to be the documentation URI. ("meta" should be replaced by a specific URI. See Link header.)

5b. If a response has status code 303, assume the Location: URI to be the documentation URI. (This use of 303 is a convention in use in Semantic Web contexts. The HTTP protocol does not require that the redirected-to URI be URI documentation, so the result should be treated with caution.)

5c. If the response has status code 200 and has an HTML media type (Content-Type: text/html or application/xhtml+xml), do a GET with that media type in the Accept header to retrieve the content. Look for a <link href="..." rel="meta"/> element under the document's <head> element, and take its target to be the documentation URI. Discard the rest of the 200 response (unless you are specifically interested in it as well). ("meta" should be replaced by a specific URI. See Link header.)

(TBD: Specify a theory of how to use GRDDL to convert HTML to RDF.)

5d. If the response specifies a redirect (30x other than 303), follow it per the HTTP protocol and repeat step 5.

(Cooperating servers must arrange for documentation found via Link: to be the same documentation as that found via 303 or <link>, when both methods lead to documentation.)

6. At this point you should have a second URI (the "documentation URI"). Do a GET of the documentation URI, redirect as needed, and see whether the response constitutes documentation for the URI. Specify Accept: application/rdf+xml for machine-readable documentation.

You should end up with URI documentation at this point. As we are recycling the HTTP protocol for an unintended purpose, you might have something else instead, so the result should be treated gingerly.
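As a rough illustration, the parts of the protocol that need no HTML parsing (steps 2, 3, 4, 5a and 5b) might be sketched in Python as follows. This is only a sketch under stated assumptions: the function names and the example.org URIs are hypothetical, the Link: parsing is deliberately simplified, "meta" stands in for the relation type that has yet to be chosen, and trying Link: before 303 is an arbitrary ordering that the protocol does not prescribe. Override rules (step 1), the <link> lookup (step 5c) and ordinary redirects (step 5d) are left out.

```python
import re
from urllib.parse import urldefrag, urljoin, urlsplit
from urllib.request import Request

# Accept header for the HEAD probe of step 4: RDF first, then HTML.
PROBE_ACCEPT = "application/rdf+xml, application/xhtml+xml, text/html"

# Placeholder relation type; the protocol says "meta" is to be replaced
# by a specific URI that has not yet been determined.
DOC_REL = "meta"


def probe_request(uri):
    """Build (but do not send) the HEAD request of step 4."""
    return Request(uri, method="HEAD", headers={"Accept": PROBE_ACCEPT})


def link_header_target(link_value, rel=DOC_REL):
    """Return the target of the first Link entry whose rel includes `rel`.

    Simplified parsing: assumes targets contain no '<' or '>' and that
    the rel parameter is not split across unusual quoting.
    """
    for match in re.finditer(r"<([^>]*)>([^<]*)", link_value):
        target, params = match.group(1), match.group(2)
        rel_match = re.search(r'\brel\s*=\s*"?([^";,]*)"?', params, re.IGNORECASE)
        if rel_match and rel in rel_match.group(1).split():
            return target
    return None


def documentation_uri(uri, status=None, headers=None):
    """Apply steps 2, 3, 5a and 5b; return a documentation URI or None.

    `status` and `headers` describe the response to the HEAD probe and
    are only consulted when the URI has no fragment identifier.
    """
    if urlsplit(uri).scheme.lower() != "http":
        return None  # step 2: not covered by this protocol
    if "#" in uri:
        base, _fragment = urldefrag(uri)
        return base  # step 3: strip '#' and what follows
    lowered = {name.lower(): value for name, value in (headers or {}).items()}
    if "link" in lowered:  # step 5a
        target = link_header_target(lowered["link"])
        if target:
            return urljoin(uri, target)
    if status == 303 and "location" in lowered:  # step 5b
        return urljoin(uri, lowered["location"])
    return None
```

For example, `documentation_uri("http://example.org/vocab#term")` yields `http://example.org/vocab` without any network access, while a URI without a fragment needs the status code and headers of a probe response. As the text above says, whatever URI comes back should still be fetched (step 6) and its content treated with caution.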

We ask servers to follow certain documentation quality standards in the documentation that they deliver. In particular, documentation should be explicit about what the URI is supposed to denote.

Notes

Relative to current Semantic Web practice this protocol adds two new ways to find URI documentation: using documentation source overrides and Link: HTTP headers. These extensions are needed in order to meet our stated requirements.

The Link: header is not yet in use for this purpose as far as we know, and the relation type to be used in this protocol has yet to be determined (see Link header). Feedback on this technique and other aspects of the protocol is welcome.

If you didn't find a documentation URI, but did get a 200 response, the URI names an "information resource" according to W3C recommendations, but we didn't find any documentation to tell us which one or anything else about it. The body of a 200 response may provide useful information about stability, provenance, purpose, etc. to a human reader (the versioning information in here is a good example), but it will not constitute machine-readable documentation for the URI. The relation between the URI's referent and the HTTP 200 responses, which can vary over time and across content-negotiated variants, will be inaccessible for computational purposes.

For human-readable URI documentation, it may suffice to skip this whole protocol and do an ordinary browser GET in the first place. Browsers handle #, follow 303s, and arrange for GRDDL transforms, so you may get the documentation you need this way, or if not then another helpful document. Override rules are expected to be used only for failure recovery, and Link: will only augment available 200 responses, which in many cases are likely to give a human reader the information that is sought. If not, then revert to the above protocol. One might imagine creating REST-based services for this purpose, as has been done for other protocols such as the handle protocol.

<link rel="meta"> was promoted as early as 1997 as a way to obtain RDF. The recommendation to permit <link> is based on advice from Harry Halpin, Pat Hayes, and others. Alan Ruttenberg and Patrick Stickler disapprove of <link> because it is risky: one is more likely to get information describing this one HTTP response than information that documents the URI.
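To make the <link> lookup of step 5c concrete, here is a minimal Python sketch using only the standard library's html.parser. The class and function names and the example.org URIs are hypothetical, "meta" is again only a placeholder for the eventual relation type, and the parser is lenient, so malformed markup may give surprising results. It returns only the first matching <link> found under <head> and says nothing about whether the target really documents the URI, which is exactly the risk noted above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MetaLinkFinder(HTMLParser):
    """Record the href of the first <link rel="..."> seen inside <head>."""

    def __init__(self, rel="meta"):
        super().__init__()
        self.rel = rel          # placeholder relation type (an assumption)
        self.in_head = False
        self.target = None

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <link ... />.
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head and self.target is None:
            attributes = dict(attrs)
            rels = (attributes.get("rel") or "").split()
            if self.rel in rels and attributes.get("href"):
                self.target = attributes["href"]

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def html_documentation_uri(html_text, base_uri, rel="meta"):
    """Return the documentation URI named by <link> in the page, or None.

    Relative href values are resolved against `base_uri`, the URI that
    was fetched to obtain `html_text`.
    """
    finder = MetaLinkFinder(rel)
    finder.feed(html_text)
    if finder.target is None:
        return None
    return urljoin(base_uri, finder.target)
```

A <link> element that appears in the body rather than under <head> is ignored, matching the wording of step 5c.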

Thanks to Harry Halpin for comments. The consistency requirement is a response to concerns voiced by Xiaoshu Wang.