Semantic Web

The idea of the Semantic Web is as follows:

If the Web is a network whose nodes are documents and whose arcs are hypertext links between documents (generally HTML href= attributes), then the Semantic Web is a network whose nodes include, but are not limited to, documents and whose arcs make meaningful links between nodes. It is a web-scale semantic network.

Nodes in this network can be anything at all - documents, companies, tissue cultures, colors - and are (usually) named by URIs. Arcs are expressed as statements in a notation such as RDF that specify the source and destination of the arc, and the relationship that connects them. Collect all such statements that have been published on the Web and you have the Semantic Web.
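As a rough illustration of the paragraph above, each arc can be modeled as a three-part statement, and a collection of statements as a set. This is a minimal sketch in plain Python, not RDF syntax or any particular library's API; the `http://example.org/` namespace, the node and relationship names, and the helper functions are all hypothetical.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One arc: a source node, a relationship, and a destination node."""
    source: str        # URI naming the node the arc leaves
    relationship: str  # URI naming the kind of link
    destination: str   # URI naming the node the arc points to


EX = "http://example.org/"  # hypothetical namespace, for illustration only

# A tiny "web" of statements about a company, a tissue culture, and a document.
statements = {
    Statement(EX + "AcmeBio", EX + "maintains", EX + "culture42"),
    Statement(EX + "culture42", EX + "derivedFrom", EX + "HeLa"),
    Statement(EX + "report7", EX + "describes", EX + "culture42"),
}


def nodes(graph):
    """Every URI that appears as the source or destination of some arc."""
    return {s.source for s in graph} | {s.destination for s in graph}


def arcs_from(graph, node):
    """All statements whose source is the given node."""
    return {s for s in graph if s.source == node}


if __name__ == "__main__":
    for s in sorted(arcs_from(statements, EX + "culture42")):
        print(s.source, s.relationship, s.destination)
```

Note that only one of the four nodes here stands for a document; the others name a company, a culture, and a cell line, which is the sense in which nodes are "not limited to" documents.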

While this makes a nice vision, it is rather abstract, and its benefits are not immediately clear. We prefer to think of the engineering techniques that have emerged from the Semantic Web idea as solutions to problems that the scientific community actually faces. These include:

  • Proliferation and incompatibility of syntax (spreadsheets, SQL, XML, and so on) - RDF provides a bland common form into which more or less arbitrary information can be rendered.
  • Combining information for joint query - RDF is a notation for "balls of mud". You can always combine two RDF documents and get another RDF document.
  • Selecting and recombining information - similarly, half a ball of mud is a ball of mud: any subset of the statements in an RDF document is itself an RDF document.
  • Linking related columns and rows of data sets - RDF provides a discipline of types and subtypes that encourages recognition and coordination of similar things. Even when types do not match exactly, they can often be related via subtype or other relationships.
  • Sharing information - publishing data in RDF on the Web builds on existing standards and protocol stacks.
  • Falsifiability - the philosophy that arcs are meaningful statements, capable of being true or false, encourages clear, context-neutral data curation.
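The "balls of mud" points in the list above can be sketched by treating a data set as a set of (subject, relationship, object) statements: combining is set union, and selecting is taking a subset. This is a simplified sketch under stated assumptions, not a definitive implementation. The prefixed names (`ex:geneX` and so on), the two sample data sets, and the `merge` and `select` helpers are hypothetical, and a real RDF merge must also keep the blank nodes of the two sources distinct, which this sketch ignores.

```python
# Two independently published sets of statements (hypothetical data).
lab_a = {
    ("ex:geneX", "rdf:type", "ex:Gene"),
    ("ex:geneX", "ex:expressedIn", "ex:hippocampus"),
}
lab_b = {
    ("ex:geneX", "rdf:type", "ex:Gene"),
    ("ex:geneX", "ex:associatedWith", "ex:diseaseY"),
}


def merge(*graphs):
    """Combine any number of statement sets; the result is again a statement set."""
    return set().union(*graphs)


def select(graph, relationship):
    """Keep only the statements that use the given relationship."""
    return {t for t in graph if t[1] == relationship}


combined = merge(lab_a, lab_b)            # duplicates collapse automatically
typed = select(combined, "rdf:type")      # a subset is still a valid statement set
recombined = merge(typed, select(combined, "ex:associatedWith"))

if __name__ == "__main__":
    for triple in sorted(combined):
        print(*triple)
```

Because both sources use the same name for the gene, the merged set can be queried jointly without any schema mapping step; that shared naming is what the list item on linking related data depends on.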

It is RDF's "semantic" aspect that most clearly distinguishes it from other notations such as XML. Although it uses URIs, it takes these to be names for things in a logical system (or "knowledge representation" language). If a URI meant to name a cell line is used in a web client, the document that comes back is supposed to be useful in some way in understanding the role of the URI in RDF, i.e. what it is supposed to stand for.

General information can be found in Wikipedia. A Semantic Web project with goals similar to those of the Neurocommons is Linking Open Data.

A workshop, "Harnessing the Semantic Web for your Organization", was conducted at BioIT World 2008. In his presentation there, Alan Ruttenberg summarizes: the Semantic Web adds to Web standards and practices encouraging:

  • Unambiguous names for things, classes, and relationships
  • Well organized and documented in ontologies
  • With data expressed using uniform knowledge representation languages
  • To enable computationally assisted exploitation of information
  • That can be easily integrated from different sources
  • Both within and across public and organizational boundaries