RDF distribution


Nearby: How to create a Neurocommons mirror

Package-based distribution of software and of "knowledge"

The Neurocommons RDF distribution is a prototype for one possible way the scientific community might achieve scalable data integration. The inspiration for its structure comes from GNU/Linux distributions such as Debian. The Debian distribution consists of a set of open source software packages, each adapted as necessary to work smoothly as part of the whole. Packages are integrated in the sense that they use a common set of interfaces, namely those provided by the Linux kernel and other packages in the distribution. Such a distribution is primarily an aggregator, collecting and adapting packages from a wide variety of sources.

Such a distribution can be used by anyone wanting to configure a computer system. The distribution is not monolithic; packages can be selected as needed to build a custom system. And there is nothing in such an architecture that requires centralization. There may be many sources of packages, even private or proprietary ones. What makes it hang together is not administrative centralization but the use of a common set of interfaces.

Following this model, the Neurocommons project has collected a set of RDF modules, or bundles. There are bundles for a variety of public information sources, processed to varying degrees so that they are rendered in RDF following a set of standard interfaces (ontologies and URI systems). Together the bundles form a distribution from which one may create knowledge bases out of selected components.

The bundles may be downloaded for use in any scenario that requires RDF; loading into a triple store is the expected use, but other applications are certainly possible.

The entire distribution is loaded into our own triple store, where it may be accessed by composing SPARQL queries.

Note: If you have experience with our previous distribution (before 29 September 2008), please see: Differences between v0 and v1

Loading and installing

There are two ways to download the distribution:

  • Bundles method: Download RDF files via rsync, then load them into a triple store
  • Restore method: Download a full backup file for use with OpenLink Virtuoso

Use our RDF installer either to initialize a Virtuoso triple store and load bundles into it, or to initialize a Virtuoso triple store from a backup. Instructions for downloading and installing bundles and backups can be found here.

Neurocommons SPARQL endpoint

Our SPARQL endpoint is available for general use. It provides access to our distribution via queries.
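As a rough illustration of what composing a query for a SPARQL endpoint involves, the sketch below builds a SPARQL Protocol GET request URL, in which the query text travels in the "query" parameter. The endpoint address (http://example.org/sparql) and the helper name build_query_url are placeholders invented for this example; substitute the real endpoint address.

```python
from urllib.parse import urlencode

# Placeholder address (an assumption for this sketch), not the real endpoint.
ENDPOINT = "http://example.org/sparql"

# A deliberately generic query: fetch ten arbitrary triples.
QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
"""


def build_query_url(endpoint: str, query: str) -> str:
    """Return a GET URL that carries the query in the 'query' parameter,
    as defined by the SPARQL Protocol."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(ENDPOINT, QUERY)
print(url)
```

The resulting URL can be fetched with any HTTP client. The result format depends on what the endpoint supports and on the Accept header sent with the request.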

Bundle documentation

Each bundle is an RDF source module, that is, a set of files in RDF format (either RDF/XML or Turtle). For the larger bundles, files are organized into directories. Most of the individual RDF files are compressed.
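A consumer that walks a bundle's directories has to decide, file by file, which parser to use and whether to decompress first. The sketch below shows one way to make that decision from file names alone. The suffix sets are assumptions made for illustration; the distribution does not promise any particular file extensions, so adjust them to what a given bundle actually contains.

```python
from pathlib import PurePosixPath

# Assumed suffix conventions (hypothetical; check the actual bundle contents).
RDF_XML_SUFFIXES = {".rdf", ".owl", ".xml"}
TURTLE_SUFFIXES = {".ttl"}
COMPRESSION_SUFFIXES = {".gz", ".bz2"}


def classify(filename: str):
    """Return (format, compressed) for a file name.

    format is "rdf/xml", "turtle", or None when the suffix is not recognized.
    """
    suffixes = [s.lower() for s in PurePosixPath(filename).suffixes]
    compressed = bool(suffixes) and suffixes[-1] in COMPRESSION_SUFFIXES
    if compressed:
        suffixes = suffixes[:-1]  # look at the suffix beneath the compression one
    fmt = None
    if suffixes:
        if suffixes[-1] in RDF_XML_SUFFIXES:
            fmt = "rdf/xml"
        elif suffixes[-1] in TURTLE_SUFFIXES:
            fmt = "turtle"
    return fmt, compressed


for name in ["data/part1.ttl.gz", "ontology.rdf", "README"]:
    print(name, classify(name))
```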

Each bundle has a configuration file giving a version number, the name of the bundle's RDF graph, dependency information, and occasionally scripts that help to set up the graph. In most cases a bundle may be used directly in RDF-consuming applications other than Virtuoso simply by loading the contained RDF files.
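Because bundles declare dependencies, a loader other than our installer would need to process them in an order that respects those declarations. The sketch below illustrates the idea with a topological sort from the Python standard library. The bundle names and the in-memory dictionary are hypothetical; the actual configuration file format is not described here, so reading the dependency information out of it is left out.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency table: each bundle maps to the bundles it depends on.
DEPENDENCIES = {
    "bundle-a": set(),
    "bundle-b": {"bundle-a"},
    "bundle-c": {"bundle-a", "bundle-b"},
}


def load_order(dependencies):
    """Return bundle names ordered so that every bundle comes after
    the bundles it depends on."""
    return list(TopologicalSorter(dependencies).static_order())


print(load_order(DEPENDENCIES))
```

TopologicalSorter raises graphlib.CycleError if the declared dependencies are circular, which is a useful early check before any loading begins.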

Bundles documentation lists all bundles in the distribution, with links to documentation for the individual bundles.

The content of each bundle constitutes an RDF graph. The graph for bundle B is generally named http://purl.org/science/graph/B (with some exceptions that will be fixed soon).
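The naming convention can be sketched as a small helper, together with a query that restricts matching to a single bundle's graph. The bundle name "example-bundle" and the function names are hypothetical, and because of the exceptions mentioned above, the computed name should be checked against the bundle's own documentation.

```python
GRAPH_BASE = "http://purl.org/science/graph/"


def graph_uri(bundle: str) -> str:
    """Return the conventional graph name for a bundle.

    This follows the general rule only; a few bundles are exceptions.
    """
    return GRAPH_BASE + bundle


def graph_query(bundle: str, limit: int = 10) -> str:
    """Build a SPARQL query that reads triples from one bundle's graph only."""
    return (
        f"SELECT ?s ?p ?o WHERE {{ GRAPH <{graph_uri(bundle)}> "
        f"{{ ?s ?p ?o }} }} LIMIT {limit}"
    )


# "example-bundle" is a made-up name used only for illustration.
print(graph_uri("example-bundle"))
print(graph_query("example-bundle"))
```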

Future plans

In order for an ecology like the one prototyped here to scale up, ontologies and URIs must be stable. The semantic web community has not yet produced stable URIs either for citation (database records or "dbxrefs") or for many scientific entities such as nucleic acid sequences. We are working toward resolving many of these deficits in naming and hope at least to have community-supported URIs for database records established in our next major release (see Common Naming Project).

There are modeling errors in many of our bundles, especially around the ad hoc ontology we built in haste in order to get something running, and we plan to fix as many of these as we can.

Unfortunately, such improvements will break existing queries, so anyone using our distribution should plan for this. After the next release there should be much less disruption, but incompatible upgrades are inevitable in such a young technology. RDF bundle development resembles software development, in which stability is achieved only after several release cycles.

We will continue to expand the distribution as we learn about new RDF sources and as we adapt other kinds of sources to this framework. The page Future information sources lists ideas about what new content we might include in the future. Post to our discussion group to make other suggestions for the development of this project.

Support

We cannot make promises, but will do our best to support users of the distribution and of the SPARQL endpoint.

If you are using either, you are urged to join our discussion group. This is the best way to keep abreast of developments.

As everything we do is open, there are no obstacles to anyone else providing support and doing development.

Our issue tracker is open for reporting bugs and tracking their progress.

General documentation

Legal notices

Please read these notices before using the RDF distribution.