Information Artifact Ontology

About

The Information Artifact Ontology (IAO) is a new ontology of information entities, originally driven by the work of the OBI digital entity and realizable information entity branch. The first workshop on the IEO took place in Boston at the MIT Stata Center on June 9, 2008. For more details see First IAO workshop.

Background

This effort is motivated by several experiences we have had in developing ontologies.

We initially considered calling our effort the information entity ontology. However, after our first meeting we decided to rename it the Information Artifact Ontology, thus narrowing the focus and, ideally, leaving open the issue of whether DNA molecules are carriers of information. This choice is motivated by the prime need at the moment, which is to support OBI and to support the annotation of publications, results, databases, etc., all of which are information artifacts.

To request a term be added to the ontology, fill out this form. View the current list.

The OBI ontology is available. The information-related terms are subclasses of "information entity". Here's a link to an online HTML browser of it (if you get a yellow banner, click on the blue link to continue): browse.

Examples of information artifacts

The following are information artifacts in this sense, as proposed by Barry:

  • serial number
  • batch number
  • grant number
  • person number
  • name
  • address
  • email address
  • URI
  • protocol
  • lab note
  • ontology
  • gene list
  • publication
  • result
  • license
  • document granting permission
  • contract
  • novel
  • textbook
  • newspaper
  • timetable
  • recipe
  • map
  • objective specification

Selected references

Post-workshop:

Pre-workshop:

Conversation leading up to the workshop

Finally, here's a short communication between Barry and me leading up to the workshop. Me with the notes, Barry responding.

[AR]
Had some discussion with a few people today - Tom Knight, Chris Hanson,
Jake Beal, Jonathan Rees. Tomorrow I'll chat with Gerry Sussman.

A variety of interesting issues arose. First one was the Shannon
definition of information, which is at odds with our use of the term,
I think. In particular, according to the Shannon measure, a random
signal has the most information as it is least compressible. This is
at odds with the sense of information we care about - a random signal
has none of it.

[BS]
This is the mass noun sense of 'information'.
We are interested in the count noun 'information entity'.
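
As a side note to the exchange above, the compressibility point can be checked with a minimal sketch. It assumes Python's standard `zlib` and `os.urandom`; the helper names `entropy_bits_per_byte` and `compressed_fraction` and the sample sentence are made up for illustration and are not part of IAO or OBI. It only illustrates the mass-noun (Shannon) sense of information, not the count-noun sense of 'information entity'.

```python
import math
import os
import zlib
from collections import Counter


def entropy_bits_per_byte(data: bytes) -> float:
    """Empirical Shannon entropy of the byte distribution, in bits per byte."""
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def compressed_fraction(data: bytes) -> float:
    """Size after zlib compression, as a fraction of the original size."""
    return len(zlib.compress(data, 9)) / len(data)


SIZE = 100_000

# A random signal: bytes drawn from the operating system's random source.
random_signal = os.urandom(SIZE)

# A structured signal: a made-up sentence repeated to the same length.
sentence = b"A lab note records what was done at the bench today. "
structured_signal = (sentence * (SIZE // len(sentence) + 1))[:SIZE]

for label, data in (("random", random_signal), ("structured", structured_signal)):
    print(
        f"{label:>10}: {entropy_bits_per_byte(data):.2f} bits/byte, "
        f"compresses to {compressed_fraction(data):.1%} of original size"
    )
```

The random bytes score close to the 8 bits/byte maximum and barely compress, while the repeated sentence scores much lower and compresses well, even though only the latter is something we would call a lab note.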

[AR]
Other bits that came in to play:

Role of the receiver of information. Is it information before some
agent interprets it? When the information isn't originating from a
person, this is a reasonable question. Otherwise too much information
around.

[BS]
This question becomes easier to answer if you look at information entities; the
latter are analogous to independent continuant artifacts (like screwdrivers); hence in 
normal cases they are information entities even before being registered.

[AR]
Role of the producer of information. Where does physics end and
information start? Tom Knight argues for no boundary, but I don't buy
it. This went off in the direction of asking whether intention is
necessary for information to be produced. But then what about the
instrument that produces information?

[BS]
I believe that information entities require a certain restricted kind of provenance 
(as do artifacts like screwdrivers). Hence the physics is never enough. (Analogously: a 
Credit Card Number is not a mathematical object.)

[AR]
One possibility: The information we are interested in always
originates with a sentient - either by a person thinking/communicating,
or by a machine that was designed to have a function to
produce/communicate information.

[BS]
Exactly

[AR]
Other suggested words that might be better than information:
Message, Enscription

[BS]
No thanks. Though I am not wedded to 'information' either; but 'message' is much 
too narrow; and 'enscription' is too precious.

[AR]
One thing that there seemed to be consensus about was what I call the
multiplexing of information entities. So consider a fragment of the
constitution laid out on a printed page. One information content
entity is the wording. Another might be the layout (viz. the layouts
you can choose in Word with lorem ipsum.... blank content). Someone
might create the line breaks so that the first letter of each line
spells out another message. There was general consensus that this was
3 things.

[BS]
Agreed.
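
The multiplexing point in the last exchange can be made concrete with a small sketch. Everything here is hypothetical and chosen for illustration only: the `PrintedPage` class, the three reader functions, and the sample text are assumptions, not IAO terms or an agreed model. The idea is that one carrier (the lines on a page) bears three separable things: the wording, the layout, and a message spelled by the first letter of each line.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PrintedPage:
    """Hypothetical stand-in for the physical carrier: text as laid out in lines."""
    lines: Tuple[str, ...]


def wording(page: PrintedPage) -> str:
    """The words in order, ignoring where the line breaks fall."""
    return " ".join(" ".join(page.lines).split())


def layout(page: PrintedPage) -> Tuple[int, ...]:
    """The layout, reduced here to the number of words on each line."""
    return tuple(len(line.split()) for line in page.lines)


def acrostic(page: PrintedPage) -> str:
    """The message spelled out by the first letter of each non-empty line."""
    return "".join(line.lstrip()[0] for line in page.lines if line.strip()).upper()


page = PrintedPage((
    "Information entities can share",
    "a single physical carrier, and",
    "one page may bear several.",
))

# Same wording, different line breaks: the layout and the acrostic change.
reflowed = PrintedPage((
    "Information entities can share a single physical",
    "carrier, and one page may bear several.",
))

print(wording(page) == wording(reflowed))   # True
print(layout(page), layout(reflowed))       # (4, 5, 5) (7, 7)
print(acrostic(page), acrostic(reflowed))   # IAO IC
```

Reflowing the text keeps the wording fixed while changing the other two, which is one way of seeing why they count as three things rather than one.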