Semantic resources project/Antibodies

Antibodies Resource

Our first resource is the collection and curation of information on commercially and privately-available antibodies. Antibodies are an important reagent in many biomedical experiments; however, they are provided by a large number of suppliers and are accompanied by a complex array of metadata and quality information.

We are working to adapt a high-quality, hand-curated set of 20,000+ antibodies, assembled by the AlzForum research forum website, as a semantic resource for biological research. Our curation and modeling are general enough to handle antibody data from other sources (such as commercial suppliers). We are working with the antibody supply community to obtain additional donations of commercial antibody data to this resource.

The technical details of modeling antibodies center on two tasks: representing each antibody's specificity, and identifying citations that detail the use of the antibody (classified by the experimental method for each usage). We are building a relationship with a public ontology effort for proteins, PRO, that will help us represent antibody specificities more precisely than any other large antibody dataset. Furthermore, we are leveraging the existing publication and citation resources from SWAN and Neurocommons to tie the antibody resource to existing datasets through an accurate accounting of research methods.

The antibody resource is aimed at researchers who are trying to discover antibodies specific to a pathway or biological component of neurodegenerative disease, while avoiding the common naming and synonymy problems of large protein datasets. At the same time, we also aim to let researchers discover the publications and research methods associated with particular antibodies, as a means of evaluating an antibody's suitability for their own research.

Data Collection

Our goal is to format antibodies in an OBI-compliant manner; to do this, we need information about resources that are available either commercially or through private labs.

AlzForum Antibody Dataset

We have collected raw data on 20,000+ antibodies from two contributed sources: the raw data behind the AlzForum Antibodies databank, which contains a hand-curated set of antibody entries relevant to Alzheimer's and neurodegenerative disease research, and a private dataset contributed by a commercial antibody supplier.

We have outlined an ontology for the representation of this data. As part of building this ontology, we have identified several key features of the ultimate resource. For example, multiple suppliers may sell the same antibody under different catalog numbers, which necessitates separating the "offer" to sell the antibody from the antibody itself. This modeling will make it possible for us to ultimately include data from many suppliers in this resource, and to accurately resolve antibody references across publications and between different research laboratories.
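The separation of the "offer" from the antibody can be sketched in a few lines of Python. This is only an illustration, not the project's actual ontology: the class names, the fields, the `AntibodyRegistry` helper, and the supplier names and catalog numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Antibody:
    """The reagent itself, independent of who sells it."""
    antibody_id: str   # hypothetical internal identifier
    target: str        # normalized name of the protein it is specific for


@dataclass(frozen=True)
class Offer:
    """One supplier's listing; several offers may point at one antibody."""
    supplier: str
    catalog_number: str
    antibody_id: str


class AntibodyRegistry:
    def __init__(self):
        self._antibodies = {}   # antibody_id -> Antibody
        self._offers = {}       # (supplier, catalog_number) -> Offer

    def add_antibody(self, antibody):
        self._antibodies[antibody.antibody_id] = antibody

    def add_offer(self, offer):
        if offer.antibody_id not in self._antibodies:
            raise KeyError(f"unknown antibody: {offer.antibody_id}")
        self._offers[(offer.supplier, offer.catalog_number)] = offer

    def resolve(self, supplier, catalog_number):
        """Map a supplier's catalog number back to the underlying antibody."""
        offer = self._offers[(supplier, catalog_number)]
        return self._antibodies[offer.antibody_id]

    def offers_for(self, antibody_id):
        """List every offer that sells the given antibody."""
        return [o for o in self._offers.values() if o.antibody_id == antibody_id]


# Made-up example data: one antibody sold by two suppliers.
registry = AntibodyRegistry()
registry.add_antibody(Antibody("ab-0001", "tau"))
registry.add_offer(Offer("Supplier A", "A-100", "ab-0001"))
registry.add_offer(Offer("Supplier B", "XZ-7", "ab-0001"))

# Two different catalog numbers resolve to the same antibody.
same = registry.resolve("Supplier A", "A-100") == registry.resolve("Supplier B", "XZ-7")
```

Keying offers by the pair (supplier, catalog number) rather than by catalog number alone is a deliberate choice in this sketch, since nothing guarantees that catalog numbers are unique across suppliers.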

We have used a text-mining and searching software infrastructure to identify protein names in the free-text data fields of the identified antibody datasets. Identifying antibody specificity (the protein or proteins to which an antibody is specific, the key experimental feature of an immunochemical experiment) is a central aspect of the modeling and representation of antibodies themselves. Both of the antibody datasets we have identified so far include specificity descriptions as textual fields without a consistent format and using protein identifiers (common names) that are sometimes ambiguous. We have used open source software components and standardized protein tagging methodology to identify relevant protein terms from these textual fields, and to annotate the antibodies as specific for normalized versions of those protein names.
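As a rough illustration of what tagging and normalizing protein names in a free-text specificity field involves, the following sketch uses a small synonym dictionary and a regular expression. It does not reproduce the open source components actually used by the project; the synonym table, the normalized names, and the function names are assumptions made for the example.

```python
import re

# Hypothetical synonym table: lowercase surface form -> normalized name.
SYNONYMS = {
    "abeta": "amyloid beta",
    "a-beta": "amyloid beta",
    "beta-amyloid": "amyloid beta",
    "amyloid beta": "amyloid beta",
    "tau": "tau",
    "mapt": "tau",
}


def build_pattern(synonyms):
    """Compile one case-insensitive pattern matching any known surface form."""
    # Try longer forms first so they win over shorter overlapping ones.
    forms = sorted(synonyms, key=len, reverse=True)
    alternation = "|".join(re.escape(form) for form in forms)
    # The lookarounds stop "tau" from matching inside words such as "tautomer".
    return re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)", re.IGNORECASE)


def tag_proteins(text, synonyms=SYNONYMS):
    """Return the normalized protein names mentioned in text, in order of appearance."""
    pattern = build_pattern(synonyms)
    found = []
    for match in pattern.finditer(text):
        name = synonyms[match.group(0).lower()]
        if name not in found:
            found.append(name)
    return found


tags = tag_proteins("Recognizes human Abeta 1-42; no cross-reactivity with Tau.")
```

A dictionary lookup like this cannot resolve genuinely ambiguous common names; those cases still need curation or a richer tagging method.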

We have interacted with PRO in building a social process for generating terms, in bulk, for the targets of these antibodies. As we identify the proteins and peptides to which the antibodies in our initial datasets show specificity, our goal has been to annotate those antibodies as specific to protein terms from the PRO ontology described above. However, the PRO ontology is currently in development and partially incomplete; some of the proteins for which we have specific antibodies are not present in PRO. In order to move forward with our modeling, we have worked with PRO to identify the key proteins which that ontology lacks, and to request new terms for those proteins.

We have worked to identify a set of research methods for indexing the use of antibodies in experimental protocols. Understanding antibodies in relation to existing biomedical experiments requires understanding how those antibodies have already been used in published experiments. We have worked to identify a complete set of immunochemical experimental methods under which the majority of antibody uses recorded in our datasets will fall. These methods can then be modeled themselves, and submitted to existing biomedical ontologies (OBI).

NIF Antibody Dataset

Tools & Workflow

Protein Name Mining

The first step in representing antibody data is to find and represent the specificity of the antibody. This often means finding precise protein names (or the names of genes corresponding to proteins) within larger free-text fields.

Mining Protein Names

Manual Annotation

The Antibody Record Annotator provides a method for converting relational antibody data sources with free-text fields into a structured representation suitable for conversion into a Neurocommons Bundle.

Hand annotation is the first step: free-text fields are mined, either automatically or through manual annotation, for substrings that represent structured knowledge about the antibody.

Automatic Rule-based Annotation

This knowledge is then transferred to the larger set of antibodies through automatic rule matching.

An Annotation Cache is used to collect the associations of text strings with structured annotations from the Manual Annotation stage. These annotations are then used as text rules for the unannotated corpus of antibody records.
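The cache-then-match idea described above could look roughly like the following. This is a minimal sketch under stated assumptions, not the Antibody Record Annotator's code: the `Annotation` and `AnnotationCache` classes, their methods, and the plain substring matching rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    kind: str    # e.g. "method" or "specificity" (illustrative categories)
    value: str   # the structured value a curator assigned


class AnnotationCache:
    """Collects text -> annotation associations and replays them as rules."""

    def __init__(self):
        self._rules = {}   # normalized text string -> set of Annotation

    @staticmethod
    def _normalize(text):
        # Lowercase and collapse whitespace so trivial variants still match.
        return " ".join(text.lower().split())

    def record(self, text, annotation):
        """Store one association made during manual annotation."""
        self._rules.setdefault(self._normalize(text), set()).add(annotation)

    def annotate(self, free_text):
        """Apply every cached rule whose text occurs in free_text."""
        haystack = self._normalize(free_text)
        hits = set()
        for rule_text, annotations in self._rules.items():
            if rule_text in haystack:
                hits |= annotations
        return hits


cache = AnnotationCache()
cache.record("Western blot", Annotation("method", "western blot"))
cache.record("ELISA", Annotation("method", "ELISA"))

# An unannotated record picks up both annotations from the cached rules.
hits = cache.annotate("Validated by   western BLOT and ELISA in mouse brain lysate")
```

Plain substring matching is the simplest possible rule and will over-match short strings; a real rule matcher would likely need word boundaries or field-specific rules.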

Data Model

Specificity


Creation



Offer to Sell

OBI Model

There are five core elements to represent an antibody:

  1. the antibody itself
  2. an antigen for an antibody
  3. the solution which is offered by an organization
  4. the offering organization
  5. the process by which the antibody was created

We need URLs for properties which connect these classes.

We need OBI descriptions of each class, to describe their meaning and relationship with each other.
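To make the open question of connecting properties concrete, the sketch below lists the five elements as classes and gives each connecting property a domain and a range. None of these names or URLs come from OBI: the `http://example.org/` namespace is a placeholder, and the property names are invented purely for illustration.

```python
BASE = "http://example.org/antibody-model/"   # placeholder namespace, not a real vocabulary

# The five core elements listed above.
CLASSES = {"Antibody", "Antigen", "Solution", "Organization", "CreationProcess"}

# Hypothetical connecting properties: name -> (domain class, range class).
PROPERTIES = {
    "is_specific_for": ("Antibody", "Antigen"),
    "contains": ("Solution", "Antibody"),
    "offered_by": ("Solution", "Organization"),
    "created_by": ("Antibody", "CreationProcess"),
}


def property_url(name):
    """Build the placeholder URL for a property name."""
    if name not in PROPERTIES:
        raise KeyError(f"unknown property: {name}")
    return BASE + name


def check_triple(types, subject, prop, obj):
    """Return True if (subject, prop, obj) respects the property's domain and range."""
    domain, range_ = PROPERTIES[prop]
    return types.get(subject) == domain and types.get(obj) == range_


# Made-up instances and their classes.
types = {
    "ab-0001": "Antibody",
    "tau": "Antigen",
    "vial-17": "Solution",
    "Supplier A": "Organization",
}

ok = check_triple(types, "ab-0001", "is_specific_for", "tau")
bad = check_triple(types, "Supplier A", "is_specific_for", "tau")
```

Writing the domain and range down explicitly, even in this toy form, shows which class descriptions are needed before the properties can be given stable URLs.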

Neurocommons Bundle

/Bundle Code

Notes & External Content

Prior Work

AlzForum Content

OBI Content

Other Content

  • Antibodypedia
  • ProteomeBinders : "a European consortium proposing to establish a comprehensive infrastructure resource of binding molecules for detection of the human proteome, together with tools for their use and applications in studying proteome function and organisation."
  • IMGT : ImMunoGeneTics
  • IEDB Source Pages (ex. Mouse)
  • CHDI HTT Antibody List (it's in PDF)