Notes on PRO Meeting with Darren Natale

Meeting at 11am, 11/10/2009, with Darren Natale by phone.

Present: TD, PC, DN (by phone)

My (Timothy's) goals for this meeting:

  1. How are identifiers submitted to PRO? Is there web-form-style access to this process?
  2. What is the minimal information that might be needed for a PRO submission?
  3. Could any of this be turned into a web-service for bulk-uploading?

Examples:

  1. Paolo's HSP/HSC proteins? Modeling families and species.
  2. THOP1, and its phosphorylated variants.
    1. Ser664 is a phosphorylated serine residue in rat.
    2. The corresponding human residue is (I believe) Cys663.

Before we got started, Darren dropped the big news:

 "within the next month or so, there will be a large-scale automated
 population of PRO terms, based on UniProtKB, and will be focused on
 getting all the terms for human and mouse into PRO."

We can send in requests for proteins from other species as well, as long as they are experimentally characterized ('things known to exist').

DN: "we are going to have a generic protein not related to particular species and then more specific proteins related to the species"

  • Phosphorylated forms? A generic term for the phosphorylated state, without the location (which is given as an annotation).
  • "we are not set up to deal with epitopes themselves but we should cover the annotation of a protein with an epitope"
  • TWD: "that's okay, we can import or model epitopes on our own and then relate them back to PRO terms."

Representation hierarchy for modifications: Generic protein term -> species-specific protein -> modified protein state.
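
To make the hierarchy concrete for our own modeling, here is a minimal sketch of the three levels using the THOP1 example. The class name, field names, and labels are our own placeholders, not PRO identifiers or PRO's schema; the only thing taken from the discussion is the shape (generic -> species-specific -> modified state, with the modification site held as an annotation).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ProteinTerm:
    """One node in the three-level hierarchy (all names here are hypothetical)."""
    label: str
    level: str  # "generic", "species", or "modified"
    parent: Optional["ProteinTerm"] = None
    # Location and type of a modification live here, not in the term itself.
    annotations: dict = field(default_factory=dict)

    def lineage(self):
        """Return labels from the most generic ancestor down to this term."""
        chain = []
        node = self
        while node is not None:
            chain.append(node.label)
            node = node.parent
        return list(reversed(chain))


# Illustrative labels only; real terms would carry PRO identifiers.
generic = ProteinTerm("THOP1", "generic")
rat = ProteinTerm("THOP1 (rat)", "species", parent=generic)
phospho = ProteinTerm(
    "THOP1 (rat), phosphorylated",
    "modified",
    parent=rat,
    annotations={"modification": "phosphorylation", "residue": "Ser664"},
)

print(" -> ".join(phospho.lineage()))
```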

DN: "submitting terms for *families* of proteins is probably best done through the tracker (SourceForge)"

For requesting proteins (RACE-PRO):

 http://pir.georgetown.edu/cgi-bin/pro/race_pro

For requesting families:

 http://sourceforge.net/tracker/?group_id=266825&atid=1135711

PAF files for requesting proteins:

 ftp://ftp.pir.georgetown.edu/databases/ontology/pro_obo/

Bulk upload will *not* include modified terms -- if the UniProt entry indicates an experimentally modified state, then a term will be created for that state, but annotations are used to give the location and type of the modification.

TWD: Asking about "appending" as well as inserting new information.

  • Can we submit new information we've learned, about objects that already have PRO identifiers?
  • DN: we could adapt this somehow -- "you would enter, instead of a UniProtKB ID, you enter a PRO identifier, and that would tell us that this is an annotation or update to an existing entry."
  • [Later (see below) DN suggests a separate submission format for annotations in the "PAF" format instead.]
  • TWD: nothing we do will *replace* or somehow knock out information from existing entries.
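
The "append, never replace" rule could be enforced on our side before anything is submitted. The sketch below is one possible way to do that, assuming annotations are simple (type, value) pairs; that representation and the function name are our own assumptions, not anything PRO defines.

```python
def append_annotations(existing, submitted):
    """Return existing annotations plus any new ones.

    Nothing already present is removed or overwritten; duplicates
    in the submission are skipped.
    """
    merged = list(existing)  # copy, so the original list is untouched
    for annotation in submitted:
        if annotation not in merged:
            merged.append(annotation)
    return merged


# Hypothetical (type, value) pairs for an entry that already exists.
existing = [("modification", "phosphorylation")]
submitted = [("modification", "phosphorylation"), ("residue", "Ser664")]

updated = append_annotations(existing, submitted)
print(updated)
```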

DN now suggests using the PAF file format for annotations:

  • We could use the PAF file format for annotation -- that would be the format we'd want to use.
  • Modifications are represented as both PRO terms and as annotations.
  • First, look at the PAF file, and second look at RACE-PRO. "Get familiar with what's there."

What we've converged on is the following:

  • There should be a batch file submission format that resembles the RACE-PRO form, for things that need new identifiers.
    • (Paolo and I decide, later, that we can devise our own system for provisional terms to use in our internal modeling.)
  • There should also be a batch file submission format that resembles PAF.txt, for submitting annotations either to new or existing terms.
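
As a starting point for the two formats, here is a rough sketch that writes both batch files as tab-delimited text. The column names and the "TMP:" provisional identifier are placeholders we made up for illustration; the real columns would have to be taken from the RACE-PRO form and from PAF.txt once we have looked at them.

```python
import csv
import io

# Hypothetical column sets, NOT the actual RACE-PRO fields or PAF columns.
TERM_COLUMNS = ["provisional_id", "name", "species", "source_accession"]
ANNOTATION_COLUMNS = ["term_id", "annotation_type", "value"]


def write_batch(rows, columns):
    """Serialize a list of dicts as tab-delimited text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=columns, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# A term that needs a new identifier, keyed by our own provisional ID.
new_terms = [
    {
        "provisional_id": "TMP:0001",
        "name": "THOP1 (rat), phosphorylated",
        "species": "rat",
        "source_accession": "",
    }
]

# Annotations can point at a provisional ID or at an existing term.
annotations = [
    {"term_id": "TMP:0001", "annotation_type": "modified_residue", "value": "Ser664"},
]

term_file = write_batch(new_terms, TERM_COLUMNS)
annotation_file = write_batch(annotations, ANNOTATION_COLUMNS)
print(term_file)
print(annotation_file)
```

Keeping the two files separate mirrors the split above: one file for things that need identifiers, one for annotations on new or existing terms.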

We will, in the next couple of days:

  1. identify a new term submission file format (a la RACE-PRO)
  2. identify an annotation submission file format (a la PAF.txt)

Darren will give us comments on these, and will also look around for tools that they might already have to facilitate this.