ImmPort

HLA project report, November 2009: HLA Structure Variation

Science Commons is working under contract to ImmPort (Immunology Portal) to explore the use of Semantic Web technologies for that repository. The first use case concerns investigating factors in systemic sclerosis.

We'll use this page as a starting point for work done on this project or that spins off from it.

HLA-A is Entrez Gene 3105, UniProt P04439. HLA-B is Entrez Gene 3106, UniProt P01889.

/Demo_sketch, /Demo_queries

Data Sources under review

MaHCO - an MHC ontology (StemNet)

  • Ontology (with two .owl files) providing URIs for HLA alleles and MHC-related classes
  • Derived in part from IMGT/HLA database, but has additional intelligence
  • Outcome: Bundles/mahco

IMGT (at EBI)

  • IMGT = LIGM (immunoglobulins and T cell receptors) + HLA (alleles of MHC proteins)
  • Seems to consist mainly of sequence and alignment data. Some PubMed references too.
  • Linking opportunities: Medline; EBI sequence record.
  • download page.
  • On ashby: /work/imgt/hla/source/*.{fasta,msf,pir}
  • Outcome: Bundles/hla - a bundle that captures allele/medline associations from hla.dat.
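
A rough sketch of how allele/Medline associations might be pulled out of a flat file such as hla.dat. It assumes the file follows the EMBL flat-file layout (a two-letter line code, an `ID` line opening each record, `RX   PUBMED; <id>.` cross-reference lines, and `//` closing the record); that layout is an assumption here, not something checked against the actual download. The function name `pubmed_links` is hypothetical, and this is not the code used to build the bundle.

```python
def pubmed_links(lines):
    """Map each record identifier to the PubMed IDs cited in that record.

    Assumes EMBL-style lines: a two-letter code, three spaces, then content.
    """
    links = {}
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        code, rest = line[:2], line[5:]
        if code == "ID":
            # The first ';'-separated token on the ID line names the record.
            current = rest.split(";")[0].strip()
            links[current] = []
        elif code == "RX" and current is not None:
            # Keep only cross-references whose database is PUBMED.
            db, _, ident = rest.partition(";")
            if db.strip().upper() == "PUBMED":
                links[current].append(ident.strip().rstrip("."))
        elif code == "//":
            # End of record.
            current = None
    return links
```

Given an open file, `pubmed_links(open(path))` would return a dictionary keyed by record identifier; each (identifier, PubMed ID) pair could then be written out as one RDF statement.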

HLA Sequence Feature Variant database

PDB - protein structure data (3D coordinates)

dbSNP

  • "large but uniform."
  • Mostly sequence data.
  • Linking opportunities: Publication (PubMed), author, organism, possible OBI annotation ("method").
  • Hairy ER diagram.
  • ftp site readme
  • Tables available as ASN, and also as tab-delimited

IEDB (Immune Epitope Database)

  • Bjoern Peters was interested in this - recruit him to do RDF
  • Linking opportunities: Cell type, chemical species, Swiss-Prot, GenBank, PDB, maybe OBI (assay type), patents(?)
  • Related (prior) databases: MHCPep, SYFPEITHI, HIV Sequence Database, JenPep, MHCBN, Corixa, Pangea, Epimmune
  • On ashby: /work/iedb/source/*.xml

MHCBN

MHCBN is a curated database of detailed information about Major Histocompatibility Complex (MHC) binding peptides, non-binding peptides, and T-cell epitopes. Version 4.0 also provides information about peptides that interact with TAP and about MHC-linked autoimmune diseases. The database is developed by Dr. Raghava's group at the Bioinformatics Centre, Institute of Microbial Technology, Chandigarh, India.

How does this relate to IEDB? Is it an input in the construction of IEDB or what?

Innate DB (pathways)

  • Similar in form to HPRD
  • Incorporates IntAct, DIP, MINT, BIND
  • Linking opportunities: PubMed, OBI/method (a la PSI-MI), cell type. Interaction participants have full cross-reference info: Ensembl, Unigene, HUGO, OMIM, Entrez Gene, etc.
  • On ashby: /work/innatedb/www.innatedb.ca/download/interactions/*

Medline

  • By simple text matching we can find occurrences of allele names in Medline abstracts. This is useful because many things link to Medline, and Medline is organized by subject headings (MeSH), providing a way to collect sets of papers relating to a particular subject as a starting point for queries.
  • Outcome: Bundles/medline/alleles
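
The simple text matching mentioned above could look roughly like the sketch below. The regular expression is an illustrative assumption about how allele names are written in abstracts (the asterisk nomenclature, with or without colon-separated fields); it is not the matcher actually used to build the bundle, and `find_alleles` is a hypothetical name.

```python
import re

# Assumed pattern: "HLA-", a locus such as A or DRB1, "*", then digit fields,
# optionally separated by colons, with an optional trailing suffix letter.
ALLELE_PATTERN = re.compile(
    r"\bHLA-[A-Z]+[0-9]*\*[0-9]{2,}(?::[0-9]{2,})*[A-Z]?\b"
)

def find_alleles(abstract):
    """Return the distinct allele names found in an abstract, in order of first mention."""
    seen = []
    for match in ALLELE_PATTERN.finditer(abstract):
        name = match.group(0)
        if name not in seen:
            seen.append(name)
    return seen
```

Running `find_alleles` over each abstract yields (PubMed ID, allele name) pairs. A pattern this loose will miss alleles written without the "HLA-" prefix, so it is a starting point rather than a complete matcher.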

Open Biomedical Ontologies (OBO)

MeSH

Entrez Gene

OMIM

  • JAR has converted OMIM allele mentions to RDF. There are only a few hundred (restricting to the ones using the * nomenclature). Not clear what to do with it.
  • Outcome: Bundles/omim

Other ontologies and thesauruses to be considered

Other databases to be considered

  • Epitope Prediction and Analysis Tools
  • dbMHC - check out the tree view
  • CTD (comparative toxicogenomics)
  • HapMap
  • HLA Informatics Group
  • Human MHC haplogroups in Wikipedia
  • The MHC Haplotype Project - offers a framework and resource for association studies of all MHC-linked diseases. It will provide the complete genomic sequences of at least 8 different HLA-homozygous typing haplotypes, their resulting variations (SNPs and DIPs), and ancestral relationships.
  • SNPedia
  • TANTIGEN - the Tumor T cell antigen database, a data source and analysis platform for cancer vaccine target discovery focusing on human tumor antigens that contain HLA ligands and T cell epitopes. It contains 2005 antigen entries from 251 protein antigens. The database also provides information on T cell epitopes and HLA ligands with full references, gene expression profiles, antigen isoforms, and mutations. Predicted binding peptides of 15 HLA Class I and Class II alleles are also included.

Database lists

Scientific background

Queries

Ports

Media:Occurrences.tgz = occurrences of allele names in Medline abstracts, represented as a gzipped tar file containing 700+ RDF/XML files.
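
A minimal sketch for inspecting an archive like this one without unpacking it to disk, using only the Python standard library. It assumes every regular file in the archive is well-formed RDF/XML and that statements are written as `rdf:Description` elements; files that use typed node elements instead would report zero. The function name `count_descriptions` is hypothetical.

```python
import tarfile
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def count_descriptions(archive):
    """Map each file in a gzipped tar to its number of rdf:Description elements.

    `archive` is a seekable binary file object, e.g. open(path, "rb").
    """
    counts = {}
    with tarfile.open(fileobj=archive, mode="r:gz") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories and links
            root = ET.parse(tar.extractfile(member)).getroot()
            counts[member.name] = len(
                root.findall(f".//{{{RDF_NS}}}Description")
            )
    return counts
```

For real queries the files would be loaded into a triple store; this only gives a quick look at what the archive contains.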