Bundles/hla
Bundle: hla
Graph: <http://purl.org/science/graph/hla>
Derived from file hla.dat exported by the IMGT/HLA database.
Like many other databases, IMGT/HLA is a set of records, each with its own accession ID. Each record describes one HLA allele; for example, record HLA00007 describes allele HLA-A*0202.
The HLA records are given pseudo-common-naming-system URIs, e.g. <http://purl.org/commons/record/hla/HLA00007>:
select distinct ?p ?o
where
{
graph <http://purl.org/science/graph/hla> { <http://purl.org/commons/record/hla/HLA00007> ?p ?o. }
}
However, for now the records themselves are not of much use. The records link to alleles (via foaf:primaryTopic, just as for Medline records to journal articles), which are classes with URIs defined in the MaHCO bundle Bundles/mahco-hla. The important links in the hla bundle are from alleles to papers indexed by Medline:
select distinct ?s ?p
where
{
graph <http://purl.org/science/graph/hla> { ?s ?p <http://purl.org/stemnet/HLA#A_0202>. }
}
The articles are related to the alleles via both IAO "mentions" <http://purl.obofoundry.org/obo/IAO_0000142> (for uniformity with Bundles/medline/alleles) and IAO "is about" <http://purl.obofoundry.org/obo/IAO_0000136>. The second reflects the much stronger relationship implied by the article having been manually curated in IMGT/HLA as a reference for the particular allele.
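As a sketch of how these links combine, the following query starts from a record, follows foaf:primaryTopic to its allele, and then lists the articles that are "about" that allele. It assumes that the foaf:primaryTopic triples live in the same graph as the article links and that the article is the subject of the "is about" triple; neither has been checked against the loaded data.

```sparql
prefix foaf: <http://xmlns.com/foaf/0.1/>

# Record -> allele -> articles curated as references for that allele.
# Assumption: both kinds of triple are in the hla graph.
select distinct ?allele ?article
where
{
  graph <http://purl.org/science/graph/hla>
  {
    <http://purl.org/commons/record/hla/HLA00007> foaf:primaryTopic ?allele .
    ?article <http://purl.obofoundry.org/obo/IAO_0000136> ?allele .
  }
}
```

Replacing IAO_0000136 with IAO_0000142 should give the weaker "mentions" links instead.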
As of August 2009 this bundle provides 2848 links from article to allele.
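A count of this kind can be re-derived with an aggregate query along the following lines. This is only a sketch: it assumes an endpoint that supports SPARQL 1.1 aggregates, and it counts the "mentions" links only, which may or may not be how the figure above was obtained.

```sparql
# Count the article -> allele "mentions" triples in the hla graph.
# Within one graph and for one predicate, each (article, allele) pair
# is a single triple, so count(*) is the number of distinct links.
select (count(*) as ?links)
where
{
  graph <http://purl.org/science/graph/hla>
  { ?article <http://purl.obofoundry.org/obo/IAO_0000142> ?allele . }
}
```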
In the future we might extract other information from the flat file, such as gene and/or protein sequences, or UniProt references.
Reference Sequences and Perturbations
The IMGT also provides sequence alignments; for example, B_prot.txt for HLA-B:
HLADB-2.26.0-Jul 2009
HLA-B Protein Sequence Alignments
Sequences Aligned: 17 July 2009
Steven G. E. Marsh, Anthony Nolan Research Institute.

Prot. Pos.    -30  -20        -10        10         20
B*070201      MLVM APRTVLLLLS AALALTETWA GSHSMRYFYT SVSRPGRGEP ...
B*080101      ---- ---------- ---------- --------D- AM-------- ...
...
B*1404        **** ********** ********** *-----H--- A--------- ...
B*15010102N   -R-T ---------- G--------- -ECGVGREMA --G-SEGTAG
The items in the reference sequences are represented as
{
?locus sc:primary_allele ?allele.
?allele ro:has_part ?item.
?item rdf:type ?x
}
for each item, where ?x is sc:SequenceGap or a CHEBI amino acid class, and ?locus is an HLA gene/locus (HLA-B in the data above).
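For instance, a query along these lines should list the items of each reference sequence in order. It is a sketch rather than a tested query: it assumes that each item carries its position as rdf:value and that each item type (including sc:SequenceGap) has an rdfs:label, as the perturbation example further down this page does for amino acid classes. Items whose type has no label would be silently dropped.

```sparql
prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix sc:   <http://purl.org/science/owl/sciencecommons/>
prefix ro:   <http://www.ifomis.org/bfo/1.1/ro#>

# Walk locus -> primary (reference) allele -> sequence items.
select ?locus ?allele ?pos ?residue
where
{
  ?locus  sc:primary_allele ?allele .
  ?allele ro:has_part ?item .
  ?item   rdf:type [ rdfs:label ?residue ] .
  ?item   rdf:value ?pos .   # assumed to hold the position
}
order by ?locus ?pos
```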
Perturbations in other alleles are represented as
{
?allele sc:perturbation ?item.
?item sc:reference_sequence_item ?ritem.
}
where ?ritem is the item from the reference sequence corresponding to ?item.
For example:
Find the perturbations in B*1404
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix sc: <http://purl.org/science/owl/sciencecommons/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix ro: <http://www.ifomis.org/bfo/1.1/ro#>
select ?pos ?from ?p ?to
where
{
?a rdfs:label "B*1404".
?a sc:perturbation ?p.
?p sc:reference_sequence_item ?item.
?p rdf:type [ rdfs:label ?to ].
?item rdf:value ?pos.
?item rdf:type [ rdfs:label ?from ].
}
order by ?pos
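The same pattern can be turned around to ask which alleles differ from the reference at a given position. The sketch below uses position 9 purely as an illustration. It compares the position as a string because the datatype of rdf:value has not been checked here, and it assumes that the stored positions use the same numbering as the alignment file.

```sparql
prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix sc:   <http://purl.org/science/owl/sciencecommons/>

# Alleles with a perturbation at one reference position.
select ?allele ?from ?to
where
{
  ?a rdfs:label ?allele .
  ?a sc:perturbation ?p .
  ?p rdf:type [ rdfs:label ?to ] .
  ?p sc:reference_sequence_item ?item .
  ?item rdf:value ?pos .
  ?item rdf:type [ rdfs:label ?from ] .
  filter (str(?pos) = "9")   # illustrative position; datatype unknown
}
order by ?allele
```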
