Bundles/ncbi/gene-info
The RDF distribution includes a partial conversion of Entrez Gene.
There are two conversions of Entrez Gene. This one covers the tab-delimited gene_info file, which most importantly contains gene symbols, names, and summaries. The other derives from the full ASN format dump of the entire Gene database, which most importantly contains PubMed references. The latter conversion is described at Bundles/ncbi/gene-pubmed.
The gene-info file is obtained from ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz and was downloaded no earlier than 2008-06-13.
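For orientation, here is a minimal sketch of reading the downloaded file in Python. It is not the code used for either conversion. It assumes the leading columns are laid out as listed in COLUMNS (check the header line of your copy) and that header lines start with "#". The function names and the sample row are hypothetical and for illustration only.

```python
import gzip
import io

# Assumed order of the leading gene_info columns; verify against the
# header line of the file you actually downloaded.
COLUMNS = ("tax_id", "GeneID", "Symbol", "LocusTag", "Synonyms",
           "dbXrefs", "chromosome", "map_location", "description",
           "type_of_gene")


def read_gene_info(stream):
    """Yield one dict per data row of a gene_info text stream."""
    for line in stream:
        # Skip the header (assumed to start with '#') and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # zip() stops at the shorter sequence, so extra trailing
        # columns in the file are simply ignored.
        yield dict(zip(COLUMNS, fields))


def open_gene_info(path):
    """Open a local copy of gene_info.gz as text (hypothetical helper)."""
    return gzip.open(path, "rt", encoding="utf-8")


# Illustrative in-memory sample, not real file content.
sample = io.StringIO(
    "#Format: tax_id GeneID Symbol ...\n"
    "9606\t3064\tHTT\t-\t-\t-\t4\t-\thuntingtin\tprotein-coding\n"
)
rows = list(read_gene_info(sample))
print(rows[0]["GeneID"], rows[0]["Symbol"])
```

For a real run, pass the result of open_gene_info("gene_info.gz") to read_gene_info instead of the in-memory sample.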
Our triples usually describe Entrez Gene records rather than genes, since we haven't figured out what NCBI means by a "gene".
Alan and Jonathan each converted the gene_info file separately: Alan in 2007 (the first conversion) and Jonathan in June 2008 (the second conversion). Both are described below, the second conversion first.
Second conversion
Graph: http://purl.org/science/graph/ncbi/gene-info
Gene and protein symbols live as strings on the sc:ggp_has_symbol property of the Gene record. A particular symbol is heuristically chosen as "primary" and is replicated on the sc:ggp_has_primary_symbol property.
Gene and protein descriptions live as strings on the sc:ggp_has_description property. A particular description is chosen as primary and replicated on the sc:ggp_has_primary_description property.
The taxonomy record for the gene's species is the value of the sc:ggp_from_species_described_by property.
The gene type (protein_coding, pseudogene, etc.) sits on the sc:describes_gene_type property. The value is one of several classes defined in the sc: ontology.
@prefix sc: <http://purl.org/science/owl/sciencecommons/> .
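The properties above can be fetched together for a single record. The following is a minimal sketch that only builds the query text, which you would then paste into the SPARQL form. It assumes that records in this graph use the http://purl.org/commons/record/ncbi_gene/ URI pattern and that the graph name can be used in a from clause. The helper name record_query is hypothetical.

```python
SC = "http://purl.org/science/owl/sciencecommons/"
GRAPH = "http://purl.org/science/graph/ncbi/gene-info"
# Assumption: second-conversion records follow this URI pattern.
RECORD_BASE = "http://purl.org/commons/record/ncbi_gene/"


def record_query(gene_id):
    """Build a SPARQL query for one Entrez Gene record (hypothetical helper)."""
    record = "<%s%d>" % (RECORD_BASE, int(gene_id))
    return "\n".join([
        "prefix sc: <%s>" % SC,
        "select ?symbol ?description ?taxon ?type",
        "from <%s>" % GRAPH,
        "where",
        "{",
        "  %s sc:ggp_has_primary_symbol ?symbol ." % record,
        # The remaining properties are optional so that a record missing
        # one of them still produces a result row.
        "  optional { %s sc:ggp_has_primary_description ?description . }" % record,
        "  optional { %s sc:ggp_from_species_described_by ?taxon . }" % record,
        "  optional { %s sc:describes_gene_type ?type . }" % record,
        "}",
    ])


print(record_query(3064))
```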
The following genomes are covered:
| Bos taurus | Homo sapiens |
| Caenorhabditis elegans | Mus musculus |
| Canis familiaris | Pan troglodytes |
| Danio rerio | Rattus norvegicus |
| Drosophila melanogaster | Sus scrofa |
| Gallus gallus | Xenopus laevis |
For example:
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix sc: <http://purl.org/science/owl/sciencecommons/>
select ?r ?o
where
{
?r sc:ggp_from_species_described_by <http://purl.org/commons/record/ncbi_taxonomy/9606> .
?r sc:ggp_has_primary_symbol ?o .
}
limit 100
Related bundles:
First conversion
All gene and protein symbols and names are placed as values of the dc:title property.
Protein-coding genes are indicated with a triple such as:
<http://purl.org/commons/record/ncbi_gene/3064> <http://purl.org/science/owl/sciencecommons/describes_gene_type> <http://purl.org/science/owl/sciencecommons/protein_coding> .
Taxon and chromosome are also indicated. To see for yourself, go to the SPARQL form (or endpoint) and issue the following query:
select ?p ?o
from <http://purl.org/commons/hcls/gene>
where
{
<http://purl.org/commons/record/ncbi_gene/3064> ?p ?o.
}
