The RDF distribution includes a partial conversion of Entrez Gene.

There are two conversions of Entrez Gene. This one covers the tab-delimited gene_info file, whose most important contents are gene symbols, names, and summaries. The other derives from the full ASN format dump of the entire Gene database, whose most important contents are PubMed references. That conversion is described at Bundles/ncbi/gene-pubmed.

The gene_info file is obtained from ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz and was downloaded no earlier than 2008-06-13.
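
As a rough illustration of the input, the sketch below reads gene_info rows in Python. It assumes the leading column order tax_id, GeneID, Symbol, LocusTag, Synonyms, dbXrefs, chromosome, map_location, description, type_of_gene, that comment lines start with "#", and that "-" marks an empty field. The function names parse_gene_info and read_gene_info are hypothetical, and this is not the code that was used for either conversion.

```python
import gzip

# Leading columns of gene_info (assumed order; later columns are ignored here).
COLUMNS = ["tax_id", "GeneID", "Symbol", "LocusTag", "Synonyms",
           "dbXrefs", "chromosome", "map_location", "description",
           "type_of_gene"]

def parse_gene_info(lines):
    """Yield one dict per data row, skipping comment and blank lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        row = dict(zip(COLUMNS, line.split("\t")))
        # "-" is treated as "no value".
        row = {key: (None if value == "-" else value)
               for key, value in row.items()}
        # Synonyms is a "|"-separated list of alternative symbols.
        synonyms = row.get("Synonyms")
        row["Synonyms"] = synonyms.split("|") if synonyms else []
        yield row

def read_gene_info(path):
    """Open a local copy of gene_info.gz and parse it row by row."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        yield from parse_gene_info(handle)
```

Calling read_gene_info on a downloaded copy of the file yields the rows lazily, so the whole file never has to be held in memory.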

Usually our triples only talk about Entrez Gene records, not genes, since we haven't figured out what NCBI means by a "gene".

Alan and Jonathan did separate conversions of the gene_info file, Alan in 2007 (1) and Jonathan in June 2008 (2, see below).

Second conversion

Graph: http://purl.org/science/graph/ncbi/gene-info

Gene and protein symbols live as strings on the sc:ggp_has_symbol property of the Gene record. A particular symbol is heuristically chosen as "primary" and is replicated on the sc:ggp_has_primary_symbol property.

Gene and protein descriptions live as strings on the sc:ggp_has_description property. A particular description is chosen as primary and is replicated on the sc:ggp_has_primary_description property.

The taxonomy record for the species is the value of the sc:ggp_from_species_described_by property.

The gene type (protein_coding, pseudogene, etc.) sits on the sc:describes_gene_type property. The value is one of several classes defined in the sc: ontology.

@prefix sc: <http://purl.org/science/owl/sciencecommons/> .
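
To make the shape of these triples concrete, here is a minimal Python sketch that emits N-Triples lines for one Gene record. The record URI patterns are taken from the example queries on this page. The choice of the first symbol and first description as "primary" is only a placeholder assumption, since the actual heuristic is not documented here, and the function names are hypothetical. The gene type is left out because the class names in the sc: ontology are not listed on this page.

```python
SC = "http://purl.org/science/owl/sciencecommons/"
GENE_RECORD = "http://purl.org/commons/record/ncbi_gene/"
TAXON_RECORD = "http://purl.org/commons/record/ncbi_taxonomy/"

def literal(text):
    """Quote a string as a plain N-Triples literal."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return '"' + escaped + '"'

def record_triples(gene_id, tax_id, symbols, descriptions):
    """Return N-Triples lines describing one Entrez Gene record.

    Assumption: symbols[0] and descriptions[0] stand in for the
    heuristically chosen primary values.
    """
    subject = "<%s%s>" % (GENE_RECORD, gene_id)
    triples = [(subject, SC + "ggp_from_species_described_by",
                "<%s%s>" % (TAXON_RECORD, tax_id))]
    for symbol in symbols:
        triples.append((subject, SC + "ggp_has_symbol", literal(symbol)))
    if symbols:
        triples.append((subject, SC + "ggp_has_primary_symbol",
                        literal(symbols[0])))
    for description in descriptions:
        triples.append((subject, SC + "ggp_has_description",
                        literal(description)))
    if descriptions:
        triples.append((subject, SC + "ggp_has_primary_description",
                        literal(descriptions[0])))
    return ["%s <%s> %s ." % triple for triple in triples]
```

Note that the primary symbol appears twice for a record: once on sc:ggp_has_symbol and once on sc:ggp_has_primary_symbol, matching the "replicated" wording above.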

The following genomes are covered:

Bos taurus                Homo sapiens
Caenorhabditis elegans    Mus musculus
Canis familiaris          Pan troglodytes
Danio rerio               Rattus norvegicus
Drosophila melanogaster   Sus scrofa
Gallus gallus             Xenopus laevis

For example:

prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix sc: <http://purl.org/science/owl/sciencecommons/>

select ?r ?o
where {
   ?r sc:ggp_from_species_described_by <http://purl.org/commons/record/ncbi_taxonomy/9606> .
   ?r sc:ggp_has_primary_symbol ?o .
}
limit 100
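
The same query can be prepared for other species from a script. The Python sketch below builds the query text for a given NCBI taxonomy ID and encodes it as a SPARQL Protocol GET URL. It restricts the query to the gene-info graph named above, which the hand-written example does not do. The function names are hypothetical, and the endpoint address is deliberately left as a parameter because it is not given on this page.

```python
from urllib.parse import urlencode

GRAPH = "http://purl.org/science/graph/ncbi/gene-info"
TAXON_RECORD = "http://purl.org/commons/record/ncbi_taxonomy/"

def primary_symbol_query(tax_id, limit=100):
    """Build SPARQL text listing primary symbols for one species."""
    # Coerce to int so arbitrary text cannot be spliced into the query.
    tax_id = int(tax_id)
    limit = int(limit)
    return (
        "prefix sc: <http://purl.org/science/owl/sciencecommons/>\n"
        "select ?r ?o\n"
        "from <%s>\n"
        "where {\n"
        "   ?r sc:ggp_from_species_described_by <%s%d> .\n"
        "   ?r sc:ggp_has_primary_symbol ?o .\n"
        "}\n"
        "limit %d\n" % (GRAPH, TAXON_RECORD, tax_id, limit)
    )

def request_url(endpoint, query):
    """Encode the query as a SPARQL Protocol GET request URL."""
    return endpoint + "?" + urlencode({"query": query})
```

For instance, primary_symbol_query(10090) gives the equivalent query for Mus musculus, whose NCBI taxonomy ID is 10090.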

Related bundles:

First conversion

All gene and protein symbols and names are placed as values of the dc:title property.

Protein coding genes are indicated with

  <http://purl.org/science/owl/sciencecommons/protein_coding> .

Taxon and chromosome are also indicated. To see for yourself, go to the SPARQL form (or endpoint) and issue the following query:

select ?p ?o
from <http://purl.org/commons/hcls/gene>
where {
   <http://purl.org/commons/record/ncbi_gene/3064> ?p ?o .
}