Current Rice Genome Pseudomolecules Release
 

We are pleased to announce release 6.1 of the Rice Pseudomolecules and Genome Annotation. The official release date for this version was June 3, 2009.

Release 6.1 is a minor update from release 6.0. A small set of genes (33) that had been classified as TE-related, but that Community Annotators had annotated as non-TE functional genes, are no longer classified as TE-related. In addition, 121 TE-related genes that were not properly flagged as TE-related in release 6.0 have been reclassified as TE-related.

Release 6.1 is otherwise identical to release 6.0, which became public on January 30, 2009. The description of release 6.0 below provides an overview of the current state of the rice genome annotation.

As part of our National Science Foundation-funded Rice Genome Annotation Project, we constructed pseudomolecules (virtual contigs) for each of the 12 rice chromosomes. In release 6.0, we updated the rice pseudomolecules. The reassembly of the rice pseudomolecules was based on new Oryza sativa (japonica cultivar-group) genomic sequences deposited in GenBank/EMBL/DDBJ. A list of the ordered BAC/PAC clones for each of the 12 chromosomes was obtained from the IRGSP. The pseudomolecules were constructed by resolving discrepancies between overlapping BAC/PAC clones, trimming the overlap regions at junction points (preferring the phase 3 BAC/PAC sequences), and linking the unique sequences to form a contiguous sequence. Insertions, deletions and transpositions that had been detected by optical mapping analysis (Zhou et al., 2007) of the release 4/5 pseudomolecules were corrected in the release 6.0 pseudomolecules. Contigs representing novel regions not in the IRGSP BAC/PAC sequences were generated from sequence reads from the Syngenta rice genome sequencing project (Goff et al., 2002). Those contigs that could extend into gaps or that mapped to gaps were used to improve the new pseudomolecule assemblies. Contigs that did not match the pseudomolecules were concatenated and placed into a new pseudomolecule called "ChrSy". Other BAC sequences that could not be incorporated into the pseudomolecule assemblies were assembled into a second unanchored pseudomolecule called "ChrUn".
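The trimming and linking step can be illustrated with a minimal Python sketch. It is not the project's actual assembly code. It assumes that the overlap length between each pair of adjacent clones is already known, that the copy of the overlap in the earlier clone is always the one kept (the real procedure prefers phase 3 sequence), and that a physical gap is written as 1000 Ns. The function name `build_pseudomolecule` and the input layout are hypothetical.

```python
from typing import List, Optional, Tuple

GAP = "N" * 1000  # assumed placeholder for a physical gap between clones


def build_pseudomolecule(clones: List[Tuple[str, Optional[int]]]) -> str:
    """Join ordered clone sequences into one contiguous sequence.

    Each item is (sequence, overlap_with_next), where overlap_with_next is the
    number of bases shared with the start of the next clone, or None when the
    two clones are separated by a physical gap. The value on the last clone
    is ignored.
    """
    parts = []
    skip = 0  # bases to drop from the start of the current clone
    last = len(clones) - 1
    for i, (seq, overlap_with_next) in enumerate(clones):
        # Keep only the part of this clone not already covered by the previous one.
        parts.append(seq[skip:])
        if i == last:
            break
        if overlap_with_next is None:
            parts.append(GAP)  # physical gap: no overlap to trim
            skip = 0
        else:
            skip = overlap_with_next
    return "".join(parts)


# Toy example: two overlapping clones, then a physical gap, then a third clone.
toy = [("AAACCC", 3), ("CCCGGG", None), ("TTT", 0)]
assembled = build_pseudomolecule(toy)
```

In the toy example the shared "CCC" is kept once, so the result is "AAACCCGGG", followed by 1000 Ns and "TTT".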

A total of 3,450 rice BAC/PAC clones were included in the twelve pseudomolecules. At the time these pseudomolecules were constructed, 3,408 BAC/PAC clones (98.8%) were finished and 42 BAC/PAC clones (1.2%) were unfinished (phase 2) as defined by GenBank. Gaps between clones (i.e., physical gaps) are denoted with 1000 Ns, and the location of these gaps can be seen by following the links in the "Ordered List of BAC/PAC Clones" column of the table below. Centromeres were identified using the CentO centromeric sequence (AY101510; Cheng et al., 2002). The centromeres are adjacent to these clones on each of the 12 rice chromosomes. Please be aware that unfinished BACs may contain other gaps, which may also be denoted with a string of Ns. In total, there are 39 physical gaps within the twelve pseudomolecules, in addition to gaps at 10 centromeres and 12 telomeres.
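Because gaps are written as runs of Ns, candidate gap positions can be located directly in a pseudomolecule sequence. The following Python sketch shows one way to do this. The function name `find_n_runs` is hypothetical, and the default threshold of 1000 only reflects the convention stated above. Shorter or longer N runs inside unfinished BACs will also match if the threshold allows, so the output is a list of candidates and not a definitive list of physical gaps.

```python
import re
from typing import List, Tuple


def find_n_runs(seq: str, min_len: int = 1000) -> List[Tuple[int, int]]:
    """Return (start, end) pairs for runs of N that are at least min_len long.

    Coordinates are 0-based and half-open, so seq[start:end] is the run itself.
    """
    pattern = re.compile(r"N{%d,}" % min_len, re.IGNORECASE)
    return [(m.start(), m.end()) for m in pattern.finditer(seq)]


# Toy sequence: one 1000-N gap and one short 5-N run.
toy_seq = "ACGT" + "N" * 1000 + "GG" + "N" * 5 + "TT"
gaps_1000 = find_n_runs(toy_seq)       # only the 1000-N run
gaps_any = find_n_runs(toy_seq, 5)     # both runs
```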

The twelve pseudomolecules that represent the twelve rice chromosomes were annotated using our automated/semi-automated rice annotation pipeline (see the pipeline description for details). In release 6.0 (and in release 6.1), there were 370,637,721 bp of non-overlapping rice genome sequence from the 12 rice chromosomes, and 56,797 genes (loci) had been identified, of which 6,576 had 10,593 additional alternative splicing isoforms, resulting in a total of 67,393 transcripts (or gene models) in the rice genome. Note that 793 small gene models (<50 amino acids) have been excluded from our annotated gene set.
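The exclusion of small gene models can be sketched as a simple length filter. This Python example is illustrative only: the function name `split_by_length`, the model identifiers, and the handling of a trailing `*` stop symbol are assumptions, and the only value taken from the text above is the 50 amino acid cut-off.

```python
from typing import Dict, Tuple

MIN_PROTEIN_LENGTH = 50  # models with fewer amino acids are excluded


def split_by_length(
    proteins: Dict[str, str], min_len: int = MIN_PROTEIN_LENGTH
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split predicted proteins into (kept, excluded) by amino acid length."""
    kept: Dict[str, str] = {}
    excluded: Dict[str, str] = {}
    for model_id, seq in proteins.items():
        # Assumption: a trailing '*' marks the stop codon and is not counted.
        length = len(seq.rstrip("*"))
        if length >= min_len:
            kept[model_id] = seq
        else:
            excluded[model_id] = seq
    return kept, excluded


# Hypothetical model identifiers, for illustration only.
toy_proteins = {
    "model_a": "M" * 49,
    "model_b": "M" * 50 + "*",
    "model_c": "M" * 120,
}
kept, excluded = split_by_length(toy_proteins)
```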

Transposable element-related (TE-related) gene models were identified using two approaches: BLASTN searches against the MSU Oryza Repeat Database and identification of gene models containing TE-related Pfam domains. These loci (16,185) and their models (16,433) were annotated based on the Pfam domain or the nomenclature in the MSU Oryza Repeat Database. (With release 6.1, the number of TE-related loci increased to 16,220, and the number of TE-related gene models increased to 16,454.) Pack-MULEs were identified on all 12 chromosomes. They were annotated as described in Hanada et al. (2009). Transduplicate MULEs identified by Juretic et al. (2005) were aligned to the current pseudomolecules. Note that the Jiang Pack-MULEs and the transduplicate MULEs are identified only on the Genome Browser and not in our functional annotation. Also note that although loci and gene models on ChrUn and ChrSy were annotated using the same annotation pipeline, they are not included in our official gene set and consequently are not assigned LOC_OsXXgXXXXX identifiers. These two pseudomolecules contain 214 loci and gene models.
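The classification logic can be summarized in a short Python sketch. It is a simplified illustration, not the pipeline itself: the search steps and their cut-offs are assumed to have been run already and are reduced to boolean flags, and the class and function names are hypothetical. The sketch also assumes that a Community Annotation call of "non-TE functional gene" overrides either line of evidence, whereas the release notes describe that override only for genes flagged as TE-related before Community Annotators reviewed them.

```python
from dataclasses import dataclass


@dataclass
class GeneModelEvidence:
    model_id: str                 # hypothetical identifier
    repeat_db_hit: bool           # match to the repeat database above cut-off
    te_pfam_domain: bool          # contains a TE-related Pfam domain
    ca_non_te: bool = False       # Community Annotators call it a non-TE functional gene


def is_te_related(ev: GeneModelEvidence) -> bool:
    """Return True if the model would be flagged TE-related under the stated assumptions."""
    if ev.ca_non_te:
        # Assumed precedence: the Community Annotation overrides automated evidence.
        return False
    # Either line of evidence is sufficient on its own.
    return ev.repeat_db_hit or ev.te_pfam_domain


examples = [
    GeneModelEvidence("m1", repeat_db_hit=True, te_pfam_domain=False),
    GeneModelEvidence("m2", repeat_db_hit=False, te_pfam_domain=True),
    GeneModelEvidence("m3", repeat_db_hit=False, te_pfam_domain=True, ca_non_te=True),
    GeneModelEvidence("m4", repeat_db_hit=False, te_pfam_domain=False),
]
flags = {ev.model_id: is_te_related(ev) for ev in examples}
```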

A total of 33,800 gene models (23,777 genes) were improved based on the experimental evidence provided by EST and full-length cDNA sequences. This was done using the PASA program (Haas et al., 2003). A portion of the models that failed PASA validation were manually reviewed and curated. The structures of 1,648 gene models were manually annotated using EST pairing information and comparative genomics analyses (Zhu and Buell, Genome Research, 2007). A total of 36,475 gene models have transcript support (PASA-supported, MPSS, SAGE, and/or proteomic data). Using the structural annotation from the Community Annotation project (CA), we modified 43 loci encompassing 9 different CA protein families. In addition, we added 20 new loci from 5 different CA protein families to the rice genome annotation. We updated the functional assignment for 378 loci using the Community Annotation.

Please note that these pseudomolecules are constructed from finished and unfinished sequence and a majority of the gene models have not been manually curated.



Table of Rice Pseudomolecule, Loci, and Gene Models in Release 6.1

| Chr | BAC/PAC No. | Sequence Length in Pseudomolecule (bp) | Gaps | Genes/Loci (a): TE (b) | Genes/Loci (a): Non-TE (c) | Genes/Loci (a): Total (d) | Gene Models (a): TE (b) | Gene Models (a): Non-TE (c) | Gene Models (a): Total (d) | Ordered List of BAC/PAC Clones | Download Sequences |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 391 | 43,268,879 | 6 | 1,390 | 5,226 | 6,616 | 1,421 | 6,716 | 8,137 | Chr01 | Download |
| 2 | 358 | 35,930,381 | 4 | 1,184 | 4,299 | 5,483 | 1,208 | 5,588 | 6,796 | Chr02 | Download |
| 3 | 324 | 36,406,689 | 7 | 1,106 | 4,538 | 5,644 | 1,133 | 6,002 | 7,135 | Chr03 | Download |
| 4 | 293 | 35,278,225 | 3 | 1,837 | 3,603 | 5,440 | 1,848 | 4,488 | 6,336 | Chr04 | Download |
| 5 | 284 | 29,894,789 | 5 | 1,427 | 3,236 | 4,663 | 1,444 | 4,161 | 5,605 | Chr05 | Download |
| 6 | 280 | 31,246,789 | 2 | 1,413 | 3,373 | 4,786 | 1,439 | 4,119 | 5,558 | Chr06 | Download |
| 7 | 287 | 29,696,629 | 2 | 1,336 | 3,194 | 4,530 | 1,359 | 3,918 | 5,277 | Chr07 | Download |
| 8 | 273 | 28,439,308 | 1 | 1,368 | 2,889 | 4,257 | 1,380 | 3,572 | 4,952 | Chr08 | Download |
| 9 | 223 | 23,011,239 | 5 | 1,098 | 2,362 | 3,460 | 1,110 | 2,884 | 3,994 | Chr09 | Download |
| 10 | 202 | 23,134,759 | 7 | 1,169 | 2,408 | 3,577 | 1,190 | 2,953 | 4,143 | Chr10 | Download |
| 11 | 254 | 28,512,666 | 6 | 1,368 | 2,872 | 4,240 | 1,380 | 3,399 | 4,779 | Chr11 | Download |
| 12 | 269 | 27,497,214 | 1 | 1,524 | 2,577 | 4,101 | 1,542 | 3,139 | 4,681 | Chr12 | Download |
| Total (e) | 3,450 | 372,317,567 | 49 | 16,220 | 40,577 | 56,797 | 16,454 | 50,939 | 67,393 | | Download |

a Excluding small gene models (< 50 amino acids).
b TE: Transposable element-related genes and gene models. The rice proteome was searched against the MSU Oryza Repeat Database with TBLASTN and against the TE-related Pfam domains with hmmpfam. Genes and gene models with matches above the cut-offs were annotated as TE-related. However, genes that were identified as TE-related based on Pfam similarity but that were annotated by Community Annotators (CA) as non-TE functional genes are classified as non-TE-related and are given the CA-provided functional annotation.
c Non-TE: Non-TE related gene models.
d There are 88 loci and 88 models on ChrSy. There are 126 loci and 126 models on ChrUn. These loci and models are not included in the totals for the main pseudomolecules.
e Note that these pseudomolecules are not related to the IRGSP pseudomolecules.
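As a quick consistency check, the per-chromosome values in the table can be summed and compared with the listed totals. The Python sketch below transcribes the sequence length, gap, loci, and gene model columns from the table (the BAC/PAC column is not included) and is only an illustration of the arithmetic. The variable and function names are hypothetical.

```python
# (chr, length_bp, gaps, te_loci, non_te_loci, te_models, non_te_models)
ROWS = [
    (1, 43_268_879, 6, 1_390, 5_226, 1_421, 6_716),
    (2, 35_930_381, 4, 1_184, 4_299, 1_208, 5_588),
    (3, 36_406_689, 7, 1_106, 4_538, 1_133, 6_002),
    (4, 35_278_225, 3, 1_837, 3_603, 1_848, 4_488),
    (5, 29_894_789, 5, 1_427, 3_236, 1_444, 4_161),
    (6, 31_246_789, 2, 1_413, 3_373, 1_439, 4_119),
    (7, 29_696_629, 2, 1_336, 3_194, 1_359, 3_918),
    (8, 28_439_308, 1, 1_368, 2_889, 1_380, 3_572),
    (9, 23_011_239, 5, 1_098, 2_362, 1_110, 2_884),
    (10, 23_134_759, 7, 1_169, 2_408, 1_190, 2_953),
    (11, 28_512_666, 6, 1_368, 2_872, 1_380, 3_399),
    (12, 27_497_214, 1, 1_524, 2_577, 1_542, 3_139),
]

COLUMNS = ["length_bp", "gaps", "te_loci", "non_te_loci", "te_models", "non_te_models"]

# Totals as listed in the "Total" row of the table.
LISTED_TOTALS = {
    "length_bp": 372_317_567,
    "gaps": 49,
    "te_loci": 16_220,
    "non_te_loci": 40_577,
    "te_models": 16_454,
    "non_te_models": 50_939,
}


def column_sums(rows):
    """Sum each numeric column across chromosomes (the chr label is skipped)."""
    return {name: sum(row[i + 1] for row in rows) for i, name in enumerate(COLUMNS)}


sums = column_sums(ROWS)
total_loci = sums["te_loci"] + sums["non_te_loci"]
total_models = sums["te_models"] + sums["non_te_models"]
```

For these columns the sums agree with the listed totals, and the TE and non-TE counts add up to the 56,797 loci and 67,393 gene models given in the text.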

 
   
 
For Rice Comments/Questions send mail to the MSU Rice Genome Annotation Project team.
 
Photographs courtesy of Robin Buell (MSU), Jiming Jiang (University of Wisconsin), and the USDA Agricultural Research Service