E0037
Open Reading Frames (ORFs) and Codon Bias in the Genome of
Streptomyces coelicolor and the Origin and Evolution of the Genetic
Code.
R. Huether¹, L. Habegger¹, S. Connare¹, V. Pletnev² and W. L. Duax¹,³
¹Structural Biology Dept., Hauptman-Woodward Inst., 73 High St.,
Buffalo, NY 14203; ²Shemyakin-Ovchinnikov Inst., Moscow, Russian
Federation; and ³SUNY, Dept. of Structural Biology, Buffalo, NY.
Examination of the complete genome of S. coelicolor
reveals that the antisense strands of 70% (5265) of the 7514 genes contain no
stop codons and could in principle be open reading frames (ORFs). Of these, 53%
(2805 genes) have a third full-length ORF, and 10% of these (284) have a fourth
ORF. Finally, 56 of the 7302 genes have five ORFs (five stop-free frames). We have
previously detected a significant bias in codon usage in the short-chain
oxidoreductase (SCOR) enzyme family. Of 1651 predicted or known gene products in
species from bacteria, archaea, and eukaryotes, 81 SCOR genes having triple ORFs
(TORFs) were found to be encoded almost exclusively by the 32 of the 64 codons
that are GC-only (all three nucleotides G or C) or GC-rich (two of the three
nucleotides G or C) in composition. Examination of the double ORFs (DORFs), TORFs, quartet ORFs
(QORFs) and penta ORFs (PORFs) in S. coelicolor revealed a similar bias in
codon use and a DNA triplet distribution that is most pronounced in the QORFs and
PORFs. The 256 QORF genes vary in length from 22 to 464 amino acids. When the
170 hypothetical gene products that have at least 100 amino acids are examined,
87% of their codons come from the GC-rich half of the genetic code, and 82% of their
residues are drawn from only 10 amino acids (GPASTDLVER). Only
eighteen of the predicted gene products are specifically characterized. These
include 5 dehydrogenases, 3 kinases, 2 esterases, a permease, a deformylase, 2
ABC transport proteins, a two-component regulator, and three ribosomal proteins
(S12, L18 and L33). The QORF subset also includes 98 proteins characterized as
homologous to known proteins (including cysteinyl-tRNA
synthetase) and 58 gene products identified only as hypothetical proteins. The
QORFs in S. coelicolor appear to identify a subset of the codon system
that evolved first and a subset of amino acids that constituted the earliest folded
proteins, and to provide evidence of a possible two-letter genetic code that preceded the
modern genetic code. This work is supported by NIH Grant No. DK26546.
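The frame and codon counts described in this abstract can be illustrated with a short Python sketch. It is not the authors' pipeline, and every function name in it is hypothetical. It assumes (i) the standard stop codons TAA, TAG and TGA, (ii) an input coding sequence with its own terminal stop codon already removed, and (iii) that a frame counts as an "ORF" when it contains no stop codon over the gene's length, in any of the three offsets on either strand. The abstract does not state exactly how the antisense frames were aligned, so assumption (iii) is a simplification.

```python
from itertools import product

# Assumption: stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")
# The 10 amino acids (one-letter codes) named in the abstract.
TEN_AA = set("GPASTDLVER")


def reverse_complement(seq: str) -> str:
    """Antisense strand of an uppercase DNA string, read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def codons(seq: str, offset: int = 0) -> list[str]:
    """Complete codons of seq read from the given offset (0, 1 or 2)."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


def is_open(seq: str, offset: int) -> bool:
    """True if the frame starting at `offset` contains no stop codon."""
    return not any(c in STOP_CODONS for c in codons(seq, offset))


def count_open_frames(cds: str) -> int:
    """Number of stop-free frames among the six frames (3 sense, 3 antisense).

    Assumes `cds` excludes its own terminal stop codon.
    """
    strands = (cds, reverse_complement(cds))
    return sum(is_open(s, k) for s in strands for k in range(3))


def gc_count(codon: str) -> int:
    """How many of the codon's three bases are G or C."""
    return sum(base in "GC" for base in codon)


ALL_CODONS = ["".join(p) for p in product("ACGT", repeat=3)]
# GC-only (3 of 3) plus GC-rich (2 of 3) codons.
GC_RICH_CODONS = {c for c in ALL_CODONS if gc_count(c) >= 2}


def gc_rich_codon_fraction(cds: str) -> float:
    """Fraction of in-frame codons that fall in the GC-rich half of the code."""
    cs = codons(cds)
    if not cs:
        raise ValueError("need at least one complete codon")
    return sum(c in GC_RICH_CODONS for c in cs) / len(cs)


def ten_aa_fraction(protein: str) -> float:
    """Fraction of residues drawn from G, P, A, S, T, D, L, V, E, R."""
    if not protein:
        raise ValueError("empty protein sequence")
    return sum(aa in TEN_AA for aa in protein) / len(protein)


print(len(GC_RICH_CODONS))               # 32
print(count_open_frames("TAAGCC"))       # 5
print(gc_rich_codon_fraction("GCCATG"))  # 0.5
```

The 32-codon set is 8 GC-only codons (2³) plus 24 codons with exactly two G/C positions (3 choices for the A/T position × 2 for its base × 2² for the G/C bases). Under assumption (iii), a gene for which `count_open_frames` returns 5 would correspond to the abstract's "five ORFs".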