Protein structure in the 21st Century
Helen M. Berman (Rutgers University) described how the new "Research
Collaboratory for Structural Bioinformatics" intends to
bring protein structure data storage into the 21st century. The
original Protein Data Bank (PDB) was established in 1971 at Brookhaven
National Labs by those who realized that data could no longer
be casually passed from one crystallographer to another. The
PDB grew from 7 to over 9000 structures. With new methods for
rapid and reliable data processing, the collaborating groups
are committed to establishing uniform data formats, expanding
the query and reporting capabilities, and providing efficient
cross links to other data. Many of these points will not be so
simple to achieve; Helen noted that the mmCIF dictionary, started
in 1980 and anticipated to be a short publication, was finally
published in book form last year. There is also a new validation
server that allows one to determine the structure quality (an
expanded Procheck). Helen was also eager to have a database
that would contain more of the raw data used to calculate protein
structures. To check out the "new" PDB, visit the website
http://www.rcsb.org/.
The consortium may soon have a whole lot more structures to
compile. Joel Berendzen (Los Alamos National Laboratory) presented
the Structural Genomics approach to the determination of protein
structures, based on high throughput, parallel and low-cost methods.
The goal of most structure determinations is to achieve very
high resolution so as to answer specific questions about the
function of the protein and its interaction with ligands. Structural
genomic goals, to classify proteins and obtain a more complete
database of protein structure motifs, can be satisfied with less
accurate structures. Of course, the major bottleneck to standardizing
crystallization techniques will be high-throughput purification
of the diverse proteins. His group is part of a consortium to
analyze the recently determined genome sequence of Pyrobaculum
aerophilum, a microaerobic archaeon that thrives at 103° C.
They have developed a rapid screening program to identify
suitably soluble proteins by fusing coding sequences to that
of the green fluorescent protein. In their assay a green colony
will form only when the fusion protein is soluble (i.e., properly
folded). They estimate that structures for 8% of the genome can
be determined rapidly if they limit their work to those proteins that
can be solubly expressed at 37° C in E. coli (of which about
45% should crystallize). Once they obtain crystals, structure
determination can be made more efficient by completely automating
the reading of the electron density map and structural refinement.
The first product of the work was a structure for the bacterial
initiation factor 5a, which is involved in initiating the translation
of mRNA (the human analogue is a cofactor for HIV
Rev). Although there was no apparent sequence identity, the fold
of the C-terminal domain was identical to that of the E. coli
cold shock protein A.
[Photo: Helen Berman]
While the current cost at Los Alamos is about $50,000 per
structure, which is probably a good deal lower than that for
conventionally determined structures, one can expect that full
implementation of their high throughput methods will reduce even
this figure. Joel pointed out that the whole human genome project
still cost less than one atomic submarine. A rational approach
to structure design may prove to be the most cost-efficient way
to define protein targets for new drugs and attack pathogens
more specifically.
Due respect for posters
The poster sessions were made especially lively this year thanks
to the introduction of The Beckman Coulter Awards to recognize
outstanding research by students and post-doctoral fellows. Jonathan
W. Neidigh of the University of Washington, Seattle, received first prize
($400) for the poster "Designing a 20 residue protein".
The author, in cooperation with Matthew Fesinmeyer and Niels
Andersen, incorporated a "Trp cage" (a hydrophobic
cluster of Phe, Trp and Pro side chains) into a peptide that
folds in water to the desired structure (characterized by NMR
and CD). The second prize ($250) was awarded to Simon Lovell,
Duke University, for the poster "Crystallographic map fitting
made (a little) easier", co-authors J. Michael Word, Jane
S. Richardson and David C. Richardson, which outlined new tools,
including the Clash program, that can be used to identify mistakes
in crystal structures. The third prize ($150) went to Jin-Quan
Luo, UTMB, for "The crystal structure of the PexB/Dps A DNA-binding
and protecting E. coli protein" (co-authors Mark A. White, Deqian
Liu, Robert O. Fox). All of the following authors received honorable
mention: Mitch Mitchell (The fast, the slow and the metal: a
story of the Serratia and I-PpoI endonucleases and the role
of magnesium in their active sites), Maria Jezewska (Mammalian
DNA repair polymerase β binds ssDNA using two different binding
modes), Larisa Kosynkina (Automatic structure determination of
homologous proteins from NMR spectra), M.R. Ferguson (Tight binding
sequences for a SH3 domain selected by phage display), Mingli
Yang (Jaz, a novel zinc finger protein having a double stranded
RNA binding ability required for nucleolar localization), and
Hong Pan (Probing the basis for allosteric behaviour in dihydrofolate
reductase using an ensemble-based description of the native state).
Catherine H. Schein