AHPCRC Projects
Project 2–4: Protein Structure Prediction for Virus Particles
Principal Investigator: Enrico Pontelli (New Mexico State University)
[Figure: Protein electron density map (light gray) with calculated and observed helical secondary structures superimposed.]
[Figure: Protein structure: helical backbone and sidechains.]
Graphics this page courtesy Jing He (past PI, NMSU; currently at Old Dominion University).
Maintaining troops in a ready state requires that they be in good health. Lethal viruses pose an obvious threat to readiness, but even a rampant infestation of the sniffles can keep a battalion from operating at its best. Effective vaccines against viral infections, whether naturally occurring or introduced by hostile forces, are a moving target. What worked on last year's virus may be ineffective against this year's mutated strain, a completely new virus, or a genetically engineered or weaponized virus.

The U.S. Army has been active in the development of vaccines against the pathogens that cause malaria (Plasmodium vivax); diarrheal diseases (e.g., rotavirus); dengue fever and yellow fever (flaviviruses); and spotted fever and typhus (Rickettsia). The Army is also actively pursuing research on highly lethal viruses, including those that cause hemorrhagic fever.

Understanding the mechanisms by which viral diseases originate and develop requires knowledge of the three-dimensional structures of the proteins that make up disease-causing viruses, says Jing He, assistant professor of computer science at New Mexico State University. (Prof. He recently accepted a faculty position at Old Dominion University.) Under the AHPCRC program, Prof. He, her graduate students Kamal Al Nasr and Saeed Al-Haj, and postdoctoral fellow Weitao Sun are developing a scalable parallel computer code for identifying the most likely viral protein structures from a very large set of possible topologies.

They are working from the ground up, a technique called ab initio prediction. The "ground" in this case is a set of low-resolution protein electron density maps. Such maps can be generated using biophysical laboratory techniques such as electron cryomicroscopy, and they are readily available in the literature. The resolution of these maps is typically 5–10 Å, or about 3–8 carbon atoms across.

The challenge is to take a primary structure (a known sequence of amino acids) and map it onto its corresponding secondary structures. These secondary structures, the helices and sheets formed by the amino acids, can be visualized using the density map. Like fuzzy photographs, low-resolution density maps delineate the general outline of a protein's shape, but the outline is often incomplete and shows only the regions where atoms are densely packed. Prof. He's group is working to "sharpen" these images. Ultimately, they hope to derive the spatial coordinates of the individual atoms in the protein, using geometrical constraints to sort out the most likely topologies.

To derive the spatial coordinates of the atoms, a coarse-level mapping must first be established between the amino acid sequence and the secondary structures in the density map. Such a mapping determines the possible topologies of the secondary structures, and the coordinates of the atoms can then be constructed from the most likely topologies. The literature contains detailed experimental studies of the atom-by-atom structures of existing proteins, which provides information on the energies (and thus the relative stabilities) associated with various topologies.
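Concretely, a candidate topology assigns each secondary-structure segment of the amino acid sequence to one segment observed in the density map and chooses a direction in which to trace it, so K segments yield K! × 2^K candidates (3 segments give 48; 6 give 46,080, matching the counts reported below). The Python sketch below enumerates and ranks such candidates; toy_energy is a stand-in for a real conformational-energy function, and all names here are illustrative assumptions rather than the group's actual code.

```python
import itertools
import math

def enumerate_topologies(num_segments):
    """Yield every candidate topology for num_segments secondary-structure
    segments: an ordering (which map segment each sequence segment lands on)
    plus a tracing direction (+1 or -1) for each. Count = K! * 2**K."""
    for order in itertools.permutations(range(num_segments)):
        for directions in itertools.product((+1, -1), repeat=num_segments):
            yield order, directions

def toy_energy(order, directions):
    """Arbitrary smooth placeholder for the conformational energy, so the
    ranking machinery can be demonstrated end to end."""
    return sum(math.sin(i + o) * d for i, (o, d) in enumerate(zip(order, directions)))

if __name__ == "__main__":
    candidates = list(enumerate_topologies(3))
    print(len(candidates))                            # 48 = 3! * 2**3
    ranked = sorted(candidates, key=lambda t: toy_energy(*t))
    print(ranked[0])                                  # lowest-(toy-)energy topology
```

The explosion in candidates with segment count is what drives both the cap on secondary structures in the test set below and the parallelization effort described later.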
Testing the Concept

Prof. He selected 51 well-characterized proteins at random from the Protein Data Bank, a central repository of known protein structures. The selected proteins were required to have a single domain (i.e., no part of the protein could be capable of evolving or functioning independently of the rest of the protein), and each structure must have been determined to a resolution of 1.5 Å or better (to the level of individual carbon atoms). No two of the proteins shared more than 30% similarity in their amino acid sequences, and each protein had fewer than 8 secondary structures (because of constraints on computing resources).

All possible secondary-structure topologies were generated computationally for each of the 51 proteins, and the conformational energy of each topology was calculated using a multi-well Lennard–Jones potential function. Prof. He's group uses geometrical constraints to evaluate the energies required by various protein topologies, seeking the most "comfortable" configurations of bends and folds assumed by the protein's helices, sheets, and strands. The method is being developed using known structures of naturally occurring proteins, on the assumption that the order and directionality of the secondary structures these proteins assume under natural conditions (the native topologies) are the most stable, requiring the least energy. If the computer model can reliably place the native topologies among its most stable candidate structures, this builds confidence in the model's ability to predict structures of newly encountered or engineered proteins.

In most cases, the calculations placed the native topology among the most stable of the possible topologies. Of the 48 possible topologies of protein 1DV5, only two conformations (4% of the total) were more stable than the native topology. The native topology of 1QC7, with 46,080 possible topologies, was more stable than all but 14 conformations (0.03%).

But Does It Scale?

Because of the flexibility and complexity of protein chains, considerable computing resources are needed to evaluate and compare the relative energies of the numerous possible combinations of structural features, along with a program for visualizing the resulting three-dimensional structures. Every algorithm developed under this project therefore requires a parallel implementation to collect the necessary data.

As part of this study, two parallel computing strategies were compared. In the static work allocation scheme, the possible topologies for a given protein were distributed evenly among all available processors. Each processor "knew" its sequence of tasks in advance, and when it finished one task it automatically started on the next. This strategy minimizes communication between processors, but processors assigned to faster tasks may sit idle while the others complete slower tasks. The second strategy was a dynamic scheme, incorporating a master processor that assigned new tasks, or pieces of tasks, whenever a "worker" processor signaled that it had completed the previous one.

When fewer than 32 processors were used, the static allocation completed the calculations more rapidly for all but the two smallest proteins, because every processor contributed to the calculations. The dynamic scheme, whose master processor allocates tasks but does not participate in the calculations, showed a comparative advantage at 32 or more processors: with 32 processors, the dynamic method was up to 31 times faster than the static method. A load-balancing method, currently under development, distributes jobs to the individual processors; the research group is investigating an affinity method that uses a message passing interface to control the work that is sent to preferred processors.
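Below is a minimal sketch of the dynamic master-worker scheme just described, using mpi4py. The message tags, the stand-in task list, and the evaluate() placeholder are illustrative assumptions, not the project's actual implementation.

```python
from mpi4py import MPI

TASK, STOP = 1, 2                # message tags (arbitrary choices)

def evaluate(topology):
    """Placeholder for the conformational-energy evaluation of one topology."""
    return sum(topology)

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()   # requires at least 2 ranks

if rank == 0:
    # Master: hands out tasks and collects results, but does no evaluation
    # itself (which is why the static scheme wins at small processor counts).
    tasks = [(i, i + 1, i + 2) for i in range(1000)]   # stand-in task list
    status = MPI.Status()
    results, next_task = [], 0
    # Seed every worker with one task (or tell it to stop if none remain).
    for worker in range(1, size):
        if next_task < len(tasks):
            comm.send(tasks[next_task], dest=worker, tag=TASK)
            next_task += 1
        else:
            comm.send(None, dest=worker, tag=STOP)
    # Keep feeding whichever worker reports back until all tasks are done.
    while len(results) < len(tasks):
        result = comm.recv(source=MPI.ANY_SOURCE, tag=MPI.ANY_TAG, status=status)
        results.append(result)
        worker = status.Get_source()
        if next_task < len(tasks):
            comm.send(tasks[next_task], dest=worker, tag=TASK)
            next_task += 1
        else:
            comm.send(None, dest=worker, tag=STOP)
    print(min(results))          # most stable (lowest) energy found
else:
    # Worker: evaluate tasks until the master sends a STOP message.
    status = MPI.Status()
    while True:
        task = comm.recv(source=0, tag=MPI.ANY_TAG, status=status)
        if status.Get_tag() == STOP:
            break
        comm.send(evaluate(task), dest=0, tag=TASK)
```

Launched as, e.g., mpirun -n 32 python topology_search.py, rank 0 acts as the master and the remaining 31 ranks evaluate topologies, mirroring the one-master-per-run overhead noted above.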
Current and Future Work

In the early stages of this project, much of the effort was devoted to establishing an effective energy function, an important factor in structure prediction, since trial structures with the lowest overall energy are presumed to be the most stable and thus the most likely to occur. Currently, the researchers are working to estimate the energy more accurately by incorporating the geometrical orientations of the protein sidechains. The program is also being modified to handle larger protein molecules.

A parallel simulated annealing optimization scheme has been incorporated in place of the previous enumeration scheme. Simulated annealing is well suited to finding an acceptably good solution, such as a small subset of likely structures within a large number of possible permutations, rather than the single best solution that exhaustive enumeration seeks. This modification is currently being evaluated for its ability to predict large protein structures.
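A compact sketch of simulated annealing over the same topology representation follows; the move set (swap two segments or flip one direction) and the geometric cooling schedule are assumptions made for illustration, since the article does not specify the group's parallel scheme.

```python
import math
import random

def anneal(initial, energy, steps=20_000, t_start=5.0, t_end=0.01):
    """Simulated annealing over topologies. `initial` is an (order, directions)
    pair; each move swaps two segments in the order or flips one segment's
    tracing direction. Returns the best topology seen, not merely the last."""
    order, dirs = list(initial[0]), list(initial[1])
    current_e = energy(order, dirs)
    best, best_e = (order[:], dirs[:]), current_e
    for step in range(steps):
        t = t_start * (t_end / t_start) ** (step / steps)   # geometric cooling
        new_order, new_dirs = order[:], dirs[:]
        if random.random() < 0.5:                           # move: swap two segments
            i, j = random.sample(range(len(order)), 2)
            new_order[i], new_order[j] = new_order[j], new_order[i]
        else:                                               # move: flip one direction
            i = random.randrange(len(dirs))
            new_dirs[i] = -new_dirs[i]
        new_e = energy(new_order, new_dirs)
        # Metropolis criterion: always accept downhill moves, occasionally uphill.
        if new_e <= current_e or random.random() < math.exp((current_e - new_e) / t):
            order, dirs, current_e = new_order, new_dirs, new_e
            if current_e < best_e:
                best, best_e = (order[:], dirs[:]), current_e
    return best, best_e
```

Paired with the toy_energy function from the first sketch, anneal(((0, 1, 2, 3, 4, 5), (1, 1, 1, 1, 1, 1)), toy_energy) explores the 46,080-topology space of a six-segment protein while evaluating only a small fraction of it.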
Source: AHPCRC Bulletin, Vol. 1 No. 4 (2009)



