PolySacDb

Carbohydrates constitute one of the most abundant types of biomolecules and occur widely in all living matter. They function as structural or protective materials and as energy stores. In addition, carbohydrates perform a much broader biological role. For example, they appear to be essential in the process of infection by certain pathogenic species; they specify human blood group types and are intimately involved in the immunochemistry of blood; they determine cell-cell recognition and adhesion; they function as receptors in the antigen-stimulated lymphocyte antibody immune response; and they have an important role in cancer pathology. These biological processes, as well as the chemical and physical properties of carbohydrates, are determined in part by their conformational behavior.

Carbohydrates are polyhydroxy aldehyde, ketone, acid or alcohol compounds that have the general formula (CH₂O)_n where n is between 4 and 9. Most of the carbon atoms are asymmetric, and the various stereoisomers possess distinct physical and biological properties. Fig. 1 shows the Fischer projection formulae of the naturally occurring D-oses series (the sugars having the absolute configuration R at the last asymmetric carbon atom are called D); the L series of carbohydrates does occur as well, but to a lesser extent. Other natural units are formally derived by simple modification of these sugars, such as sulfate and acetate esters, methyl esters of carboxylic acid functions, and acetals or ketals of simple carbonyl molecules. For example, in D-glucose the CH₂OH group may be replaced by -COOH (e.g. glucuronic acid), CH₃ (e.g. fucose) or H (e.g. xylose), and an OH group may be replaced by NH₂ (e.g. glucosamine) or NHCOCH₃ (e.g. N-acetyl glucosamine). Sugar units also occur as simple derivatives. The number of monomeric carbohydrate structures is therefore very large.

The open chain forms of monosaccharides are quite flexible, being capable of rotating about each of the single C-C bonds. Because the aldehyde and ketone groups of these molecules are quite reactive, this flexibility often leads to an internal cyclization as the carbonyl group reacts with one of the hydroxyl groups from the other end of the molecule. Depending on the point of attack, the resulting rings can contain five or six atoms, one of which is oxygen; four- and seven-membered ring sugars are rather rare. The closing of the linear molecule to make a ring creates a new chiral center at C1, called the anomeric carbon. There are two possible configurations, which are designated with either an α or a β prefix. Several forms of each of the above-mentioned monomers can therefore exist. The cyclization is a reversible reaction, and in aqueous solution an equilibrium between the various forms will exist (Fig. 2). This is called mutarotation.

Monosaccharides can be linked to produce oligomeric structures through glycosidic bond formation. Water is eliminated between the anomeric hydroxyl and any one of the hydroxyls of a second monosaccharide or oligosaccharide. A disaccharide is shown in Fig. 3. The glycosidic linkage consists of two bonds, the glycosidic C1-O bond and the aglycone O-C4 bond. In the case of (1→6) linkages, three bonds connect the consecutive sugar rings. Carbohydrates can exist as mono-, oligo-, and polysaccharides; however, they reach maximum complexity when they are covalently attached to other molecules in glycoproteins, glycolipids, or glycopeptidolipids. The number of possible oligo- or polysaccharide structures is enormous. There are 21 different compounds that can be formed by condensation of two D-glucose units. Since there are multiple points of possible linkage on a monosaccharide residue, branching of the chain is quite a common feature in carbohydrate-containing molecules. In theory, the number of all possible linear and branched isomers of a hexasaccharide was found to be more than 10¹² (Fig. 4). Not all of these actually occur in nature, but a large number of them do. The naturally occurring polysaccharide structures can be classified into different groups. Simple homopolysaccharide structures can be linear (cellulose, amylose) or branched (amylopectin, glycogen). Heteropolysaccharides with various degrees of branching can be alternating (agarose, carrageenan), block (alginate), complex linear repeat (gellan), complex branched repeat (xanthan), or interrupted and branched (pectin). An illustration is given in Fig. 5. The complex oligosaccharides constitute another class of carbohydrate-containing molecules. These molecules generally vary in size from disaccharides to oligosaccharides of 15 to 20 residues, but may be even larger.

Fig. 5 From top to bottom: linear, branched, alternating, block, complex linear repeat, complex branched repeat, and interrupted and branched polysaccharide structures.

Hemiacetal formation between OH-5 and the C1 aldehyde group in aldohexoses (in glucose for example) produces the pyranose ring which contains an endocyclic oxygen atom (O5) and a new asymmetric carbon center at C1. This results in a ring structure having six atoms.

The most stable conformations of six-membered ring systems (pyranoses) are the chair forms, but in the majority of naturally occurring pyranoid derivatives one of the two possible chair conformers is considerably more stable than the other. As in the case of the cyclohexane ring, the pyranoid ring can also adopt energetically less favorable conformations. Six different skew conformers, separated by six different boat conformers, can be identified on the pseudorotational itinerary of a pyranoid ring. The pseudorotation circle of the flexible forms of the pyranoid ring and the position of the chair conformers on the conformational sphere of the parameters θ (theta) and φ (phi) are shown in Fig. 6. Three puckering parameters unambiguously define the position of the individual forms of the pyranoid ring on the conformational sphere. Q is the total puckering amplitude; the parameters θ and φ are angles in the ranges 0° ≤ θ ≤ 180° and 0° ≤ φ < 360°, and can be thought of as the polar and azimuthal angles of a sphere of radius Q. The two poles, θ = 0° and θ = 180°, represent the energy wells of the chair conformations ⁴C₁ and ¹C₄, respectively. All twelve flexible forms are located at the equator. In unsubstituted cyclohexane the two chair forms are the prominent species. Substitution of a heteroatom in the ring and addition of hydroxyls or other exocyclic substituents further stabilize or destabilize ring conformers relative to cyclohexane. As a general rule, the equatorial position of bulky substituents is preferred, because axial substituents suffer steric clashes through 1,3 syn-diaxial interactions. The ⁴C₁ conformer of glucopyranose, which has all ring substituents in the equatorial position, is preferred to the ¹C₄ conformer, in which all the substituents are in axial orientation. However, at high temperature a conformational transition to the latter form can arise spontaneously, as demonstrated by the formation of levoglucosan (1,6-anhydro-β-D-glucopyranose). This molecule results from the 1,6-elimination of water from the unusual ¹C₄ conformation, which brings the two hydroxyl groups into close proximity. Besides the trivial case of glucopyranose, the α-L-iduronate ring, a constituent of the glycosaminoglycans heparin, heparan sulfate, and dermatan sulfate, which have potential uses as anticoagulant and antithrombotic agents, exhibits conformational mobility. Three forms of this ring, namely ¹C₄, ²S_O and ⁴C₁, have been suggested to be responsible for the biological activities of these compounds.

Fig. 6 Graphical representation of the Cremer-Pople puckering parameters for a pyranose ring. The total puckering amplitude, Q, is the radius of the sphere; it is the square root of the summed squared perpendicular displacements of the ring atoms from the mean ring plane. The polar angle, θ, indicates the degree of deviation from the ⁴C₁ chair conformation at the North Pole (N). Other conformations such as HB (half boat), HC (half chair), B (boat), and TB or S (twist boat or skew boat) are located at or towards the equator. The ¹C₄ chair form is at the South Pole (S). The theoretical values of Q and θ are 0.63 Å and 0° for the pure ⁴C₁ chair conformation in cyclohexane.
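The puckering parameters described above can be computed directly from ring coordinates. The following is a minimal sketch of the Cremer-Pople construction for a six-membered ring, assuming the atoms are supplied in ring order (O5, C1, C2, C3, C4, C5 is a common choice for pyranoses, but that ordering is an assumption here, and it determines which chair maps to which pole). The function name is illustrative.

```python
import numpy as np

def cremer_pople_six(coords):
    """Return (Q, theta_deg, phi_deg) for six ring atoms given in ring order."""
    r = np.asarray(coords, dtype=float)
    if r.shape != (6, 3):
        raise ValueError("expected six ring atoms with x, y, z coordinates")
    n_atoms = 6
    r = r - r.mean(axis=0)                    # origin at the geometric centre
    j = np.arange(n_atoms)
    # Mean-plane normal from the two reference vectors R' and R''
    r1 = (r * np.sin(2 * np.pi * j / n_atoms)[:, None]).sum(axis=0)
    r2 = (r * np.cos(2 * np.pi * j / n_atoms)[:, None]).sum(axis=0)
    normal = np.cross(r1, r2)
    normal /= np.linalg.norm(normal)
    z = r @ normal                            # out-of-plane displacements
    # m = 2 puckering coordinate: amplitude q2 and phase angle phi
    q2_cos = np.sqrt(2 / n_atoms) * (z * np.cos(4 * np.pi * j / n_atoms)).sum()
    q2_sin = -np.sqrt(2 / n_atoms) * (z * np.sin(4 * np.pi * j / n_atoms)).sum()
    q2 = np.hypot(q2_cos, q2_sin)
    # m = 3 puckering coordinate: alternating sum of the displacements
    q3 = np.sqrt(1 / n_atoms) * (z * (-1.0) ** j).sum()
    total_q = np.hypot(q2, q3)                # radius of the conformational sphere
    theta = np.degrees(np.arctan2(q2, q3))    # polar angle, 0 to 180 degrees
    phi = np.degrees(np.arctan2(q2_sin, q2_cos)) % 360.0
    return total_q, theta, phi

# Idealised chair: atoms on a circle with alternating out-of-plane offsets
angles = np.arange(6) * np.pi / 3
chair = np.column_stack([1.5 * np.cos(angles), 1.5 * np.sin(angles),
                         0.25 * (-1.0) ** np.arange(6)])
Q, theta, phi = cremer_pople_six(chair)
```

For an ideal chair the m = 2 amplitude vanishes, so θ falls on one of the poles and φ is undefined in practice; flexible forms give θ near 90°.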

The five-membered ring forms (furanoses) of D-glucose are not thermodynamically stable. However, very important furanose molecules are commonly found in Nature. For example, D-ribose and D-deoxyribose are the building units of nucleic acids, and fructose is a constituent of sucrose. These rings are not planar: either one atom (envelope form) or two atoms (twist form) lie out of the plane that contains the others. The different envelope and twist conformations are of similar energy, and the barrier to their interconversion is rather small. Therefore, mixtures of different conformations might be expected in solution. Two puckering parameters are needed to define a conformation for furanoid rings: the puckering amplitude Q and the phase angle φ. Pseudorotation energy surfaces of ribofuranose and its deoxy analogue, fructofuranose, and arabinofuranose have been reported. In general, furanose rings have two major local minima and a path of interconversion. Because eclipsing of a carbon atom and an oxygen atom requires less energy than that of two carbon atoms, the oxygen atom in the furanoid ring tends to occupy the least puckered part of the ring; hence, usually either C2 or C3, or both, will be out of the plane. In most cases, the conformation is intermediate between the ideal envelope and twist forms; it is governed by the relative disposition of the substituents.

The conformations of monosaccharides have been determined by several methods, including the formation of complexes from sugars in cuprammonia solution, X-ray crystal structure analysis, infrared spectroscopy, polarimetry and optical rotation. In solution, nuclear magnetic resonance spectroscopy is usually the method of choice.

The energies of the different ring conformations are affected by the orientation of the hydroxymethyl group. This group usually exists in one of three staggered positions, called gauche-gauche, gauche-trans and trans-gauche. In this terminology, the torsion angle O5-C5-C6-O6 is stated first, followed by the torsion angle C4-C5-C6-O6. It is known from crystallographic studies, NMR measurements, and theoretical calculations that the conformational equilibrium about the C5-C6 bond in aldopyranoses depends significantly upon the configuration at C4. For the 'gluco' configuration (O4 equatorial) the trans-gauche position is high in energy and the remaining two conformations are almost equally populated, while the trans-gauche and gauche-trans positions are preferred for sugars having a 'galacto' configuration (O4 axial).
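The naming rule for the hydroxymethyl rotamers can be sketched as a small classifier on the O5-C5-C6-O6 torsion. This assumes a D-hexopyranose, idealised staggered wells near -60°, +60° and 180°, and arbitrary bin boundaries at 0° and ±120°; the function name is hypothetical.

```python
def hydroxymethyl_rotamer(omega_deg):
    """Classify the O5-C5-C6-O6 torsion (degrees) as 'gg', 'gt' or 'tg'."""
    omega = (omega_deg + 180.0) % 360.0 - 180.0   # wrap into [-180, 180)
    if -120.0 <= omega < 0.0:
        return "gg"   # near -60: O6 gauche to O5 and gauche to C4
    if 0.0 <= omega < 120.0:
        return "gt"   # near +60: O6 gauche to O5 and trans to C4
    return "tg"       # near 180: O6 trans to O5 and gauche to C4

# Torsions as they might be read from a crystal structure or a trajectory
labels = [hydroxymethyl_rotamer(w) for w in (-65.0, 58.0, 175.0, 300.0)]
```

Counting such labels over a set of structures gives the rotamer populations that the text compares between the 'gluco' and 'galacto' configurations.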

The hydroxyl groups have a high freedom of rotation, and they can participate in hydrogen bonds. As a result of the many possible orientations of such groups, prediction of the hydrogen bonding network is a difficult task. Most carbohydrates have an exceptionally high number of hydroxyl groups per saccharide residue. Usually, hydrogen bonds are formed to neighboring carbohydrate molecules, glycoproteins, or surrounding water molecules.

The extreme diversity of the monomeric units that can be found in carbohydrate structures stresses the need for specific methodologies to facilitate the construction of complex carbohydrates. A carbohydrate fragment library has been created. This data bank contains optimized geometries of many monosaccharide residues and covers most of the units which occur in polysaccharides.

The anomeric effect describes the axial preference for an electronegative substituent of the pyranose ring adjacent to the ring oxygen, whereas the exo-anomeric effect describes the rotational preference of the glycosidic C1-O bond. These stereoelectronic effects are of general importance for all molecules having two heteroatoms linked to a tetrahedral center. A survey of X-ray crystallographic data reveals that these effects have geometrical consequences. The most obvious feature of the experimental data on the α and β configurations is a marked difference in the molecular geometry around the acetal group. For example, in the axial configuration one observes a general shortening of the O5-C1 bond, a lengthening of the C1-O bond, and an increase in the O5-C1-O bond angle. Molecular orbital theory accounts for these observations. The magnitude of the anomeric effect varies with the nature of the electronegative group, the polarity of the solvent, and the location of the other substituents in the molecule.

The exo-anomeric effect influences the rotations around the glycosidic C1-O bond and is therefore important in determining the relative orientations of saccharide units in carbohydrate chains. The exo-anomeric effect is a balance between electronic and steric effects. The three staggered orientations for rotation about the glycosidic bond are not equivalent: the exo-anomeric effect causes a preference for the +synclinal orientation of the aglycone group in the α series and for the -synclinal orientation in the β series. A review of these effects has been published by Tvaroska.

Many force fields for molecular modeling are available. The following force fields are widely used or especially designed for carbohydrates:

The GROMOS force field was developed for molecular dynamics simulations of proteins, nucleotides, or sugars in aqueous or apolar solutions or in crystalline form.
The MM2 and MM3 force fields are molecular mechanics force fields initially meant for hydrocarbons, but now applicable to a wide range of compounds. Tvaroska and Pérez published a modified version especially for oligosaccharides called MM2CARB.
The CHARMM force field is designed for the modeling (both molecular mechanics and dynamics calculations) of macromolecular systems. A revision for carbohydrates was made by Ha et al. Kouwijzer and Grootenhuis redeveloped the CHEAT force field: a CHARMm-based force field for carbohydrates with which a molecule in aqueous solution is mimicked by a simulation of the isolated molecule.
The AMBER force field was developed for simulations of proteins and nucleic acids. A derivative for conformational analysis of oligosaccharides was published by Homans. Glennon et al. presented an AMBER-based force field especially for monosaccharides and (1→4) linked polysaccharides. More recently, Woods et al. developed the GLYCAM parameter set for molecular dynamics simulations of glycoproteins and oligosaccharides that is consistent with AMBER.
The consistent force field (CFF) was originally a molecular mechanics force field for cycloalkane and n-alkane molecules, optimized on both structural and vibrational data. Later, several versions for other classes of compounds were published, amongst others for carbohydrates.
The TRIPOS molecular mechanics force field is designed to simulate both biomolecules (peptides) and small organic molecules. Additional parameters for conformational analysis of oligosaccharides were derived by Imberty et al.
The DREIDING force field is one of the newer force fields in this list, and it was developed for the simulation of organic, biological, and main-group inorganic molecules.
Recently the Merck Molecular Force Field (MMFF94) was published. It seeks to achieve MM3-like accuracy for small molecules in a combined "organic / protein" force field that is equally applicable to proteins and other systems of biological significance.

Disaccharides

The low-energy conformers of a disaccharide can be estimated using molecular mechanics. In such compounds the global shape depends mainly on rotations about the glycosidic linkages, because the flexibility of the pyranose ring is rather limited and the different orientations of the pendent groups have a limited influence on the conformational space of the disaccharide. The relative orientations of saccharide units are therefore expressed in terms of the glycosidic linkage torsional angles φ and ψ, which are defined as φ = O5-C1-O-C'x and ψ = C1-O-C'x-C'(x-1) for a (1→x) linkage. The φ, ψ space can be explored in a systematic way: both torsions are rotated in small increments over the full 360° range, and at each point of the grid the energy is calculated with the force field in use. It is then possible to represent the energies of all the available conformations as a contour map in the φ, ψ space. These contour maps give a graphical description of energy changes as a function of the relative orientation of the monosaccharides. They indicate the shape and position of minima, the routes for interconversion between conformers, and the heights of the transition barriers. There are many different methods for calculating contour maps.
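The systematic exploration of the φ, ψ space can be sketched as a simple grid scan. The energy function below is a made-up smooth surface used only so that the example runs; it stands in for a real force field evaluation and has no chemical meaning. The function names are illustrative.

```python
import numpy as np

def toy_energy(phi_deg, psi_deg):
    """Hypothetical energy surface in kcal/mol; NOT a real force field."""
    phi = np.radians(phi_deg)
    psi = np.radians(psi_deg)
    return 2.0 * (1.0 - np.cos(phi - np.radians(60))) + 1.5 * (1.0 - np.cos(psi))

def scan_phi_psi(energy, step=10.0):
    """Evaluate energy(phi, psi) on a regular grid covering the full 360° range."""
    phis = np.arange(-180.0, 180.0, step)
    psis = np.arange(-180.0, 180.0, step)
    grid = np.array([[energy(p, s) for s in psis] for p in phis])
    return phis, psis, grid - grid.min()      # energies relative to the minimum

phis, psis, rel = scan_phi_psi(toy_energy)
i, j = np.unravel_index(np.argmin(rel), rel.shape)
print(f"lowest grid point: phi={phis[i]:.0f}, psi={psis[j]:.0f}")
```

The array `rel` is what a contouring routine would draw at fixed energy intervals; a 10° step gives 36 × 36 = 1296 energy evaluations per map.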

Calculating potential energy surfaces

In the rigid residue, or hard sphere, approach to potential surfaces, the constituent monosaccharides are assumed to be rigid, with pendent groups fixed. As the φ, ψ values are changed, steric interactions arise between pendent groups that are unable to relax. These steric interactions cause a rapid increase in energy. This effect is especially prominent in sterically crowded molecules. In addition, surveys of a large number of known crystal structures, supported by semi-empirical calculations, reveal small but important variations in pyranoid ring geometries and orientations of pendent groups with the φ, ψ values. These variations depend upon the anomeric and exo-anomeric effects and emphasize the need for a model that includes bond length and bond angle degrees of freedom.

The strain produced by steric interactions arising from rotation of monosaccharide residues is relieved by including bond length and angle adjustment, in the form of minimization with respect to all degrees of freedom of the system (except φ and ψ) at each grid point. During minimization, pendent groups move to the nearest minimum downhill of the starting point. In the process of driving the molecule through unfavorable regions of the φ, ψ space, large steric interactions can sometimes cause pendent groups to overcome torsional barriers. This results in minimization to a different local well. Compared with a rigid map, such a relaxed map shows a larger accessible potential energy surface, lower energy barriers between minima, and possibly a lower energy minimum far removed from the initial starting geometry.

Whereas rigid residue maps represent a two-dimensional cross section of a 3N-6 dimensional surface, where N is the number of atoms, relaxed maps represent a larger cross-sectional window of a given potential energy surface because they allow minimization of the internal coordinates (bond lengths, bond angles and torsional angles) to local low energy wells. However, as minimization will only lead to conformations 'downhill' from the starting structure, the torsional dimension where most conformational variation occurs is limited to only one orientational well. It is possible that rotation of pendent groups over torsional barriers could produce lower energy conformations at that point in the φ, ψ space. Ideally, an investigation of all possible combinations of pendent group orientations is required at each point in this space (i.e., assuming that each pendent group can exist in each of the three idealized staggered orientations, 3ⁿ different conformations at each point in the φ, ψ space, where n is the number of pendent torsions). This results in 3¹² (531441) conformations for a simple disaccharide and 3¹⁹ (1.16 × 10⁹) conformations for a more complex disaccharide typical of heparin.
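The combinatorial growth quoted above is easy to reproduce. The short sketch below counts the staggered combinations per grid point and, as an illustrative assumption not taken from the text, multiplies by the 1296 points of a 10° grid to show the cost of an exhaustive map.

```python
def staggered_combinations(n_torsions, wells_per_torsion=3):
    """Number of pendent-group arrangements at one (phi, psi) grid point."""
    return wells_per_torsion ** n_torsions

GRID_POINTS = 36 * 36   # assumed 10-degree grid over phi and psi

for n in (12, 19):
    per_point = staggered_combinations(n)
    print(n, per_point, per_point * GRID_POINTS)
```

Even for the simpler case, an exhaustive search would require several hundred million minimizations per map, which motivates the search strategies described next.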

Adiabatic maps attempt to represent the lowest energy of all possible pendent group orientations at each point in the φ, ψ space. On comparison with the corresponding relaxed ones, adiabatic maps are flatter, allow greater freedom about the glycosidic bonds, locate additional minima, and reduce the barriers between the minima.

At present there are several different methods for calculating adiabatic conformational maps. In the most commonly used method, the energy at each point in φ, ψ space is evaluated systematically for several different starting geometries, and the lowest energy found at each point is used to generate the map. This can be very time consuming, so such a systematic search is only possible for carbohydrates of limited size and flexibility.
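Combining several relaxed maps into an adiabatic map amounts to taking the point-wise minimum. A minimal sketch, assuming the relaxed maps have already been computed on the same grid and stacked into one array (the tiny 2 × 2 example values are invented for illustration):

```python
import numpy as np

def adiabatic_map(relaxed_maps):
    """Point-wise lowest energy over maps started from different geometries.

    relaxed_maps: array of shape (n_starts, n_phi, n_psi), energies in kcal/mol.
    Returns the relative adiabatic map and the index of the winning start.
    """
    maps = np.asarray(relaxed_maps, dtype=float)
    lowest = maps.min(axis=0)
    winner = maps.argmin(axis=0)
    return lowest - lowest.min(), winner

# Two hypothetical relaxed maps on a 2 x 2 grid
example = [[[1.0, 5.0], [3.0, 2.0]],
           [[2.0, 4.0], [1.0, 6.0]]]
adiabatic, winner = adiabatic_map(example)
```

Keeping the `winner` index makes it possible to see where the pendent-group arrangement changes across the map.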

Several procedures have been developed to scan the energy surface as a function of the two glycosidic angles in an efficient way. For example, the Random Molecular Mechanics (RAMM) grid method searches the orientations of pendent groups at each point in φ, ψ space. At each point, 1000 steps of a random walk procedure vary the pendent group orientations and evaluate unrelaxed energies. Only the resulting lowest energy structure is optimized, and its energy is accepted as the energy for that point in the φ, ψ space.

The prudent ascent method moves through the φ, ψ space in a way which is dependent on previous minimizations. Large steric interactions are minimized by doing the most favorable geometries first, in a way similar to the local relaxed map. It makes use of inelastic deformations that decrease the energy by recalculating the energies of surrounding geometries using the new lower energy structure as the starting geometry. On average the energy for each point in the φ, ψ space is calculated twice.

With the CICADA method (Channels In Conformational space Analyzed by Driver Approach) the potential energy surface is explored by driving separately each selected torsion angle with a concomitant full-geometry optimization at each increment (except for the driven angle).

The Monte Carlo method is essentially a random search method. From a starting configuration (A) a new configuration (B) is generated by random displacement of one or more atoms. The new configuration is either accepted or rejected on the basis of an energy criterion. When the energy of B is lower than or equal to that of A, B is accepted. When it is higher, B is accepted only if the Boltzmann factor of the energy difference (at the desired temperature) is greater than a random number taken from a uniform distribution between 0 and 1. When B is rejected, A is counted again before a new configuration is generated; when B is accepted, it serves as the new starting configuration. The process is repeated many times and results in a large number of configurations, which should be representative of the system. The method is more efficient for atomic or simple molecular systems than for complex (macro)molecular systems, since a random displacement in the latter case will generally distort the molecule so much that the energy of the new configuration is usually very high. Recently, Metropolis Monte Carlo methods have been applied to the conformational analysis of oligosaccharides with the aim of deriving ensemble average parameters or exploring the multiple conformations adopted by a complex polysaccharide such as xyloglucan.
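The acceptance rule described above can be sketched for a single torsion angle. The threefold potential, the step size and the function names are assumptions chosen for illustration; a real application would move atoms or several torsions and call a force field.

```python
import math
import random

K_B = 0.0019872  # Boltzmann constant in kcal/(mol K)

def torsion_energy(angle_deg):
    """Hypothetical threefold torsional potential in kcal/mol (minima at ±60, 180)."""
    return 1.5 * (1.0 + math.cos(math.radians(3.0 * angle_deg)))

def metropolis(energy, start, n_steps, temperature=300.0, max_move=30.0, seed=0):
    """Metropolis sampling of one angle; returns the visited angles in degrees."""
    rng = random.Random(seed)
    beta = 1.0 / (K_B * temperature)
    current, e_current = start, energy(start)
    samples = []
    for _ in range(n_steps):
        # Random displacement, wrapped into [-180, 180)
        trial = (current + rng.uniform(-max_move, max_move) + 180.0) % 360.0 - 180.0
        delta = energy(trial) - e_current
        # Accept downhill moves always, uphill moves with Boltzmann probability
        if delta <= 0.0 or math.exp(-beta * delta) > rng.random():
            current, e_current = trial, e_current + delta
        samples.append(current)   # a rejected move counts the old state again
    return samples

samples = metropolis(torsion_energy, start=60.0, n_steps=20000)
mean_energy = sum(torsion_energy(a) for a in samples) / len(samples)
```

Ensemble averages are then plain arithmetic means over `samples`, as done for `mean_energy`.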

In molecular dynamics simulations an ensemble of configurations is generated by applying the laws of motion to the atoms of the molecular system. The two major simulation techniques are molecular dynamics, in which Newton's equations of motion are integrated over time, and stochastic dynamics, in which the Langevin equation for Brownian motion is integrated over time. Several algorithms have been developed for molecular dynamics simulations. Such simulations follow a system for a limited time. Physically observed properties are computed as the appropriate time averages through the collective behavior of individual molecules. For the results to be meaningful, the simulations must be sufficiently long so that the important motions are statistically well sampled. Experimentally accessible spectroscopic and thermodynamic quantities can be computed, compared, and related to microscopic interactions. Such modeling techniques have been applied to a wide range of oligosaccharides. The structural flexibility of these molecules has been confirmed. Highly branched oligosaccharides have also been investigated by means of molecular dynamics. The results obtained for these have been interpreted as demonstrating a more rigid behavior than that found for linear ones.
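As an illustration of integrating Newton's equations of motion over time, the sketch below applies the velocity Verlet scheme, one of several common integrators, to a one-dimensional harmonic oscillator in reduced units. The choice of integrator and test system is an assumption made for brevity; it is not taken from any particular simulation package.

```python
def velocity_verlet(force, x, v, mass, dt, n_steps):
    """Integrate one coordinate with velocity Verlet; returns [(x, v), ...]."""
    trajectory = [(x, v)]
    f = force(x)
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * (f / mass) * dt * dt   # position update
        f_new = force(x)                              # force at the new position
        v = v + 0.5 * (f + f_new) / mass * dt         # velocity update
        f = f_new
        trajectory.append((x, v))
    return trajectory

K_SPRING, MASS = 1.0, 1.0   # reduced units, hypothetical values
trajectory = velocity_verlet(lambda x: -K_SPRING * x, x=1.0, v=0.0,
                             mass=MASS, dt=0.01, n_steps=1000)
energies = [0.5 * MASS * v * v + 0.5 * K_SPRING * x * x for x, v in trajectory]
mean_x_squared = sum(x * x for x, _ in trajectory) / len(trajectory)
```

The total energy should stay nearly constant along the trajectory, and observables such as `mean_x_squared` are obtained as time averages over the stored frames.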

It should be noted that molecular dynamics is severely limited by the available computer power. With presently available computers, it is feasible to perform a simulation with several thousand explicit atoms for a total time of up to a few nanoseconds. To explore the conformational space adequately, it is necessary to perform many such simulations. In addition, carbohydrate molecules may undergo dynamical events on longer time scales. These motions cannot be investigated with standard molecular dynamics techniques.

It is important to recognize that most quantum mechanical and molecular mechanical procedures are designed to treat molecules in the isolated state. Omitting the effect of the environment from the calculation neglects the fraction of the energy that arises from these interactions. For example, a carbohydrate in an aqueous or crystalline environment will usually form hydrogen bonds only to neighboring molecules, while the simulation of the molecule in vacuo is dominated by conformations with energetically favorable intramolecular hydrogen bonds.

Several different approaches have been proposed to treat solvation effects. In the simplest one, the effect of the solvent is mimicked by increasing the dielectric constant in the calculation of electrostatic interactions, or by using a distance-dependent dielectric constant. Unfortunately, this affects all electrostatic interactions. An alternative approach is to treat the solvent as a dielectric continuum. The conformational free energy of a given conformer in a particular solvent may then be described as the sum of the energy of the isolated state and the solvation free energy. A computationally very efficient method is the use of the CHEAT95 force field, which is parametrized in such a way that the simulation of isolated carbohydrates mimics the behavior of the molecule in aqueous solution.
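The two simplest dielectric treatments can be compared with a short Coulomb energy function. This is a sketch under stated assumptions: point charges in units of the elementary charge, distances in Å, an approximate conversion factor of 332.06 kcal Å/(mol e²), and the common convention that the distance-dependent dielectric is ε = ε₀·r with r in Å. The function name and the example charges are hypothetical.

```python
COULOMB = 332.06  # approximate conversion factor, kcal * Angstrom / (mol * e^2)

def coulomb_energy(q1, q2, r, eps=1.0, distance_dependent=False):
    """Electrostatic energy (kcal/mol) of two point charges r Angstrom apart."""
    if r <= 0.0:
        raise ValueError("distance must be positive")
    dielectric = eps * r if distance_dependent else eps
    return COULOMB * q1 * q2 / (dielectric * r)

# A hypothetical hydroxyl H...O pair at two separations
for r in (2.0, 6.0):
    constant = coulomb_energy(0.4, -0.6, r, eps=1.0)
    scaled = coulomb_energy(0.4, -0.6, r, eps=1.0, distance_dependent=True)
    print(r, round(constant, 2), round(scaled, 2))
```

With the distance-dependent form the interaction falls off as 1/r², so long-range terms are damped more strongly than short-range ones, but every charge pair is affected regardless of whether solvent could actually lie between the atoms.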

At present, the best approach is the inclusion of the environment in the simulation, viz. a molecular dynamics simulation with explicit water molecules or other surrounding molecules. By applying periodic boundary conditions a true, but still very small, system is simulated. Of course, this is very time consuming for an oligosaccharide in water.
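Periodic boundary conditions are usually combined with the minimum-image convention when distances are evaluated. A minimal sketch, assuming an orthorhombic box whose edge lengths are given explicitly (function names are illustrative):

```python
import numpy as np

def minimum_image(delta, box_lengths):
    """Wrap a displacement vector so each component refers to the nearest image."""
    delta = np.asarray(delta, dtype=float)
    box = np.asarray(box_lengths, dtype=float)
    return delta - box * np.round(delta / box)

def pbc_distance(a, b, box_lengths):
    """Distance between points a and b under periodic boundary conditions."""
    return float(np.linalg.norm(minimum_image(np.subtract(b, a), box_lengths)))

# Two atoms near opposite faces of a 10 Angstrom cubic box are close neighbours
d = pbc_distance([0.5, 0.0, 0.0], [9.5, 0.0, 0.0], [10.0, 10.0, 10.0])
```

In this way a solute and a few hundred water molecules experience a bulk-like environment without surface effects, at the cost of the artificial periodicity.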

Probing potential energy surfaces

It is the objective of the present section to illustrate that potential energy surfaces can be put to a demanding test. Indeed, many observable properties of oligosaccharides can be calculated and have been shown to be sensitive to the details of the conformational energy surface.

More than 3600 crystal structure determinations of carbohydrates are now listed in the Cambridge Crystallographic Data Base. X-ray analysis gives the best data for the conformation of a carbohydrate. Precise atomic coordinates are provided, along with an explicitly defined environment. Although the crystalline state is often dismissed as irrelevant to biological processes, comparisons with crystal structures are among the most precise tests of modeling available for carbohydrate molecules, provided that packing forces are taken into account. Both force fields and methods can be validated by molecular dynamics simulations of crystal structures.

A more common way to use crystallographic data to test computer simulations is to superimpose the conformations found in crystal structures on a calculated potential energy map. For example, Fig. 7 shows the potential energy surface of cellobiose as a function of the glycosidic torsion angles. A search in the crystallographic database for molecules with a link similar to that in cellobiose yields a number of conformations, which can be plotted on the calculated surface.

Fig. 7 Potential energy map of cellobiose calculated with MM3 (data kindly provided by A.D. French)

The red dots denote conformations found in crystal structures. Iso-energy contours are 1 kcal/mol apart.

It should be kept in mind that the conformations found in crystals can be influenced by packing effects, so that they differ from the preferred conformation(s) in aqueous solution and in vacuo. An interesting example of the problems that can arise with in vacuo calculations is given by sucrose. Fig. 8 shows maps calculated with MM3 and CHEAT95, together with a population density map calculated from a molecular dynamics simulation in water. The MM3 map predicts a high potential energy (5.5 kcal/mol higher than the global minimum) for the conformation of the sucrose link found in raffinose. The simulation in water shows that this is an artifact of the MM3 force field; the calculations with the CHEAT95 force field perform much better in this respect. The lowest-energy conformation is stabilized by an intramolecular hydrogen bond, which cannot be formed in the raffinose conformation. Nevertheless, this conformation appears to be stabilized by surrounding molecules. An extensive study of the energy contributions in this glycosidic link showed that the problem was not the overlapping anomeric sequence, as had been suggested, but a very high barrier for one of the torsions.

Fig. 8 Potential energy map of sucrose calculated with MM3 (top left; data kindly provided by A.D. French) and CHEAT95 (top right); the red dots denote conformations found in crystal structures. Iso-energy contours are 1 kcal/mol apart. Bottom: population density map calculated from a molecular dynamics simulation in water.

In solution, the method of choice to study the 3D structure of saccharides is nuclear magnetic resonance (NMR), through chemical shifts, coupling constants, nuclear Overhauser effects (NOEs) and relaxation time measurements. While the conformational dependence of the carbon chemical shifts is far from understood, coupling constants can be used to evaluate the magnitude of torsion angles, and NOE measurements can provide estimates of distances between protons located in rather close proximity. In addition, relaxation time measurements give information on the mobility and the behavior of molecules in solution. A major difficulty in the determination of the conformation of an oligosaccharide from NMR data is the flexibility of carbohydrates, and especially of the glycosidic links. When multiple conformations are present in solution, the NMR data represent a time average over these conformations. Since the geometrical parameters are usually related in a non-linear way to the experimental data, these data can be very misleading. Consider, for example, an oligosaccharide in solution that occupies two distinct conformations, one with a relatively short distance between the protons at both sides of the link, and one with a rather large distance (which is preponderant). The measured NOE is an average value, so the NOE could easily lead the interpreter to a single, non-existent conformation. Even when it is known that two conformations are present, errors can easily be made, since the preponderant conformation produces only a small contribution to the resulting NOE.

There are not many experimental means, other than NMR, suitable for probing carbohydrate conformations and evaluating calculated potential energy surfaces. However, the optical activity of saccharides depends on their chemical composition, configuration, and conformation. Models have been developed and applied to disaccharides in aqueous solution. These studies result in the location of preferred regions in the configurational space, rather than in the location of some well-defined points. The technique is not widely applicable to oligosaccharides, but is useful to complement NMR methods. Sucrose, cellobiose and maltose are among the numerous disaccharides which have been investigated so far. It has been reported that the optical rotation observed in agarose gels can be satisfactorily accounted for in terms of associated double helix chain conformations rather than an extended simple helix.
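The NOE averaging problem described in the NMR discussion above can be made concrete with a small calculation. The sketch assumes simple population-weighted ⟨r⁻⁶⟩ averaging of the interproton distance, which is a common first approximation; the distances and populations are invented for illustration.

```python
def noe_effective_distance(distances, populations):
    """Apparent distance from population-weighted r^-6 averaging."""
    total = sum(populations)
    mean_inv6 = sum(p * r ** -6 for r, p in zip(distances, populations)) / total
    return mean_inv6 ** (-1.0 / 6.0)

# Hypothetical two-state case: 20 % at 2.5 Angstrom, 80 % at 4.0 Angstrom
distances = [2.5, 4.0]
populations = [0.2, 0.8]
r_eff = noe_effective_distance(distances, populations)
r_mean = sum(p * r for r, p in zip(distances, populations)) / sum(populations)
print(f"NOE-derived distance {r_eff:.2f}, arithmetic mean {r_mean:.2f}")
```

The apparent distance lies between the two real distances and is pulled towards the shorter one, so it corresponds to neither conformer, and the minor short-distance state contributes most of the signal.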

The Disordered State of Polysaccharides

Polysaccharide chains in solution tend to adopt a more or less coiled structure. Such a dissolved random coil fluctuates between different local and overall conformations. Polysaccharides can assume an enormous variety of spatial arrangements around the glycosidic linkages because these molecules have extensive conformational freedom. Theoretical polysaccharide models are based on studies of the relative abundance of the various conformations, in conjunction with the statistical theory of polymer chain configuration. Possible interactions between residues that are not nearest neighbors in the primary sequence of the polymer are ignored. The range of conformations of the polymer molecules is represented by a Monte Carlo sample. The observable properties of dissolved polysaccharides are averaged over the entire range of conformations accessible to the chain, and they may be determined from conformational states derived from the potential energy surfaces of the consecutive disaccharide fragments. This approach yields properties corresponding to the equilibrium state of the chain. The results refer to a model of an unperturbed chain that ignores the long-range excluded volume effect, because only nearest-neighbor interactions are accounted for in the computation of the φ, ψ surfaces. Given a sufficiently large Monte Carlo sample of unperturbed polysaccharide chains, the average properties of the polymer in question can be assessed simply by computing arithmetic averages over the chains of the sample. For example, the mean square end-to-end distance, the mean square radius of gyration, the average persistence length and the dipole moment are all average properties readily computed from the coordinates of the atoms or atomic groups making up the Monte Carlo sample.
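
The averaging step described above can be sketched as follows. This is not the procedure used in the cited studies: instead of drawing (φ, ψ) pairs from Boltzmann-weighted disaccharide energy maps, the sketch swaps in a freely rotating chain of virtual bonds (fixed angle between successive bonds, uniformly random torsion), which is enough to show how the sample averages and the characteristic ratio C_n = <r²>/(n l²) are computed. The bond length, angle, chain length and sample size are arbitrary illustrative values, and all function names are hypothetical.

```python
import math
import random


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def build_chain(n_bonds, bond_length, theta, rng):
    """Bead coordinates of one freely rotating chain (stand-in model).

    theta is the fixed angle between successive bond vectors; the torsion is
    drawn uniformly, whereas a real study would weight it by exp(-E/RT).
    """
    b = (0.0, 0.0, 1.0)   # current bond direction (unit vector)
    n = (1.0, 0.0, 0.0)   # unit vector perpendicular to b
    pos = [(0.0, 0.0, 0.0), (0.0, 0.0, bond_length)]
    for _ in range(n_bonds - 1):
        phi = rng.uniform(0.0, 2.0 * math.pi)
        m = cross(b, n)
        new_b = tuple(math.cos(theta) * b[i]
                      + math.sin(theta) * (math.cos(phi) * n[i] + math.sin(phi) * m[i])
                      for i in range(3))
        # Reference direction for the next torsion: the part of the old bond
        # that is perpendicular to the new bond.
        c = dot(b, new_b)
        perp = tuple(b[i] - c * new_b[i] for i in range(3))
        norm = math.sqrt(dot(perp, perp))
        n = tuple(x / norm for x in perp)
        b = new_b
        pos.append(tuple(pos[-1][i] + bond_length * b[i] for i in range(3)))
    return pos


def end_to_end_sq(pos):
    return sum((pos[-1][i] - pos[0][i]) ** 2 for i in range(3))


def radius_of_gyration_sq(pos):
    center = [sum(p[i] for p in pos) / len(pos) for i in range(3)]
    return sum(sum((p[i] - center[i]) ** 2 for i in range(3)) for p in pos) / len(pos)


def sample_averages(n_chains, n_bonds, bond_length, theta, seed=0):
    """Arithmetic averages over a Monte Carlo sample of independent chains."""
    rng = random.Random(seed)
    r2 = s2 = 0.0
    for _ in range(n_chains):
        chain = build_chain(n_bonds, bond_length, theta, rng)
        r2 += end_to_end_sq(chain)
        s2 += radius_of_gyration_sq(chain)
    mean_r2, mean_s2 = r2 / n_chains, s2 / n_chains
    return mean_r2, mean_s2, mean_r2 / (n_bonds * bond_length ** 2)


mean_r2, mean_s2, c_n = sample_averages(2000, 100, 1.0, math.radians(60.0))
print(f"<r^2> = {mean_r2:.1f}, <s^2> = {mean_s2:.1f}, C_n = {c_n:.2f}")
```

For a freely rotating chain the limiting characteristic ratio is (1 + cos θ)/(1 - cos θ), i.e. 3 for θ = 60°, so the estimate should come out close to 3 within sampling noise. Replacing the uniform torsion by states drawn from a disaccharide energy map is what would turn this toy model into the kind of calculation discussed in the text.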

Models of native polysaccharides, refined to various extents, have been presented. Monte Carlo methods have been applied to explore the multiple conformations occurring in a complex polysaccharide such as xyloglucan. Models of polymer chain extension were first used to compare the effect of the glycosidic linkage geometry in simple polysaccharide chains, e.g. cellulose and amylose. Both polymers are 1→4 linked glucans; the only difference is the anomeric configuration at the C1 atom of the monomeric unit, α for amylose and β for cellulose. The calculated data show a remarkable pseudohelical chain trajectory for the amylosic chains; the characteristic ratio of 5 denotes a moderately compact chain configuration. This behavior is a direct consequence of the glycosidic bond geometry, because changing this geometry from the α to the β configuration has a dramatic effect on the character of the chain trajectory. Relative to amylose, the characteristic ratio of cellulose is predicted to increase twentyfold, to 100. This reflects the extended character of the cellulosic chains. The effect of solvent on these two representative polysaccharides has also been investigated. It was found, in good agreement with the experimentally observed solvent dependence, that significant changes in the unperturbed chain dimensions occur. The characteristic ratio of amylose is larger in water than in vacuum, whereas that of cellulose is smaller. Here again, the incorporation of solvation remains difficult, and these conclusions should be considered with caution.

The following example illustrates an application of this procedure to the characterization of the solution behavior of pectic substances. Pectins are a family of polysaccharides that constitute a large portion of the cell wall of many higher plants, where they influence growth, development and senescence. They are extensively used as gel formers and thickening agents in the food industry. The basic backbone of pectin polysaccharides is formed by (1→4)-linked α-D-galacturopyranosyl residues, either free or in ester form. These homogalacturonan sequences may be interspersed at intervals with β-L-rhamnopyranosyl residues carrying the major part of the neutral sugar side chains, mainly arabinans, galactans or arabinogalactans. Three recent molecular modeling studies were carried out on theoretical polysaccharide chain models of the linear part of pectin molecules. All of them illustrate the markedly extended and stiff character of homogalacturonan sequences. Because different force fields and strategies were used to calculate the potential energy surfaces of the parent disaccharides, the reported chain dimensions differ considerably. The characteristic ratios of 57 and 47, computed from relaxed MM3 and CHARMm maps, respectively, are in excellent agreement. The characteristic ratio established from potential energy surfaces calculated with a "rigid residue approach" is between 150 and 253, depending on the value of the glycosidic bond angle. The solvent polarity as well as the ionic state of the galacturonic acid residues affects the conformational behavior of the glycosidic linkage; both lead to an increase of the unperturbed limiting chain dimensions. The insertion of rhamnose residues in the primary sequence does not seriously disrupt the overall chain propagation, as shown by a small decrease (8% with 25% of rhamnose) of the characteristic ratio. This result is in contrast with another study, which concluded that the insertion of rhamnose units decreased the characteristic ratio by about 50%. This discrepancy may be due to the placement of the rhamnose units, whose occurrence either follows a defined pattern derived from experimental investigations or obeys a Bernoullian distribution.

In the physicochemical analysis of pectin chains, both size and shape have been characterized by many techniques, such as osmometry, wide- and low-angle laser light scattering (coupled or not with size exclusion chromatography), viscometry, low-speed sedimentation equilibrium and small-angle neutron scattering. The heterogeneity of the primary structure, along with the presence of aggregates, can affect the molecular state in solution and therefore hampers the accurate determination of molecular weight. This is why the literature on the solution behavior of pectin is full of conflicting reports. Depending on the authors and the measuring techniques used, pectin molecules have been reported to behave as rigid rods or as coils of variable stiffness. Some authors have reported that the stiffness of the chains could depend on the degree of esterification. The neutral sugar content has been reported to affect the conformation, and it has been suggested that fractions rich in neutral sugars are responsible for the high molecular weights found in some cases. A quantitative comparison between predicted and measured shapes is complicated, because the heterogeneity of the chemical structure makes a straightforward comparison between experimental results difficult. The joint use of small-angle neutron scattering, viscometry and molecular modeling on a series of samples having well-characterized degrees of methylation and rhamnose contents provided a consistent characterization of the configurational features of pectins. More elaborate characterizations still await the availability of tailor-made samples of pectins.

The Ordered State of Polysaccharides

Like other polymers, polysaccharides form helical structures. The helix symmetry can be denoted by u_v, which means that there are u repeat units in v turns of the helix. The helical arrangement can also be described in terms of a set of helical parameters (n, h); n is the number of repeating units per turn of the helix, and h is the translation of one repeating unit along the helix axis. The chirality of the helix is described by the sign of h: a negative value designates a left-handed helix.

An X-ray diffraction pattern from a highly crystalline sample gives, through the positions of the reflections, information about the unit cell dimensions and the space group. The atomic positions can be deduced from the intensities of the reflections. It is usually impossible to obtain highly crystalline samples of polysaccharides, which limits the quality of the diffraction pattern. The diffracted intensities are restricted to layer lines. The layer line spacings give information related to the axial advance h of the molecule. The meridian (vertical axis) of the diffraction pattern only has intensities on layer lines that are a multiple of u, so this gives information about the helical symmetry. With the information from the X-ray diffraction pattern and molecular modeling, a number of possible structures can usually be calculated. These models include left- and right-handed helices, and single and coaxial multiple helices. In the case of double helices, the strands can be either parallel or anti-parallel. At present, the number of strands in the helix is well known for some polysaccharides, but for others it is still under debate. The POLYS program, already mentioned at the end of the introduction, has been extended to handle multiple helices. A single strand is positioned in such a way that the helix axis coincides with the z-axis of a coordinate system, after which a rotation or screw operation is applied. In this way double or triple helices are easily generated.

Examples of better-known single-helical polysaccharide structures include cellulose, which consists of 1→4 linked β-D-glucose units. Different polymorphs exist, and although the crystal structures are not yet known with atomic accuracy, it is known that cellulose in its predominant forms is a single-stranded two-fold helix. Amylose consists of 1→4 linked α-D-glucose units and crystallizes as double helices. Here too, different forms are known, which mainly differ in the water content of the unit cells. In these forms the amylose molecules are left-handed, six-fold helices with a pitch of 2.1 nm (n = 6 and h = -0.35 nm). In the double helix the individual strands are oriented parallel. A triple helix is formed by β-(1→3)-glucan, in which β-D-glucose units are linked 1→3. Some of the examples given here are illustrated in Fig 9.
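
The helical parameters (n, h) and the rotation or screw operation described above can be sketched in a few lines. This is not the POLYS implementation: each repeating unit is reduced to a single point, the helix radius of 0.5 nm is an arbitrary illustrative value, and the function names are hypothetical. Only n = 6 and h = -0.35 nm are taken from the text.

```python
import math


def helix_strand(n, h, radius, n_units):
    """One point per repeating unit on a helix whose axis is the z-axis.

    n is the number of units per turn and h the rise per unit; with this
    construction a negative h gives a left-handed helix.
    """
    step = 2.0 * math.pi / n
    return [(radius * math.cos(k * step), radius * math.sin(k * step), k * h)
            for k in range(n_units)]


def screw(points, angle, shift=0.0):
    """Rotate about z by `angle` and translate along z by `shift`.

    With shift = 0 this is a pure rotation; a non-zero shift makes it a
    screw operation.
    """
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z + shift) for x, y, z in points]


def pitch(n, h):
    """Axial length of one full turn."""
    return n * abs(h)


def handedness(h):
    return "left" if h < 0 else "right"


def multiple_helix(strand, n_strands):
    """Coaxial parallel strands related by rotations of 2*pi/n_strands."""
    return [screw(strand, 2.0 * math.pi * j / n_strands) for j in range(n_strands)]


# Amylose-like parameters from the text; radius and length are assumptions.
strand_a = helix_strand(n=6, h=-0.35, radius=0.5, n_units=12)
strand_a, strand_b = multiple_helix(strand_a, 2)

print(f"pitch = {pitch(6, -0.35):.2f} nm, {handedness(-0.35)}-handed")
print(f"closest approach between strands: "
      f"{min(math.dist(p, q) for p in strand_a for q in strand_b):.2f} nm")
```

Calling `multiple_helix(strand, 3)` gives the analogous triple-helical arrangement. An anti-parallel partner would additionally require reversing the direction of one strand, which this sketch does not attempt.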

Fig. 9 Examples of a single helix (cellulose), a double helix (amylose) and a triple helix (β-1→3-glucan).

The combination of molecular modeling and X-ray data for a linear homopolysaccharide is illustrated by amylose. According to the experimental data, the helices have six-fold symmetry and repeat in 2.1 nm. Fig 10 shows the potential energy surface calculated as a function of the glycosidic torsion angles of maltose (the repeating unit of amylose). The helical parameters of the amylosic strand generated for each combination of these torsion angles are superimposed on the surface. The conformations that agree with the experimental data are found at the intersections of the contours n = 6 and h = 0.35 nm (for a right-handed chirality) or h = -0.35 nm (for a left-handed chirality). The conformations generating a left-handed helix are near the calculated energy minimum, whereas the alternative conformations appear to be unstable. Therefore, a left-handed model seems to be appropriate.

Fig. 10 Iso-n and iso-h (in Å) contours superimposed on the potential energy surface of maltose. The iso-energy contours are drawn at intervals of 1 kcal/mol with respect to the calculated minimum (*). The h = 0 contour divides the map into a right-handed (h > 0) and a left-handed (h < 0) region.

For some polysaccharides, diffraction patterns have been measured that are not easily interpreted. Agarose, for example, is a linear polysaccharide consisting of alternating 3-linked β-D-galactose and 4-linked 3,6-anhydro-α-L-galactose units (see Fig 11). Three diffraction patterns have been reported that were interpreted as single helices. A fourth diffraction pattern, however, is still under debate. It might result from a double helix in which the individual strands are shifted by half the pitch with respect to each other, but the validity of this structure is not widely accepted. Carrageenans form another class of polysaccharides whose structure is not yet completely understood. Different forms exist, three of which are also shown schematically in Fig 11. It is generally accepted that ι-carrageenan forms double helices in the ordered state. Although ι-carrageenan and κ-carrageenan are very similar, the structure of κ-carrageenan is still under debate; both single- and double-helical models are considered. For λ-carrageenan it is rather unlikely that a double helix is formed.

Fig. 11 Schematic representation of the primary structure of agarose (top), iota- and kappa-carrageenan (middle; in the iota-form R = SO₃⁻, and in the kappa-form R = H), and lambda-carrageenan (bottom; R = H or SO₃⁻).

The next step in the determination of the structure of polysaccharides in the ordered state is the investigation of the interaction of different helices, the packing. In a crystal structure of a polymer the chains can be packed parallel or anti-parallel. In cellulose this packing is still one of the remaining questions for the different polymorphs; it is almost impossible to distinguish the two possibilities on the basis of the few reflections that can be measured by X-ray diffraction.

Molecular modeling is becoming an increasingly powerful tool in the study of the packing of polysaccharides. Models can be built and energies can be calculated. A method of calculating chain-chain interactions was published by Pérez et al. Given a rigid model of an isolated (single or multiple) helix, its interaction with a second helix is calculated at varied helix axis translations and mutual rotational orientations while keeping the helices in van der Waals contact. For each setting, the energy and the inter-chain distance are calculated. No energy minimizations are performed (they are hardly applicable in such a study, being very time consuming and leading only to the nearest minimum); instead, the energy and inter-chain distance are calculated on a three-dimensional grid. This is illustrated in Fig 12. For efficient packing, a low energy is not the only criterion. Coupled values of the rotations of the individual chains, which indicate the presence of a rotation axis, may be even more important. When the translation along the helix axis is related to the repeat distance of the helix, the rotation axis might even be a screw axis.
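
The grid search described above can be sketched with a deliberately crude model. This is not the published procedure: each helix is reduced to one bead per repeating unit, the energy is a toy Lennard-Jones 12-6 sum, only parallel pairings are scanned, and every numerical value (radius 0.5 nm, contact distance 0.4 nm, grid steps) is an assumption made for illustration. The sketch shows the three grid variables (two rotations and one axial translation) and how the helices are brought into contact before the energy is evaluated.

```python
import math
from itertools import product


def helix(n, h, radius, n_units):
    """One bead per repeat unit on a helix along z (n units per turn, rise h)."""
    step = 2.0 * math.pi / n
    return [(radius * math.cos(k * step), radius * math.sin(k * step), k * h)
            for k in range(n_units)]


def place(beads, rotation, z_shift=0.0, x_shift=0.0):
    """Rotate a rigid helix about z, then translate it along z and x."""
    c, s = math.cos(rotation), math.sin(rotation)
    return [(c * x - s * y + x_shift, s * x + c * y, z + z_shift)
            for x, y, z in beads]


def contact_distance(a_beads, b_beads, contact):
    """Axis separation along x at which helix B, approaching from +x,
    first has a bead at distance `contact` from a bead of helix A."""
    d = 0.0
    for ax, ay, az in a_beads:
        for bx, by, bz in b_beads:
            q = contact ** 2 - (ay - by) ** 2 - (az - bz) ** 2
            if q > 0.0:
                d = max(d, ax - bx + math.sqrt(q))
    return d


def pair_energy(a_beads, b_beads, r0, eps=1.0):
    """Toy Lennard-Jones 12-6 sum with its minimum (-eps) at r = r0."""
    energy = 0.0
    for a in a_beads:
        for b in b_beads:
            x6 = (r0 / math.dist(a, b)) ** 6
            energy += eps * (x6 * x6 - 2.0 * x6)
    return energy


def chain_pairing(beads, pitch, r0, n_rot=12, n_shift=6):
    """Energy and inter-chain distance on a (mu_A, mu_B, delta_z) grid."""
    rotations = [2.0 * math.pi * i / n_rot for i in range(n_rot)]
    shifts = [pitch * j / n_shift for j in range(n_shift)]
    results = []
    for mu_a, mu_b, dz in product(rotations, rotations, shifts):
        a = place(beads, mu_a)
        b_on_axis = place(beads, mu_b, z_shift=dz)
        d = contact_distance(a, b_on_axis, r0)
        b = place(b_on_axis, 0.0, x_shift=d)
        results.append((pair_energy(a, b, r0), d, mu_a, mu_b, dz))
    return sorted(results)   # lowest energy first


beads = helix(n=6, h=-0.35, radius=0.5, n_units=12)
results = chain_pairing(beads, pitch=2.1, r0=0.4)
best_energy, best_d, mu_a, mu_b, dz = results[0]
print(f"best pairing: E = {best_energy:.2f}, inter-chain distance = {best_d:.3f} nm, "
      f"mu_A = {math.degrees(mu_a):.0f}, mu_B = {math.degrees(mu_b):.0f}, dz = {dz:.2f} nm")
```

No minimization is performed, in line with the text: the sorted list is simply inspected for low-energy settings and for coupled rotation values. An anti-parallel scan would repeat the loop with one helix reversed, and a realistic calculation would use all atoms and a proper force field.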

Fig. 12 Schematic representation of the chain pairing procedure.

This procedure has been applied successfully to amylose. Amylose is found as A-type (in cereal starches) or B-type (in tuber starches). In both types the same double helix is found. The chain-pairing procedure was applied to the left-handed double-helical model, and significant low-energy chain pairings were selected, both parallel and anti-parallel. The most promising was a parallel packing of two helices, in which the mutual rotations appeared to be coupled and the translation was half the fiber repeat. The inter-chain distance was calculated to be 1.077 nm. This is in excellent agreement with crystallographic studies 4,5; only the inter-chain distance is slightly larger than the experimental values of 1.062 nm in A-type starch and 1.068 nm in B-type starch. Another, looser, type of interaction is seen in the crystal structure of the A-type; this arrangement corresponds to a calculated secondary minimum which is among the low-energy chain pairings.

The procedure has been applied to several other polysaccharides; an interesting recent example of greater complexity is agarose. As mentioned before, different diffraction patterns exist. First, an extensive search was carried out to find (single and multiple, left- and right-handed) helices of low energy that were in agreement with the observed layer line spacings and helical symmetry. This is much more difficult than for a homopolymer, since the helical parameters now depend mainly on four glycosidic torsion angles rather than two. Thus, the method illustrated in Fig 10 cannot be applied. The chain-pairing procedure was then applied to the models obtained. This resulted in parallel and anti-parallel packings of a left- and a right-handed helix for each diffraction pattern. For one of the diffraction patterns the energies of these four models are given in Table 1. It was reported that this pattern resulted from a crystal structure in a trigonal crystal system. With this knowledge a crystal structure was built with the left-handed helical model in an anti-parallel orientation. Extra symmetry appeared to be present in our model, and the space group could be assigned. The asymmetric unit of the cell contains only one agarose repeat unit. Furthermore, the cell has about 30% solvent-accessible space, which corresponds to 3 or 4 water molecules per agarose repeat. An impression of the structure is shown in Fig 13. This model is being verified at present by comparing the calculated and observed diffraction patterns. A preliminary test is the comparison of the unit cell dimensions, and here the agreement is excellent. The length of the c axis is related to the layer line spacing, which we had used in building the helix, but the length of the a axis (in a trigonal crystal system equal to the b axis) follows directly from our calculated inter-chain distance. This axis was experimentally determined to be 1.024 (0.01) nm; in our calculations it is 1.04 (0.04) nm.

chirality	packing	Erel helix	Erel packing	Erel total
right handed	parallel	2.0	0.9	2.9
right handed	anti-parallel	2.0	0.9	2.9
left handed	parallel	0.0	1.4	1.4
left handed	anti-parallel	0.0	0.0	0.0

Table 1. Relative energies (in kcal/mol, per repeating unit) of the different packings of the different models.
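
The bookkeeping behind Table 1 can be written out as a short sketch. It assumes, as the tabulated numbers suggest, that the total relative energy is simply the sum of the helix and packing contributions; the variable and function names are hypothetical.

```python
# Rows of Table 1: (chirality, packing, E_rel helix, E_rel packing),
# in kcal/mol per repeating unit.
models = [
    ("right handed", "parallel", 2.0, 0.9),
    ("right handed", "anti-parallel", 2.0, 0.9),
    ("left handed", "parallel", 0.0, 1.4),
    ("left handed", "anti-parallel", 0.0, 0.0),
]


def total_energy(e_helix, e_packing):
    """Assumed additivity of the helix and packing contributions."""
    return e_helix + e_packing


# Rank the four candidate models by total relative energy.
ranked = sorted(models, key=lambda m: total_energy(m[2], m[3]))
best = ranked[0]

for chirality, packing, e_helix, e_packing in ranked:
    print(f"{chirality:13s} {packing:14s} {total_energy(e_helix, e_packing):.1f}")
```

The lowest-ranked entry is the left-handed, anti-parallel model, the one used in the text to build the crystal structure.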

Fig. 13 Predicted crystal structure for one of the observed agarose diffraction patterns (left: top view; right: side view).

With the increasing speed of computers, studies of more complex polysaccharides will become possible. Xanthan and other charged polysaccharides, for example, should soon be within reach of computer modeling.

Conclusions

The aim of the present work was to describe the most recent tools developed for modeling the three-dimensional features of polysaccharides and carbohydrate polymers. It was shown that the primary structures of polysaccharides vary in composition, sequence, molecular weight, anomeric configuration, linkage position and charge density. As a consequence, an almost infinite array of chemical structures and conformations can be generated for polysaccharides. Additional variability arises from environmental changes such as ionic strength and degree of hydration. To cope with such complex macromolecules, molecular modeling must be integrated into the biophysical analysis of polysaccharides. Steady progress has been made, which allows a description of the conformations of flexible rings, as in the case of five-membered rings, and a thorough description of the conformational space available to a disaccharide, either in vacuum or in aqueous solution. Several force fields with dedicated parametrizations are available. They are in principle capable of dealing with specific stereoelectronic effects such as the anomeric and exo-anomeric effects, and of coping with the added complexity arising from the hydrogen bonding capacity of carbohydrates. It has been clearly stated that testing these force fields unequivocally may not be straightforward. In particular, the use of NMR observables can only be envisaged once the question of internal motions is fully understood.

One of the prerequisites for extending the modeling from disaccharides to polysaccharides is that the rotations about a particular glycosidic linkage can be considered, under some conditions, to be independent of the nearest neighbors. Consequently, the conformations of a polysaccharide can be described conveniently by the glycosidic torsion angles of consecutive dimeric fragments. In solution, polysaccharide chains tend to adopt less ordered structures (random coils) that fluctuate among different local and overall conformations. Proper modeling can provide insight into the dimensions of these random coils, and descriptors such as persistence length or characteristic ratio can be readily assessed. Interestingly, the occurrence of local helical regions may be detected from such simulations. It has been observed that these locally ordered conformations precede the regular helical arrangements found in the solid state. With the help of descriptors such as helical parameters, the ordered state of polysaccharide strands can be readily characterized. The generation of double or triple helices is then attempted in order to investigate whether such multistranded arrangements may be energetically stable. The final step in the determination of the structure of polysaccharides in the ordered state is the investigation of the interactions between different helices, which may lead either to the best arrangement(s) of two polymeric chains or to the prediction of the dimensions and symmetry of a three-dimensional lattice.

The characterization of the secondary, tertiary, and quaternary levels of structural organization of polysaccharides is a prerequisite for understanding the molecular basis of their properties and/or functions, and for controlling and manipulating these properties and/or functions through rational changes of the molecular fine structure. Therefore, all the steps described above have been integrated into a general computer program which enables the prediction of the three-dimensional arrangements of polysaccharide chains from knowledge of the primary structure. Figure 14 gives a synopsis of such a program. It incorporates features such as the prediction of low-energy arrangements between different species of polymer chains, which could be useful in the formulation of blends involving polysaccharides. Further work is required to establish quantitative structure-property relationships in the area of polysaccharides, with particular emphasis on gel-forming and viscoelastic properties. The wealth of information available in present databases can certainly be rationalized. Some of the tools that have been developed should allow automatic searches for meaningful correlations between structures and functions through exploratory data analysis. Structure-function or structure-property correlations could then be used to model changes arising from structural alterations. This would open the field of polysaccharide engineering.

Fig. 14 Flow chart describing the steps leading from the knowledge of the primary structure to the different structural levels of increasing complexity.

Other challenges lie in the modeling of supramolecular structures involving complex assemblies of polysaccharides. Only realistic modeling of the microfibrillar structure of cellulose chains will provide insight into the unique rheological properties displayed by such native arrangements, which are plentifully available in renewable raw materials. Other major polysaccharide architectures, such as the starch granule, should also be investigated. Here, the challenge lies in putting together such elementary pieces as double helices and branching points located at key points of amylose chains to construct elementary clusters containing several hundred thousand atoms. Only such constructions will lead to an understanding of how the ratio of amylose to amylopectin controls the establishment of unique types of granule. In turn, the control of this ratio, via the routes of molecular biology, will create ad hoc types of granules, for which a significant range of properties can be foreseen. In the area of plant cell walls, the situation is even more intricate, since several polysaccharide moieties, in the form of microfibrils and various complex polysaccharides, interact via proteins. These are only selected examples, which point to some important structures and architectures as challenging candidates for modeling within the decade to come.

Methods of Polysaccharide Structure Determination