Nucleic Acids Research Advance Access originally published online on August 21, 2008
Nucleic Acids Research 2008 36(17):5482-5515; doi:10.1093/nar/gkn517
© 2008 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Structural Biology
Stability and kinetics of G-quadruplex structures
¹Structural Biology Program and ²Molecular Targets Program, JG Brown Cancer Center, University of Louisville, KY 40202, USA
*To whom correspondence should be addressed. Tel: +1 502 8523067; Fax: +1 502 8524311; Email: anlane01{at}gwise.louisville.edu
Received June 11, 2008. Revised July 26, 2008. Accepted July 29, 2008.
ABSTRACT
In this review, we give an overview of recent literature on the structure and stability of unimolecular G-rich quadruplex structures that are relevant to drug design and to in vivo function. The unifying theme of this review is energetics. The thermodynamic stability of quadruplexes has not been studied in the same detail as that of DNA and RNA duplexes, and there are important differences in the balance of forces between these classes of folded oligonucleotides. We provide an overview of the principles of stability and, where available, the experimental data that report on these principles. We identify significant gaps in the literature that should be filled by a systematic study of well-defined quadruplexes, both to provide the basic understanding of stability needed for design purposes and to clarify how stability relates to the in vivo occurrence of quadruplexes. Techniques that are commonly applied to the determination of structure, stability and folding are discussed in terms of their information content and limitations. Quadruplex structures fold and unfold comparatively slowly, and DNA unwinding events associated with transcription and replication may operate far from equilibrium. The kinetics of formation and resolution of quadruplexes, and the methods used to study them, are discussed in the context of stability and possible biological occurrence.
INTRODUCTION
G-quadruplexes of repeat sequences of the kind AGnTm spontaneously fold into stable structures in solution, especially in the presence of K+. The resulting structures are compact, resistant to DNases, generally have high melting temperatures and appear to be dominated by stacks of so-called G-quartets (Figure 1). Such sequences are found in telomeres and, at a surprisingly high frequency, in other parts of the genome, especially in promoters (1,2). There is now an immense literature on both the biology and the physical properties of such sequences (3–6); the literature through the mid-1990s has been reviewed in a book (7). Some researchers now believe that G-quadruplex oligonucleotide structures are important biological regulators in both DNA and RNA (3–6,8–17).
There have been recent reviews of G-quadruplexes, focusing mainly on the structural aspects of observed quartets (18–24) or on telomerase biology (25).
The wider interest in such structures has been highlighted by numerous sessions at international conferences (cf. ACS Pacifichem 2005 Chemical Congress, Honolulu, Hawaii, USA) and, recently, by a 2.5-day conference devoted entirely to G-quadruplex DNA, the First International Meeting on Quadruplex DNA, held in Louisville in April 2007. This meeting covered a wide range of topics and was widely reported [(13); http://pubs.acs.org/cen/coverstory/85/8522cover.html]. However, it did not focus on the problems of stability and kinetic control of the possible structures, even though these may be of great biological relevance. It has been reported that there are 26 possible topologies of G-quadruplexes, yet only six have been observed in vitro (19). This raises the question of what determines, kinetically or thermodynamically, which of the allowable structures actually form. Although there is a significant literature on the stability and kinetics of quadruplex structures (26,27), the relationships between observed stability, kinetics and structure have not been addressed recently.
The physical chemistry of G-quadruplexes is complex and fascinating. Despite a large body of published work on structure and other properties, our understanding of their basic physical properties is rather limited. Of the more than 1300 papers mentioning G-quadruplexes since the late 1980s, only a modest fraction has been devoted to their physical properties. By June 2008, more than 90 quadruplex structures had been deposited in the Protein Data Bank.
Even for short sequences comprising 3–4 G-quartets, it is not known what determines their structures in terms of sequence space, experimental conditions, thermodynamics and kinetics. In part, this may be attributed to ad hoc, piecemeal approaches by individual groups to a problem that is complex enough to require the concerted efforts of teams of researchers with complementary skills, systematically analyzing a wide range of properties on the same set of systems using agreed-upon parameter variation. This point will be taken up further in the Discussion section.
In this review, we focus on the stability and kinetics of intramolecular quadruplexes. We mine the literature for information on the stability and dynamics of such structures and on the multiple conformations that are routinely observed (28–30), highlighting the empirical difficulties and any problems with design and analysis. In particular, we attempt to address the following questions regarding quadruplex formation that are directly amenable to experimental and computational methods:
- What are the possible structures of G-quadruplexes?
- What are the relative stabilities of such structures?
- What are the likely forces (energies) that are responsible for their stability?
- What determines whether these structures form in vitro?
- What are the possible consequences of G-quadruplex formation?
- How might in vitro understanding inform about the cellular context?
Here, we focus mainly on unimolecular (foldback) structures for practical reasons: these are the ones studied in greatest detail, they are the forms most likely to exist in vivo, and their physical chemistry is much easier to analyze (their structures are independent of concentration, their kinetics are faster and concentration-independent, their thermodynamics are reversible and concentration-independent, and their detailed conformations are easier to determine).
The implications of these features for biological activity and design will be addressed. Finally, we propose that a consortium be established to analyze this problem systematically, using accepted standards of experimental design, and suggest guidelines for establishing such a consortium.
QUADRUPLEX TOPOLOGIES AND STRUCTURES
There have been several excellent reviews of G-quadruplex structures published recently (18–24). J. L. Huppert maintains a website http://www.quadruplex.org/?view=quadbase including information and access to an algorithm for locating putative quadruplex sequences (PQS) in genomic data.
We provide a brief overview here for the purpose of the subsequent discussion of their properties in solution.
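To make the idea of a PQS search concrete, the following is a minimal sketch assuming the widely used G3+N1–7 rule: four or more runs of at least three guanines separated by loops of 1–7 nt. This rule is our assumption for illustration; the function `find_pqs` and the pattern name `PQS_RULE` are hypothetical, and this is not the code behind the quadbase website.

```python
import re

# Hypothetical PQS pattern (G3+N1-7 rule, assumed): at least three "G-run + loop"
# units followed by a final G-run, i.e. four or more runs of >= 3 Gs in total.
PQS_RULE = re.compile(r"(?:G{3,}[ACGTN]{1,7}){3,}G{3,}", re.IGNORECASE)


def find_pqs(sequence: str) -> list[tuple[int, int, str]]:
    """Return (start, end, match) for each non-overlapping PQS on this strand only."""
    return [(m.start(), m.end(), m.group()) for m in PQS_RULE.finditer(sequence)]


if __name__ == "__main__":
    telomere = "TTAGGGTTAGGGTTAGGGTTAGGGTT"
    for start, end, seq in find_pqs(telomere):
        print(start, end, seq)
```

A genome-scale scan would also need to search the complementary strand (C-runs) and decide how to report overlapping candidates; the sketch reports only non-overlapping matches on the strand it is given.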
G-quartets are based on the formation of a (nearly) square planar array of four guanine bases, as shown in Figure 1A and B. Although the structure appears to be stabilized by a hydrogen-bonding network involving N7:N2H and O6:N1H, this network is unlikely to be the source of the thermodynamic stability of such structures in solution (see Thermodynamics and kinetics section). Indeed, the central core of the G-quartet produces a specific geometric arrangement of lone pairs of electrons from the four guanine O6 atoms, which can coordinate a monovalent ion of the correct size, such as Na+ or K+. Generally, these structures do not form in the absence of such ions. The smaller Na+ ion can sit in the plane formed by these atoms, whereas the larger K+ requires an out-of-plane position, which may in fact lie between two G-quartets, as shown in Figure 2. This allows additional coordination of the metal ions, i.e. it satisfies the usual hexacoordinate stereochemistry of the alkali metal ions. To accommodate this stereochemistry, the individual nucleobases may dome out of the plane somewhat (31), to an extent balanced by the stacking energies (see Thermodynamics and kinetics section for more detail).
Indeed, a feature of G-quadruplex structures is that they comprise a stack of two or more G-quartets, [or tetrads for those who prefer Greek roots (OED), (7)], linked by the phosphodiester backbone and stabilized by specific monovalent ion binding. In the context of a unimolecular structure, the organization of the chain direction (reading 5' to 3') gives rise to a large number of possible topologies.
Figure 3 displays some basic topologies. These topologies impose certain constraints on local structure, including the syn/anti conformation about the glycosyl bond of the quartet guanines (Figure 1).
Overview of possible and actual structures
Even within the context of a small sequence space, the possible structural diversity of folding topologies of quadruplex structures is high (32,33). It has been reported that the total number of possible topologies is 26 different folds for molecules that comprise three loops with contiguous G-quartet strands (33). However, this does not take into account the recently reported unexpected fold of the c-kit promoter quadruplex structure (34) in which the strands contributing to the G-quartets are not contiguous. Of the original 26 folds, only six have been experimentally determined, namely the all parallel double chain reversal loops [dA(GGGTTA)3GGG: K+ form] (35), all lateral loops d(GGTTGGTGTGGTTGG) (36), lateral, lateral, double chain reversal loops d(GGGCGCGGGAGGAATTGGGCGGG) (37), double chain reversal, lateral, lateral loops d(TTA(GGGTTA)3GGGA) (38), lateral, diagonal, lateral loops dA(GGGTTA)3GGG (39), diagonal, double chain reversal, diagonal loops d(GGTTTTGGCAGGGTTTTGGT) (40) (Figure 3) (33).
Of the 96 structures deposited in the Protein Data Bank (as of May 2008), many are actually similar, with the same sequences in different environments. In the case of the human telomere repeat, d(GGGTTA)n, there are 208 possible structures when the eight possible quartet orientation combinations are considered for each of the 26 possible folds (8 × 26 = 208). Experimentally, only four structures have been determined for the human telomere repeat; the other two topologies that have been determined were for different sequences. These are the original NMR-derived basket fold from Patel [dA(GGGTTA)3GGG: Na+ form (39)], the all-parallel (double chain reversal) loop fold derived from the crystal structure of Parkinson and Neidle [dA(GGGTTA)3GGG: K+ form (35)], and the NMR-derived hybrid 1 [dAAA(GGGTTA)3GGGAA: K+ form (41)] and hybrid 2 [dTTA(GGGTTA)3GGGTT: K+ form (42)] folds reported by Yang and Patel (38,43), as shown in Figure 3. The dA(GGGTTA)3GGG: Na+ form has lateral, diagonal and lateral loops. Reading from the 5' end, hybrid 1 has double chain reversal, lateral and lateral loops, whereas hybrid 2 has lateral, lateral and double chain reversal loops. These represent the remarkable diversity in single-chain topology, and raise four important questions. First, how is the topology affected by the coordinating cation? Second, how do the loop regions determine the topological fold? Third, why have so few folds been observed in the laboratory: is it a lack of sequence space coverage, thermodynamics or kinetics? Fourth, to what extent is the environment, crystalline or otherwise, a suitable biological mimic? As Figure 3 shows, the different topologies vary greatly not only in the planarity and stacking of the bases in the quartets, but also in the disposition of bases in the connecting loops.
We will attempt to shed light on some of these questions in the following sections. The human telomere studies have attempted to focus on structures that may be biologically relevant. However, in all of these studies, including others on promoter regions, the sequences used are short, single quadruplex-forming sequences taken out of context with respect to the long flanking (duplex) sequences. In most cases, one or more flanking bases have been added, which may or may not be part of the natural flanking sequence, so as to obtain a single species that can be examined by NMR or X-ray crystallography. Whether these modifications force the topology into something different from that adopted in the original context is unknown. In reality, several topologies may be present, or other interactions, such as binding of quadruplex-binding proteins, may stabilize a particular fold and are absent from the structural studies. However, much of this information is not available and so cannot be expected to be included in the current structural studies. These considerations are discussed in greater detail below.
Why is it so difficult to force a distinct topology?
The major problem with designing sequences for specific topologies is that loops of 1–3 bases can easily span the distance needed for a lateral or double chain reversal loop in a stack of up to four G-quartets. Typically, loops of 3–4 bases are needed for a diagonal loop. In the case of the human telomere repeat d(TTAGGG)n, this makes all 26 topologies theoretically accessible.
It should be noted that the all-parallel, high-resolution (0.95 Å) crystal structure of d(TGGGGT)4 (44) has the following distances (Table 1): O3' (top stack) to O5' (bottom stack) for three G-quartet stacks, 7.3–7.9 Å; O3' (top stack) to O5' (bottom stack) for four G-quartet stacks, 11.1–12.2 Å; O5' to O5', adjacent positions in the same stack, 14–15 Å; O3' to O3', adjacent positions in the same stack, 15–16 Å; O5' to O5', opposite positions in the same stack, 19–21 Å; O3' to O3', opposite positions in the same stack, 20–22 Å. Furthermore, the topology-dependent groove widths (Figure 1) give rise to different distributions of the negative charge of the phosphates. This is shown in Figure 4, which shows space-filling models colored by electrostatic potential calculated for 150 mM K+ using the Poisson–Boltzmann program APBS (45,46). These three representative structures demonstrate that shape, electrostatics and topology are intimately linked. The double chain reversal propeller structure (Figure 3, bottom) is a flat, plate-like object compared with the other topologies, which appear more globular. Indeed, the propeller structure stands out among all the topologies solved experimentally so far in terms of its overall shape, groove structure and electrostatic potential, suggesting that its physical properties should be readily distinguishable from those of all the other folds as a group (not shown). Using APBS, we have calculated the electrostatic energy of different conformations of the human telomere sequence d(GGGTTAGGGTTAGGGTTAGGG): 2769 kcal mol–1 for the parallel form, 2904 kcal mol–1 for hybrid 1, 2925 kcal mol–1 for hybrid 2 and 2867 kcal mol–1 for the basket form. In order to compare the same number of nucleotides, and thus atoms, the flanking bases were truncated. This shows that the electrostatic energy (essentially due to the unfavorable interactions between the closely spaced phosphodiesters) is high, and differs by up to 156 kcal mol–1 among these four conformations.
For comparison the energy of an unfolded strand that was subjected to molecular dynamics and then energy minimized was 1842 kcal mol–1. Although this represents only one possible instance of an ensemble of conformations, it is expanded with respect to the folded conformations, and shows a much lower unfavorable electrostatic energy, as expected. These values imply that the folding has to overcome a rather large electrostatic energy that must be compensated by other forces. In part, this is likely to arise from ion condensation and specific ion binding, as discussed in more detail below.
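The numbers quoted above can be compared directly. The sketch below performs no new calculation: it only re-uses the APBS energies given in the text, and the names `FOLDED_ENERGIES`, `energy_spread` and `folding_penalty` are illustrative.

```python
# Electrostatic energies quoted in the text for d(GGGTTAGGGTTAGGGTTAGGG), kcal/mol.
FOLDED_ENERGIES = {
    "parallel": 2769.0,
    "hybrid 1": 2904.0,
    "hybrid 2": 2925.0,
    "basket": 2867.0,
}
# One energy-minimized snapshot of the unfolded strand, not an ensemble average.
UNFOLDED_ENERGY = 1842.0


def energy_spread(energies):
    """Largest difference between any two folded conformations."""
    return max(energies.values()) - min(energies.values())


def folding_penalty(energies, unfolded):
    """Electrostatic energy of each folded form relative to the unfolded snapshot."""
    return {name: e - unfolded for name, e in energies.items()}


if __name__ == "__main__":
    print(f"spread among folds: {energy_spread(FOLDED_ENERGIES):.0f} kcal/mol")
    for name, penalty in folding_penalty(FOLDED_ENERGIES, UNFOLDED_ENERGY).items():
        print(f"{name:>9}: +{penalty:.0f} kcal/mol relative to unfolded")
```

The penalties range from about 930 (parallel) to about 1080 kcal/mol (hybrid 2). Because the unfolded reference is a single conformation, these differences are indicative only and are not folding free energies.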
The conclusion is that, although these are pioneering studies, there are not yet enough different structures even to begin to elucidate the rules for obtaining the different folds. This has important consequences, because techniques such as circular dichroism (CD) that are being used to examine quadruplex folds are not definitive, owing to the paucity of different structures from which conclusions have been drawn (47). Some of these conclusions may well be true, but we do not currently have the data to support them fully. This point will be explored in more detail below.
Methods of determining structures
There are many approaches, both direct and indirect, to determining conformations of macromolecules at various levels of resolution. Although high-resolution structures (i.e. at or near atomic resolution) are desirable for discussing and designing experiments at the molecular level, in some instances low-resolution information can be sufficient for a particular problem, such as for fold or topology determination, or simply for quality control, i.e. verification that a particular structure is present and the purity of the structure. Here, we give a brief overview of the techniques commonly used for nucleic acids (NAs) conformational analysis, with an emphasis on the advantages and pitfalls in the quadruplex arena.
Atomic resolution
There are three main methodologies in use to assess the 3D structures of quadruplexes at atomic resolution: single-crystal diffraction, which to date has provided more than 50 structures, including some with bound ligands; high-resolution solution-state NMR (30 structures) (24); and molecular modeling with relaxation. The highest-resolution structures are obtained by X-ray diffraction, and in one case there is a sub-Å resolution structure that is a valuable resource for detailed analysis of bonding (44). NMR produces significantly less well-defined structures than crystallography. Some of this is real (dynamics, exchanging conformations) and some reflects the limitations of the methodology (18,19,48,49). Crystallography yields the structure of the form that actually crystallizes under the given conditions, whereas other methods in principle measure the properties of the broader ensemble. Nevertheless, for both NMR and X-ray studies, it is the norm to manipulate the sequences to improve the yield of crystals or to reduce the number of competing conformations present (which would otherwise interfere with, e.g. spectral and structural analysis). Generally speaking, published NMR structures have been obtained only after considerable sequence manipulation, and/or in the presence of a fairly substantial background of aggregated material (cf. higher order structures) as well as alternative conformations, which are often present at the 10% level or greater (17,28,49,50). The significance of this observation will be taken up later.
Although de novo modeling is at the mercy of the quality of the force fields and of the treatment of electrostatics (specific and nonspecific) (51–53), modeling does not have to contend with alternative conformations per se, and has the advantage that it deals with individual energy terms, i.e. it allows parsing of terms that are not accessible to experiment (51–57). Improvements in force fields have raised the overall quality of the calculations, though the loops remain particularly problematic (52,53,56–58). However, when modeling is combined with, for example, biophysical data, reliable and valuable models of complex structures can be generated (59).
Spectroscopy
The electronic spectroscopies have long been used to characterize the structures of quadruplexes (60), as well as to provide a convenient, sensitive signal for monitoring transitions or ligand binding. The latter use is uncontroversial. CD spectra are routinely used, along with electrophoresis, to assign folds (29,59–65). However, the interpretation of optical properties such as hypochromicity or the shape and sign of CD bands (cf. Figure 5) is controversial (47). Although the CD spectra of A-, B- and Z-DNA are quite different and have been backed by theoretical calculations (66–69), the situation with G-quadruplexes is much less clear. A much-cited empirical study showed the CD spectra of different structures; however, the authors pointed out that there was no simple relationship between the fold and the shape of the CD spectrum (47). Although there have been ab initio calculations of the CD of proteins and NA duplexes (69,70), there has been little theoretical analysis of quadruplex CD. Calculations for an antiparallel DNA, d(G4T4)4 (71), showed an essentially conservative CD spectrum in the range 220–320 nm, with a maximum at 260 nm and a minimum at 245 nm (zero crossing at 250 nm), that only slightly resembled the experimental spectrum (whose structure was not independently verified). More recently, Gray et al. (72) have carried out calculations for two stacked quartets in which the quartets have the same or opposite polarities of hydrogen bonding (i.e. clockwise or anticlockwise, cf. Figure 1). The quartets can stack with either the same or opposite polarity, and the rotation angle between the stacked quartets gives rise to quite different stacking interactions; this rotation angle is a major determinant of the intensity and shape of the CD spectrum. The calculated CD spectra of these two simple states are quite different. The same-polarity stacks show a minimum at 235 nm, a maximum at 260 nm (zero crossing at 250 nm) and a second, broad positive band at ~295 nm (similar to the spectrum often attributed to the parallel conformation). The opposite-polarity stacks, however, gave an inverted, quasi-conservative spectrum with a minimum at 265 nm, a maximum at 295 nm and a zero crossing at ~280 nm (often attributed to the antiparallel conformation; cf. Figure 5B). As the authors pointed out, the intensities of the calculated spectra seem to be substantially in error. It is our contention that interpreting CD in structural terms is unwise, as it amounts to a circular argument, until either accurate calculations can be done in which the influence of quartet rotation and additional induced CD from looped bases are systematically accounted for, or a rigorous empirical database can be generated of CD spectra of quadruplex samples whose structures have been unequivocally determined on the same samples under the same conditions.
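The band descriptors used above (minimum, maximum and zero crossing) are easy to extract automatically. The sketch below is a hypothetical helper, `cd_band_features`, applied to a synthetic two-Gaussian spectrum shaped like the opposite-polarity case (negative band at 265 nm, positive band at 295 nm). The Gaussian band shape and the synthetic data are assumptions for illustration only. In keeping with the argument above, such descriptors summarize a spectrum but do not by themselves assign a fold.

```python
import math


def gaussian_band(wavelength, center, width, amplitude):
    """One Gaussian CD band (an illustrative band shape, not a physical model)."""
    return amplitude * math.exp(-((wavelength - center) ** 2) / (2.0 * width ** 2))


def cd_band_features(wavelengths, cd):
    """Return (lambda_min, lambda_max, zero_crossing) for a spectrum sampled on a grid.

    The zero crossing is searched between the global minimum and maximum and located
    by linear interpolation; it is None if the signal does not change sign there.
    """
    i_min = min(range(len(cd)), key=cd.__getitem__)
    i_max = max(range(len(cd)), key=cd.__getitem__)
    lo, hi = sorted((i_min, i_max))
    crossing = None
    for i in range(lo, hi):
        y0, y1 = cd[i], cd[i + 1]
        if y0 == 0.0:  # the grid point itself is a zero
            crossing = wavelengths[i]
            break
        if y0 * y1 < 0.0:  # sign change between neighbouring grid points
            x0, x1 = wavelengths[i], wavelengths[i + 1]
            crossing = x0 - y0 * (x1 - x0) / (y1 - y0)
            break
    return wavelengths[i_min], wavelengths[i_max], crossing


if __name__ == "__main__":
    grid = list(range(220, 321))  # 220-320 nm in 1 nm steps
    # Synthetic spectrum: negative band at 265 nm, positive band at 295 nm.
    spectrum = [gaussian_band(w, 265, 10, -1.0) + gaussian_band(w, 295, 10, 1.0) for w in grid]
    print(cd_band_features(grid, spectrum))
```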
Use of fluorophores
2-Aminopurine is a fluorescent base that is a common replacement for A or G and is relatively unperturbing (except, in this instance, in a G-quartet). It can be incorporated during the synthesis of an oligonucleotide and used to report on events in or near loops, for example, or as a measure of solvent exposure (61) through its emission properties and the influence of externally titrated quenchers (Stern–Volmer analysis). Other fluorophores can be incorporated in loops or at the free 5' and 3' ends. The latter may permit more flexibility in the choice of fluorophore, although aromatic groups at the ends of quadruplexes can end-paste (stack onto the terminal quartets) and stabilize the structure (18,28,73).
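In its simplest form, Stern–Volmer analysis fits F0/F = 1 + K_SV[Q], where F0 and F are the intensities without and with quencher at concentration [Q]. For collisional quenching, a larger K_SV for a probe at a given site is usually read as greater solvent exposure. The sketch below, a hypothetical helper `stern_volmer_constant`, fits K_SV by least squares through the origin. The synthetic data are generated from an assumed K_SV of 5 M⁻¹ and are not measurements.

```python
def stern_volmer_constant(quencher_conc, f0, intensities):
    """Estimate K_SV from F0/F = 1 + K_SV [Q] by least squares through the origin."""
    x = list(quencher_conc)
    y = [f0 / f - 1.0 for f in intensities]
    sxx = sum(xi * xi for xi in x)
    if sxx == 0.0:
        raise ValueError("need at least one non-zero quencher concentration")
    return sum(xi * yi for xi, yi in zip(x, y)) / sxx


if __name__ == "__main__":
    k_true = 5.0  # M^-1, hypothetical value used only to generate synthetic data
    f0 = 1000.0   # arbitrary intensity units
    conc = [0.0, 0.05, 0.10, 0.20, 0.30]  # M
    intensities = [f0 / (1.0 + k_true * q) for q in conc]
    print(f"K_SV = {stern_volmer_constant(conc, f0, intensities):.3f} M^-1")
```

Upward curvature in a real Stern–Volmer plot would indicate combined static and dynamic quenching, which this linear sketch does not model.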
The fluorescence properties can be used to monitor unfolding transitions through changes in quantum yield, shifts in the emission maximum or, often most reliably, changes in anisotropy, taking due account of the influence of temperature on the fluorescence properties themselves. If a donor and acceptor pair can be introduced at different positions, the fluorescence resonance energy transfer (FRET) efficiency can be monitored as a function of temperature to probe thermodynamic and kinetic stability (unfolding should decrease the FRET) (74–76), though stabilization by, for example, end-pasting needs to be considered (18,28,73). FRET may also be used as a poor man's structural probe, i.e. by measuring several pairwise distances. In general, one would label the folded oligonucleotide to avoid influencing the final structure during refolding. This could be achieved using labels containing the maleimido group, which reacts with phosphorothioate groups that can be incorporated at any desired position in the DNA backbone.
This requires careful choice of donor/acceptor pairs, and due account needs to be taken of the orientation factor $\kappa^2$. The use of $\kappa^2 = 2/3$ (complete orientational randomization of the donor and acceptor dipoles over the fluorescence lifetime) without independent evidence is not recommended. However, the fluorescence anisotropy can provide limits on the values of $\kappa^2$, as described in detail many years ago (77,78). Distances between the ends of various structures were provided in Figure 1 for small (three-stack) quadruplexes. FRET would have to discriminate reliably between distances close to 10 Å and distances close to 20 Å, which would necessitate choosing FRET pairs with $R_0$ values close to these distances. As the recovered distance is

$$R = R_0\left(\frac{1-E}{E}\right)^{1/6}$$

the error in $R$ is determined mainly by errors in $R_0$, i.e. in $(\kappa^2)^{1/6}$, since $R_0 \propto (\kappa^2)^{1/6}$. Although $0 \le \kappa^2 \le 4$, using a value of 2/3 means that the maximum error in the upper limit is 34% (for $\kappa^2 = 4$). If anisotropy measurements were able to limit $\kappa^2$ to, say, 0.1–3, the error in the distance would be <30%, which might be adequate to discriminate between some folds. Single molecule FRET has been used recently to estimate the distribution of conformational states in the human telomere sequence (30).
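The sensitivity of the recovered distance to $\kappa^2$ can be checked numerically. The sketch below implements the Förster relations used above. The $R_0$ of 15 Å is a hypothetical value chosen only to lie between the 10 Å and 20 Å distances discussed; it is not a recommendation for a particular dye pair. The function names are illustrative.

```python
KAPPA2_ISOTROPIC = 2.0 / 3.0  # dynamic isotropic averaging assumption


def fret_efficiency(r, r0):
    """Forster transfer efficiency E = 1 / (1 + (R/R0)^6)."""
    return 1.0 / (1.0 + (r / r0) ** 6)


def distance_from_efficiency(e, r0):
    """Invert E to R = R0 ((1 - E)/E)^(1/6); requires 0 < E < 1."""
    if not 0.0 < e < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0 * ((1.0 - e) / e) ** (1.0 / 6.0)


def r0_scale_factor(kappa2_true, kappa2_assumed=KAPPA2_ISOTROPIC):
    """Factor by which R0 (and hence R) is off when kappa^2 is assumed; R0 ~ (kappa^2)^(1/6)."""
    return (kappa2_true / kappa2_assumed) ** (1.0 / 6.0)


if __name__ == "__main__":
    r0 = 15.0  # angstrom, hypothetical Forster radius
    for r in (10.0, 20.0):
        print(f"R = {r:.0f} A -> E = {fret_efficiency(r, r0):.3f}")
    for k2 in (4.0, 3.0, 0.1):
        err = r0_scale_factor(k2) - 1.0
        print(f"true kappa^2 = {k2}: distance error {100 * err:+.1f}%")
```

For $\kappa^2 = 4$ the factor is $6^{1/6} \approx 1.35$, the roughly 34% error quoted above. Restricting $\kappa^2$ to 0.1–3 keeps the error within about ±29%.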
However, the lower end of this distance range is short for FRET. A similar approach that is effective over this range is electron spin resonance (ESR), in which the dipole–dipole interaction between two spin labels can be measured, as has been done extensively on rhodopsin, for example (79,80), and more recently on NAs (81).
NMR can also be used spectroscopically to assess quadruplex formation simply by measuring the exchangeable protons in the 10–12 p.p.m. range. For well-behaved, small complexes, the G N1H and G NH2 protons can be counted, aided by the difference between the spectra recorded in H2O and D2O. In G-quadruplexes, the exchange of imino protons with solvent is exceedingly slow, taking days to exchange for deuterium in D2O (49). This is quite unlike DNA duplexes or triple helices, where the imino protons typically exchange in seconds or minutes (49,82). The extreme kinetic stability of the imino protons is associated with low-amplitude fluctuations of the quartets, which do not permit access of water or base, and is further correlated with the very high thermodynamic and kinetic stability of the quadruplex structure as a whole (see below).
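For the simple, well-behaved case described above, the expected count can be anticipated from the sequence. Each guanine in a quartet contributes one hydrogen-bonded N1H, so n quartets give 4n slowly exchanging imino signals, and a larger count would suggest additional conformations. The helper below, `expected_imino_count`, is a hypothetical sketch under strong assumptions: exactly four G-runs, a quartet number equal to the shortest run, and isolated loop Gs ignored. It does not handle bulges, longer G-runs that can register-shift, or multimolecular species.

```python
import re


def expected_imino_count(sequence: str, min_run: int = 2) -> int:
    """Expected number of quartet G N1H imino signals for a simple intramolecular quadruplex.

    Assumes exactly four G-runs (of length >= min_run) form the quartet stack, that the
    number of quartets equals the shortest run, and that isolated loop Gs do not contribute.
    """
    runs = [len(m) for m in re.findall(f"G{{{min_run},}}", sequence.upper())]
    if len(runs) != 4:
        raise ValueError(f"expected four G-runs, found {len(runs)}")
    return 4 * min(runs)  # one hydrogen-bonded N1H per guanine per quartet


if __name__ == "__main__":
    print(expected_imino_count("AGGGTTAGGGTTAGGGTTAGGG"))  # three-quartet telomere fold
    print(expected_imino_count("GGTTGGTGTGGTTGG"))         # two-quartet fold
```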
Patel's group and others have pioneered the use of low-level 15N enrichment of individual G residues (83), an approach that has since been used successfully by other groups (17). Each G is systematically substituted with a 15N-labeled nucleotide at a few percent enrichment (the natural abundance of 15N is 0.37%). As 15N has spin 1/2, it causes a predictable splitting of the attached N1H proton due to the one-bond scalar coupling. This coupling can be exploited to edit a 1H spectrum so that only the imino proton attached to 15N (rather than 14N) is detected. This can provide an unequivocal assignment, as well as a count of the G N1H protons involved in the hydrogen-bonded structure and an indication of whether each is in a unique environment.
H-bond donors and acceptors can also be identified with such a labeled system. This is because in NAs, where H-bonding involves N–H···N interactions, the partially covalent character of the H-bonds gives rise to a scalar coupling between the donor and acceptor N atoms. This coupling is of the order of a few hertz in Watson–Crick and Hoogsteen base pairs and can thus be readily detected as a splitting in the NMR spectrum. This has been used to great advantage in DNA duplexes, triplexes and quadruplexes (84–91).
Hydrodynamics
Hydrodynamic techniques such as sedimentation velocity and translational diffusion measurements, as well as those techniques that supply information about the rotational diffusion (e.g. NMR) give information about molecular size/shape and hydration. For simple bodies where the departure from spherical symmetry is modest, the frictional properties can be described by (66):
$$f_t = 6\pi\eta a_h F_t, \qquad f_r = 6\eta V F_r \qquad (1)$$

where $\eta$ is the solvent viscosity, $a_h$ is the hydrated radius of the particle, $V$ is the hydrated volume and $F_t$ and $F_r$ are asymmetry parameters that are unity for a sphere (for which $V = 4\pi a_h^3/3$, so that $f_r = 8\pi\eta a_h^3$). Analytical and semi-analytical expressions exist for the dependence of $F$ on the axial ratio for ellipsoids of revolution and cylinders (66,92–96). Because diffusion coefficients are inversely proportional to the corresponding frictional coefficients ($D = k_BT/f$), translational diffusion coefficients scale as the inverse of the linear dimension (the cube root of mass or volume), whereas rotational diffusion coefficients scale as the inverse cube of the linear dimension. As the asymmetry increases, e.g. on going to an ellipsoid of revolution or a cylinder, $F_r$ deviates from unity more quickly than $F_t$ as the axial ratio increases (66). Although the number of parameters that can be determined is small, and they relate to global properties, it is now possible to measure hydrodynamic parameters with very high precision, even in the presence of a distribution of species. Both dynamic light scattering (DLS) and sedimentation velocity experiments provide an estimate of the most probable frictional coefficient, the effective width of the particle distribution and the fraction of species of significantly different size/shape. NMR and, under appropriate circumstances, fluorescence anisotropy can provide complementary rotational friction data, which, when combined with translational data, offer a rather critical test of the correctness of a proposed structure. This is because hydrodynamic parameters can be calculated with reasonable accuracy using bead models (97,98). If the coordinates of a proposed structure (or of a family of structures) are available, then the hydrodynamic properties can be calculated and compared with the experimental values. Models whose calculated values lie well outside the experimental values, within the limitations of the modeling approach itself, can be rejected.
This general approach, in conjunction with other spectroscopic data, was used to demonstrate that the widely used X-ray crystal structure of the human telomere (99) is not the dominant form in free dilute solution (61).
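The scaling argument following Equation (1) can be illustrated for the limiting case of a sphere ($F_t = F_r = 1$). The sketch below applies $D = k_BT/f$ to Equation (1). The viscosity of water at 25 °C (about 0.89 mPa s) and the 1.5 nm radius are assumptions for illustration, not values from the text. Bead-model programs (97,98) are needed for realistic shapes.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
ETA_WATER = 0.890e-3   # Pa s, approximate viscosity of water at 25 C (assumed)
T = 298.15             # K


def sphere_friction(a_h, eta=ETA_WATER):
    """Equation (1) with F_t = F_r = 1: f_t = 6*pi*eta*a_h, f_r = 6*eta*V."""
    f_t = 6.0 * math.pi * eta * a_h
    volume = 4.0 / 3.0 * math.pi * a_h ** 3
    f_r = 6.0 * eta * volume  # equals 8*pi*eta*a_h**3
    return f_t, f_r


def sphere_diffusion(a_h, eta=ETA_WATER, temperature=T):
    """Translational (m^2/s) and rotational (rad^2/s) diffusion coefficients, D = kT/f."""
    f_t, f_r = sphere_friction(a_h, eta)
    kt = K_B * temperature
    return kt / f_t, kt / f_r


if __name__ == "__main__":
    a = 1.5e-9  # m, hypothetical hydrated radius
    dt1, dr1 = sphere_diffusion(a)
    dt2, dr2 = sphere_diffusion(2 * a)
    print(f"D_t = {dt1:.3e} m^2/s, D_r = {dr1:.3e} rad^2/s")
    print(f"doubling the radius: D_t falls {dt1 / dt2:.1f}-fold, D_r falls {dr1 / dr2:.1f}-fold")
```

Doubling the radius lowers $D_t$ twofold but $D_r$ eightfold. This difference is why rotational data are the more sensitive probe of size.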
The same approach can also be used to calculate other hydrodynamic and mechanical properties of macromolecules, including rotational diffusion constants and mass distributions such as the radius of gyration. Rotational diffusion [cf. Equation (1)] is in general more sensitive to size and shape than translational diffusion, but is often more difficult to measure. In order to compare observed and calculated hydrodynamic properties, it is necessary to know the partial specific volume (psv) of the particle under the conditions of interest, as well as the effect of a hydration layer on the frictional properties. Whereas for proteins the simple weighted sum of tabulated psv values for the constituent amino acids is generally sufficient for calculating the psv (100), there is no comparably accurate approach for NA structures, so the value is either assumed on the basis of a small number of published measurements for different NAs (66,101) or measured. For duplex DNA, the psv is typically in the range 0.55–0.58 ml/g in 100 mM KCl (101). The psv of quadruplex structures, however, is not accurately known, but could be measured either from density increments (102) or by sedimentation in two or more solvents of different density [e.g. H2O versus D2O (101)]; the latter method is of lower accuracy. For a psv of about 0.6 ml/g, an error of 1% in the psv produces an error of around 1.5% in the buoyancy term (1 − psv·ρ, where ρ is the solvent density). A second source of uncertainty is how to treat the hydration layer. This is discussed in detail in the publications describing the programs for calculating frictional properties (97,98,103). In effect, it seems to amount to adding a monolayer of solvent to the anhydrous particle, i.e. increasing the radius by the diameter of an electrostricted water molecule (2.5–2.8 Å) (103–105). In practice, this is achieved by varying the bead size in the calculations. Clearly, such calculations need to be carefully calibrated to distinguish small variations within a series of related structures. Additional size and shape information, such as the radius of gyration, the maximum dimension and full-shape analysis using the entire scattering curve (106,107), can be obtained by static light scattering or SAXS, and can reduce many of the uncertainties at a global level of analysis. Despite these limitations, hydrodynamic methods could become a very valuable means for rapidly assessing particular models and for quality control of quadruplexes to be used in any study for which the conformation(s) needs to be known.
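The buoyancy-error estimate is a one-line propagation: since d(1 − psv·ρ) = −ρ·d(psv), a relative error δ in the psv becomes a relative error δ·psv·ρ/(1 − psv·ρ) in the buoyancy term. A minimal check, assuming a solvent density of 1.0 g/ml:

```python
def buoyancy_relative_error(psv_ml_per_g, rho_g_per_ml=1.0, psv_rel_error=0.01):
    """Relative error in the buoyancy term (1 - psv*rho) caused by a relative error in psv."""
    buoyancy = 1.0 - psv_ml_per_g * rho_g_per_ml
    # d(1 - psv*rho) = -rho * d(psv), and d(psv) = psv * psv_rel_error
    return psv_rel_error * psv_ml_per_g * rho_g_per_ml / buoyancy

for psv in (0.55, 0.58, 0.60):  # duplex range quoted above, and the ~0.6 ml/g case
    pct = 100 * buoyancy_relative_error(psv)
    print(f"psv = {psv:.2f} ml/g: 1% psv error -> {pct:.2f}% buoyancy error")

err = buoyancy_relative_error(0.60)
```

Because the buoyancy term is a small difference, the error amplification grows as psv·ρ approaches 1.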
Electrophoresis
Electrophoretic mobility is commonly used to assess the number of states present and the kinds of folded structures that may be present (62). For duplex DNA of moderate length, the mobility essentially tracks the number of base pairs, so that the size can be estimated by comparison with a ladder of known lengths; this is not true of small quadruplexes, which have a more compact structure. Electrophoretic mobility is partly a hydrodynamic phenomenon, but it also depends on the net effective charge of the molecule and on the nature of the gel, which determines the frictional resistance to motion (108,109). For DNA and RNA duplexes, the net charge scales with the number of base pairs, so calibration is straightforward unless there are deviations from a rigid cylinder, as in A-tract structures; there the quantity of interest is the degree of curvature, which requires a very different calibration (110,111). Small quadruplexes do have a high formal negative charge, but many are also relatively squat (globular) compared with DNA duplexes or triplexes, making the shape and net charge distribution more difficult to estimate (but see Figure 4 for a comparison of shapes and charge distributions). Furthermore, the degree of ion condensation (112,113) for such structures may be small (112–116), so that the effective charge (independent of Debye–Hückel screening) may be relatively high compared with that of a duplex, where the condensed fraction leaves a net charge of around 0.24 e– per phosphate (109,112,113,117). The clear exception is the propeller structure (Figure 4), which is more asymmetric, so the degree of ion condensation for this structure may well differ significantly from that of the other structures. This is considered in greater detail in the Thermodynamics and kinetics section.
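The 0.24 e– figure for a duplex follows from Manning's counterion condensation theory for an idealized line charge, which is also why it need not carry over to compact quadruplexes. A minimal sketch, assuming water with a relative permittivity of 78.5 at 298 K, monovalent counterions, and the B-DNA axial spacing of 1.7 Å per phosphate (a 3.4 Å rise shared by two strands):

```python
import math

E = 1.602176634e-19      # elementary charge, C
EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m
K_B = 1.380649e-23       # Boltzmann constant, J/K

def bjerrum_length(eps_r=78.5, temp_k=298.15):
    """Distance (m) at which two unit charges interact with energy k_B*T."""
    return E ** 2 / (4 * math.pi * EPS0 * eps_r * K_B * temp_k)

def manning_residual_charge(charge_spacing_m, counterion_valence=1):
    """Fraction of each phosphate charge left uncompensated by condensed counterions."""
    xi = bjerrum_length() / charge_spacing_m  # Manning charge-density parameter
    if xi * counterion_valence <= 1.0:
        return 1.0  # below the condensation threshold nothing condenses
    return 1.0 / (counterion_valence * xi)

residual = manning_residual_charge(1.7e-10)  # B-DNA: one phosphate per 1.7 Angstrom
print(f"l_B = {bjerrum_length() * 1e10:.2f} A, residual = {residual:.2f} e per phosphate")
```

The condensation criterion (ξ > 1) is defined for a long, uniformly charged cylinder; applying it to the squat quadruplex shapes discussed above would be an extrapolation.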
It is notable that the structures shown in Figures 3 and 4 are mostly rather compact and appear roughly spherical, whereas the parallel propeller structure appears as a plate and thus would impart considerably greater hydrodynamic drag than the antiparallel and mixed hybrid structures (61). Further, the electrostatic potential energy profiles of these structures vary markedly, suggesting that their effective charges could be substantially different. The electrophoretic mobility under a single set of conditions is therefore not necessarily a reliable indicator of size per se, even though it is routinely used as such (59,60,62,64). Again, as for CD, due caution should be exercised in the absence of a reliable set of mobility markers whose structures have been verified for those particular samples.
Chemical modification
Substitution of inosine or 7-deaza-dG (Figure 1) for G is expected to disrupt hydrogen bonding and therefore to reduce the stability of quadruplexes (118,119). Inosine will also decrease the H-bonding in Watson–Crick GC base pairs, whereas 7-deaza-dG will not. However, the rather different electronic structure of 7-deaza-dG indicates that H-bonding disruption can be only part of the story (57), which is in part why several substitutions have to be made to disrupt the structure (56,57,118,120–123). Although one intramolecular H-bond is lost by substitution, there are also changes in the number of water molecules involved, indicating that there may be a significant entropic component (124). The effect of single or double inosine substitutions at various positions in a 22-nt human telomeric sequence was studied in both K+ and Na+ buffers (119). For the single substitutions, the loss of stabilization free energy at 310 K ranged from 2.2 to 3.1 kcal mol–1 in K+ buffer and from 1.4 to 2.6 kcal mol–1 in Na+ buffer, with differences of 0.5–2.5 kcal mol–1 between the two forms (albeit at different total cation concentrations). As the apparent enthalpy change was also decreased by the substitutions, clearly interactions other than H-bonding are affected.
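To put these free-energy losses in perspective, a destabilization ΔΔG at 310 K lowers the folding equilibrium constant by a factor exp(ΔΔG/RT). A minimal sketch using the end points of the ranges quoted above (the gas constant in kcal is the only added input):

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)

def fold_change_in_K(ddg_kcal, temp_k=310.0):
    """Factor by which K(folding) decreases for a destabilization of ddg_kcal (kcal/mol)."""
    return math.exp(ddg_kcal / (R_KCAL * temp_k))

# End points of the single-substitution ranges (K+ and Na+ buffers)
for ddg in (1.4, 2.2, 2.6, 3.1):
    print(f"ddG = {ddg:.1f} kcal/mol -> folding K lower by ~{fold_change_in_K(ddg):.0f}-fold")
```

A single substitution therefore shifts the folding equilibrium by roughly one to two orders of magnitude.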
DMS footprinting is a commonly used technique for detecting bases involved in G-quartets [(118,125,126) and others]. The guanine N7 can be methylated only if it is accessible to solvent and not involved in intramolecular hydrogen bonds; in the classical G-quartet (Figure 1) it is hydrogen bonded and therefore protected. The interpretation of the rate constant for modification, however, relies on assumptions about the mechanism of the reaction and about what determines the chemical reactivity. This requires very careful calibration against authentic structures. As such, its main use is corroboration, in conjunction with other low-resolution techniques.
THERMODYNAMICS AND KINETICS
It is often stated that G-quadruplex structures are unusually stable, but rarely is it said with respect to what. In fact, intramolecular quadruplexes are not thermodynamically more stable than some other intramolecular NA folds. For example, DNA duplexes of the same number of nucleotides, at 1 M strand concentration, are of comparable stability to intramolecular quadruplexes containing three quartets (127) (and see below). Similarly, the ΔG(310) of stabilization of short RNA hairpins (17 nt, loop size >5) in 100 mM salt is typically >5 kcal mol–1 (128), and much higher where the loop is of the type GNRA or UUCG (129,130). As will be shown below, for intramolecular G-quadruplex folds of about 22 nt, the ΔG(310) of stabilization under similar salt conditions is comparable to or lower than that of intramolecular DNA duplexes. However, the unfolding kinetics are very slow compared with DNA or RNA hairpin kinetics (see below).
The basic thermodynamic relationships that are relevant to quadruplex stability can be summarized as:

$$\Delta G = \Delta H - T\Delta S = -RT\ln K \qquad (2)$$

$$\Delta H(T) = \Delta H(T_0) + \int_{T_0}^{T} \Delta C_p \, dT' \qquad (3)$$

$$\Delta S(T) = \Delta S(T_0) + \int_{T_0}^{T} \frac{\Delta C_p}{T'} \, dT' \qquad (4)$$

where ΔG is the Gibbs free-energy change, T is the absolute temperature, R is the gas constant, K is the equilibrium constant, and ΔH and ΔS are the enthalpy and entropy changes, respectively. For practical purposes, ΔG is equivalent to ΔA, the Helmholtz free-energy change, in condensed media. ΔCp is the change in heat capacity, which can be seen to be a more fundamental quantity than ΔH or ΔS. Equations (2–4) show how the experimentally accessible thermodynamic parameters depend on temperature.
ΔH and ΔS are not independent, as measurement of ΔG, usually from an equilibrium constant [Equation (2)], and of the enthalpy (e.g. by calorimetry) automatically assigns ΔS. Furthermore, from Equations (3) and (4), for a temperature-independent ΔCp the temperature dependence of ΔG is

$$\Delta G(T) = \Delta H_0 - T\Delta S_0 + \Delta C_p\left[(T - T_0) - T\ln\frac{T}{T_0}\right]$$

where the subscript 0 refers to the values at a reference temperature T0. A nonzero ΔCp implies that ΔH and ΔS are themselves temperature dependent (at rates ΔCp and ΔCp/T, respectively). In the context of G-quadruplex unfolding, it is observed that ΔH is positive at T = Tm, which means that ΔS is also positive at this temperature (because ΔG = 0 at Tm, ΔS = ΔH/Tm). If ΔCp for unfolding is also positive, as is expected for denaturation in general, then ΔH decreases with decreasing temperature. This leads to the conclusion that at some sufficiently low temperature the enthalpy change becomes zero, and also to the concept of a temperature of maximum thermal stability, i.e. both cold and hot denaturation. This will be considered further in the section on measuring thermodynamic parameters, below.
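The two characteristic temperatures mentioned above follow directly from Equations (3) and (4). A short derivation, assuming ΔCp > 0 is independent of temperature and taking Tm as the reference temperature, so that ΔS(Tm) = ΔHm/Tm with ΔHm = ΔH(Tm):

$$
\Delta H(T_H) = \Delta H_m + \Delta C_p\,(T_H - T_m) = 0
\;\;\Rightarrow\;\; T_H = T_m - \frac{\Delta H_m}{\Delta C_p}
$$

$$
\Delta S(T_S) = \frac{\Delta H_m}{T_m} + \Delta C_p \ln\frac{T_S}{T_m} = 0
\;\;\Rightarrow\;\; T_S = T_m \exp\!\left(-\frac{\Delta H_m}{T_m\,\Delta C_p}\right)
$$

Because ∂ΔG/∂T = −ΔS, ΔG(T) passes through its maximum at T_S, the temperature of maximum stability. Since e^{−x} ≥ 1 − x, T_S ≥ T_H, so ΔG peaks slightly above the temperature at which ΔH vanishes. ΔG returns to zero at a second, lower temperature (cold denaturation), which has to be found numerically from Equations (2)–(4).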
The parameters may also depend on other extrinsic variables such as pressure, dielectric constant and ionic strength. Variation of these parameters can provide information about additional molecular properties of the system and, indirectly, about the forces involved in stability. For intramolecular (single-strand) quadruplexes, the parameters are independent of oligonucleotide concentration, whereas for multistranded structures the entropy (and thus ΔG) depends on the deviation of the strand concentration from that of the chosen standard state, which must therefore be specified carefully.
Thermodynamic methods
Temperature variation
By far the most common thermodynamic variable used for characterizing NAs is temperature. If there is a difference in some signal S between the folded and unfolded states, varying the temperature allows the transition between these states to be measured. For a simple two-state transition, F ⇌ U, the populations of the states, pf and pu, are related to the equilibrium constant as

$$K = \frac{p_u}{p_f} = \frac{p_u}{1 - p_u} = \frac{S - S_f}{S_u - S} \qquad (5)$$

where Sf and Su are the signals of the pure folded and unfolded states. If ΔH between the states is independent of temperature, the equilibrium constant is simply K(T) = K(ref)exp[(ΔH/R)(1/Tref − 1/T)], where Tref is an (arbitrary) reference temperature and K(ref) is the equilibrium constant at that temperature. For a pure two-state transition, the value of ΔH is the thermodynamic enthalpy difference, and is also known as the van't Hoff enthalpy. It can be derived by calculating K(T) from the melting curves (provided that proper upper and lower baselines can be obtained); in a plot of ln(K) versus 1/T, the slope is −ΔH/R. Frequently, the observed melting curves are not so simple, in part because the optical parameters [e.g. absorbance, CD or fluorescence (74)] that are used to monitor the transition are themselves temperature dependent, or because the transition is not two-state, either of which can give rise to baselines that are not flat (see below). This is difficult to distinguish from temperature effects on the molecular system itself, such as low-enthalpy transitions at lower temperature, or from the fact that ΔH is not actually independent of temperature [cf. Equation (3)]. In fact, there is no good reason a priori to believe that the heat capacity of such systems is the same in the folded and unfolded states, nor that the heat capacity in these states is independent of temperature (117,131). This is a complication to which we will return. It is therefore common to use the temperature at which the transition is 50% complete, i.e. Tm (the melting temperature). Tm has the advantage that it is the most precisely determined melting parameter (see below). It is usually determined from the derivative of the transition curve with respect to temperature, which can under some circumstances, such as when the cooperativity is low, lead to errors (132,133). However, Tm is often inappropriately used as a surrogate for thermodynamic stability. Where comparisons are to be made between systems, the relevant thermodynamic parameter is ΔG at a chosen reference temperature, such as 298 K or 310 K. For an intramolecular system, the relationship between the thermodynamic parameters is simple [Equations (2–4)]: at T = Tm, K = 1 and thus ΔG = 0.
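ΔH can also be determined by curve fitting. It is then easy to calculate ΔG at any other temperature, i.e. as

$$\Delta G(T) = \Delta H\left(1 - \frac{T}{T_m}\right) \qquad (6)$$

If ΔCp ≠ 0, the correction becomes:

$$\Delta G(T) = \Delta H_m\left(1 - \frac{T}{T_m}\right) + \Delta C_p\left[(T - T_m) - T\ln\frac{T}{T_m}\right]$$

where ΔHm is the enthalpy change at Tm.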
Equation (6) shows that the free-energy change at the desired temperature is proportional to ΔH and to the fractional distance of that temperature below Tm, (Tm − T)/Tm. Tm is thus related to ΔH and ΔG(310) as Tm = 310·ΔH/[ΔH − ΔG(310)]. It also shows that Tm is not a linear function of ΔH, although over a sufficiently narrow range it may appear so (62). This in fact reflects the simpler thermodynamics of a unimolecular system, where Tm = ΔH/ΔS. However, measurements of Tm are useful for additional thermodynamic analysis, as their variation with salt concentration, water activity, etc. can be analyzed (see below).
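A minimal sketch of Equation (6) and its inverse, assuming a two-state unimolecular transition; ΔH = 50 kcal mol–1 and Tm = 333 K are hypothetical inputs. The optional ΔCp term follows from Equations (3) and (4) with Tm as the reference temperature.

```python
import math

def dg_unfold(temp_k, dh_m, tm_k, dcp=0.0):
    """Unfolding free energy (kcal/mol) from dH at Tm, Tm and a constant dCp (kcal/(mol K))."""
    return dh_m * (1.0 - temp_k / tm_k) + dcp * ((temp_k - tm_k) - temp_k * math.log(temp_k / tm_k))

def tm_from_dg310(dh, dg310):
    """Invert Equation (6) with dCp = 0: Tm = 310 * dH / (dH - dG(310))."""
    return 310.0 * dh / (dh - dg310)

dh, tm = 50.0, 333.0  # hypothetical enthalpy (kcal/mol) and Tm (K)
dg = dg_unfold(310.0, dh, tm)
print(f"dG(310) = {dg:.2f} kcal/mol; Tm recovered = {tm_from_dg310(dh, dg):.1f} K")
print(f"with dCp = 0.1 kcal/(mol K): dG(310) = {dg_unfold(310.0, dh, tm, dcp=0.1):.2f} kcal/mol")
```

For any positive ΔCp the bracketed term (T − Tm) − T ln(T/Tm) is never positive, so neglecting ΔCp can only overestimate ΔG away from Tm.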
It should be noted that the ΔH obtained from a van't Hoff analysis is not always equal to the true (calorimetric) ΔH in these systems (134), and can be in error by more than a factor of 2. Clearly, any attempt to compare stabilities at a common reference temperature under these conditions is suspect. This discrepancy is related to a combination of the presence of multiple conformations and the slow folding kinetics, which can lead to technical difficulties in measuring equilibrium thermodynamic quantities, as described below.
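To make the van't Hoff procedure concrete, the minimal sketch below generates an ideal two-state melting curve (flat baselines, temperature-independent ΔH = 50 kcal mol–1, Tm = 333 K; all hypothetical), converts it to K via Equation (5), and recovers ΔH from the slope of ln K versus 1/T. With ideal, noise-free data the van't Hoff and "true" enthalpies agree by construction; the discrepancies discussed above arise precisely when these assumptions (two states, known baselines, equilibrium) fail.

```python
import math

R = 1.987204e-3  # gas constant, kcal/(mol K)

def two_state_signal(temp_k, dh_kcal, tm_k, s_folded=1.0, s_unfolded=0.0):
    """Signal of an ideal two-state F <-> U transition with flat baselines."""
    k = math.exp(-dh_kcal / R * (1.0 / temp_k - 1.0 / tm_k))
    p_u = k / (1.0 + k)
    return (1.0 - p_u) * s_folded + p_u * s_unfolded

def vant_hoff_enthalpy(temps, signals, s_folded, s_unfolded):
    """Least-squares fit of ln K versus 1/T; returns dH = -R * slope (kcal/mol)."""
    xs, ys = [], []
    for t, s in zip(temps, signals):
        p_u = (s - s_folded) / (s_unfolded - s_folded)
        if 0.1 < p_u < 0.9:  # use only the well-determined part of the transition
            xs.append(1.0 / t)
            ys.append(math.log(p_u / (1.0 - p_u)))  # Equation (5): K = p_u / p_f
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return -R * slope

temps = [280.0 + 0.5 * i for i in range(201)]  # 280-380 K
signals = [two_state_signal(t, dh_kcal=50.0, tm_k=333.0) for t in temps]
dh_fit = vant_hoff_enthalpy(temps, signals, s_folded=1.0, s_unfolded=0.0)
print(f"recovered van't Hoff dH = {dh_fit:.1f} kcal/mol")
```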
Thermal denaturation monitored by spectroscopic methods
Spectroscopic methods for monitoring melting rely on there being a distinct difference in spectroscopic properties between the folded and unfolded states, and (preferably) on a linear dependence of the signal on concentration, i.e. that the Beer–Lambert law holds: S = ε·c, where c is the concentration and ε is the specific spectroscopic response, such as an absorption coefficient. This law is not obeyed when, for example, aggregation occurs, or for instrumental reasons (e.g. stray light). The latter does not affect NMR, however.
Thermal denaturation (melting) of G-quadruplex structures is accompanied by distinctive changes in UV absorbance or circular dichroism spectra, as shown in Figure 5. These changes provide a convenient window for monitoring denaturation, and an entry into the thermodynamics of the denaturation process. Several reviews describing these methods and the subsequent analysis of the data have appeared (135–139). Labeling G-quadruplex-forming oligonucleotides with either fluorescent base analogs or suitable donor–acceptor FRET pairs allows the denaturation process to be monitored by fluorescence spectroscopy with greatly improved sensitivity (139) (and see Quadruplex topologies and structures section above). Because of the availability of these excellent reviews, we will not review these methods in detail again here, but rather will limit our discussion to some problems and pitfalls that are perhaps not commonly recognized and have not been emphasized previously.
Monitoring any of these spectroscopic signals as a function of temperature provides a denaturation transition curve (melting curve), which contains thermodynamic information. Figure 6 shows examples of such transition curves transformed in a variety of ways. Figure 6A shows raw absorbance data collected at 295 nm, a wavelength particularly sensitive to disruption of G-quadruplexes (138). Figure 6B shows the same data after transformation and normalization to show the fraction denatured (α) as a function of temperature. The first derivative of the transition curve in panel B is shown in Figure 6C. This is a common approach for directly estimating the Tm of a transition, and also for enhancing the detection of multiple intermediates that differ in their Tm values. Specific analytical equations for extracting thermodynamic parameters from each of these curves are available, as described in detail in (135,136). Application of these equations yields a thermodynamic profile for the denaturation process that includes the free-energy change (ΔG), the enthalpy change (ΔH) and the entropy change (ΔS). In principle, but rarely in practice, the change in heat capacity [ΔCp, cf. Equation (3)] might also be obtained from thermal denaturation curves. There are, however, numerous potential pitfalls in reliably obtaining these thermodynamic parameters.
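Estimating Tm from the derivative maximum (as in Figure 6C) can be illustrated with a minimal sketch, assuming an ideal two-state van't Hoff model with temperature-independent ΔH and Tm = 333 K (hypothetical values). Because dα/dT = α(1 − α)ΔH/(RT²), the 1/T² factor shifts the derivative maximum slightly below the true Tm (where α = 0.5), and the shift grows as ΔH, and hence the cooperativity, decreases. This is one concrete way in which the derivative method can mislead for weakly cooperative transitions.

```python
import math

R = 1.987204e-3  # gas constant, kcal/(mol K)

def fraction_unfolded(temp_k, dh_kcal, tm_k):
    """Two-state fraction denatured, alpha, with temperature-independent dH."""
    k = math.exp(-dh_kcal / R * (1.0 / temp_k - 1.0 / tm_k))
    return k / (1.0 + k)

def derivative_peak(dh_kcal, tm_k, t_lo=250.0, t_hi=400.0, step=0.05):
    """Temperature of maximum d(alpha)/dT, located with central finite differences."""
    best_t, best_slope = t_lo, -1.0
    n = round((t_hi - t_lo) / step)
    for i in range(1, n):
        t = t_lo + i * step
        slope = (fraction_unfolded(t + step, dh_kcal, tm_k)
                 - fraction_unfolded(t - step, dh_kcal, tm_k)) / (2 * step)
        if slope > best_slope:
            best_t, best_slope = t, slope
    return best_t

for dh in (50.0, 10.0):  # cooperative versus weakly cooperative transition
    print(f"dH = {dh} kcal/mol: Tm = 333.0 K, derivative peak = {derivative_peak(dh, 333.0):.1f} K")
```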
The first pitfall is the difficulty in establishing reliable pre- and post-transition baselines. Any transformation of the primary data, or any attempt to analyze the primary data directly by curve fitting, must include choices concerning these baselines. As is seen in Figure 5A, these baselines often slope to a significant degree. Such slopes may arise from intrinsic physical phenomena, such as the intrinsic temperature dependence of fluorescence, or from absorbance changes resulting from solvent expansion. More insidiously, though, such slopes could arise from additional reactions that complicate the study of the denaturation transition. As one example, pretransition melting reactions are common (140). These may involve thermally driven processes, such as helix–helix transitions or single-strand base unstacking, that precede the actual helix melting transition. Such transitions may have small enthalpy values, leading to broad, featureless melting transitions. Attempts to correct sloping baselines that arise from such complications would lead to an oversimplification of the true reaction mechanism and to a loss of information. Even in the absence of such complications, baselines present practical problems. There are disturbing reports documenting that the lengths of the pre- and post-transition baselines selected for data analysis directly affect the values of the thermodynamic parameters extracted from the data (141–144). Investigators of G-quadruplex denaturation should be fully aware of these difficulties, and should describe in detail their procedures for establishing baselines for analysis.
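The sensitivity to baseline choice can be reproduced even with ideal data. The sketch below (all numbers hypothetical) builds a two-state curve with linearly sloping folded and unfolded baselines, fits straight-line baselines in chosen windows, and normalizes to the fraction denatured α. A post-transition window that starts too close to the transition picks up residual folded molecules, which tilts the fitted baseline and biases α at the true Tm away from 0.5.

```python
import math

R = 1.987204e-3  # gas constant, kcal/(mol K)

def linear_fit(xs, ys):
    """Ordinary least-squares straight line; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope

def fraction_denatured(temps, signal, pre_window, post_window):
    """Normalize a melting curve with straight-line baselines fitted inside the two windows."""
    pre = [(t, s) for t, s in zip(temps, signal) if pre_window[0] <= t <= pre_window[1]]
    post = [(t, s) for t, s in zip(temps, signal) if post_window[0] <= t <= post_window[1]]
    a_f, b_f = linear_fit([t for t, _ in pre], [s for _, s in pre])
    a_u, b_u = linear_fit([t for t, _ in post], [s for _, s in post])
    return [(s - (a_f + b_f * t)) / ((a_u + b_u * t) - (a_f + b_f * t))
            for t, s in zip(temps, signal)]

def signal_at(t, dh=50.0, tm=333.0):
    """Hypothetical two-state absorbance with sloping folded and unfolded baselines."""
    k = math.exp(-dh / R * (1.0 / t - 1.0 / tm))
    p_u = k / (1.0 + k)
    folded = 0.50 + 0.0010 * (t - 278.0)
    unfolded = 0.40 + 0.0005 * (t - 278.0)
    return (1.0 - p_u) * folded + p_u * unfolded

temps = [278.0 + 0.5 * i for i in range(201)]  # 278-378 K
signal = [signal_at(t) for t in temps]
i_tm = temps.index(333.0)  # the true alpha here is exactly 0.5

# Same data, two different choices of post-transition window
alpha_far = fraction_denatured(temps, signal, (278.0, 300.0), (360.0, 378.0))[i_tm]
alpha_near = fraction_denatured(temps, signal, (278.0, 300.0), (345.0, 378.0))[i_tm]
print(f"alpha(Tm), post window 360-378 K: {alpha_far:.3f}")
print(f"alpha(Tm), post window 345-378 K: {alpha_near:.3f}")
```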
A second pitfall is the common assumption that denaturation reactions are simple two-state processes that pass from a folded native state to an unfolded denatured state without any intermediates. The two-state assumption must be justified by some experimental test. A classical test, first used in protein denaturation studies, is to obtain denaturation curves by two (or more) different physical methods (144). If the transition curves obtained by the different methods are exactly superimposable, that is consistent with a two-state mechanism. More recent tests using multiple-wavelength data have appeared. A dual-wavelength parametric test for a two-state denaturation transition monitored by spectroscopy has been described (145). In this test, data obtained at two different wavelengths are plotted against one another. For a two-state transition, such a plot should be strictly linear. Deviations from strict linearity signal that the denaturation process is not two-state and likely involves significantly populated intermediate states. Singular value decomposition (SVD) provides an additional test of the two-state assumption (146,147). With modern diode array spectrophotometers, it is easy to collect entire spectra as a function of temperature, instead of single-wavelength data. A set of spectra as a function of temperature defines a 3D surface that is easily converted to a matrix. SVD of the matrix rigorously enumerates the number of significant spectral species required to account for the spectral changes. For a two-state transition, there should be only two significant spectral species, corresponding to the folded and unfolded forms. Any number of species greater than two indicates a violation of the two-state assumption and signals the presence of intermediates. SVD (or a similar multivariate analysis method) has been used to characterize the denaturation of G-quadruplexes and other four-stranded structures (148,149).
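The dual-wavelength test can be sketched as follows, assuming temperature-independent signals for each species and hypothetical signal values at two wavelengths. For a two-state model each signal is a linear function of p_u alone, so S2 versus S1 is a straight line; a populated intermediate whose signals do not lie on that line bends the plot. The deviation from the chord through the end points is used here as a simple linearity measure; it is not claimed to be the specific statistic of ref. (145).

```python
import math

R = 1.987204e-3  # gas constant, kcal/(mol K)

def k_eq(t, dh, tm):
    return math.exp(-dh / R * (1.0 / t - 1.0 / tm))

def populations(t, three_state):
    """Populations (F, I, U); the intermediate is absent in the two-state case."""
    if not three_state:
        k = k_eq(t, 50.0, 333.0)
        return 1.0 / (1.0 + k), 0.0, k / (1.0 + k)
    k1, k2 = k_eq(t, 40.0, 320.0), k_eq(t, 40.0, 345.0)  # sequential F <-> I <-> U
    z = 1.0 + k1 + k1 * k2
    return 1.0 / z, k1 / z, k1 * k2 / z

# Hypothetical, temperature-independent signals of F, I and U at two wavelengths
EPS_1 = (1.0, 0.5, 0.0)
EPS_2 = (0.0, 0.9, 1.0)

def max_deviation_from_line(three_state):
    """Largest vertical distance of (S1, S2) points from the chord joining the end points."""
    pts = []
    for i in range(201):
        p = populations(280.0 + 0.5 * i, three_state)
        pts.append((sum(a * b for a, b in zip(p, EPS_1)), sum(a * b for a, b in zip(p, EPS_2))))
    (x0, y0), (x1, y1) = pts[0], pts[-1]
    slope = (y1 - y0) / (x1 - x0)
    return max(abs(y - (y0 + slope * (x - x0))) for x, y in pts)

print(f"two-state:   {max_deviation_from_line(False):.2e}")
print(f"three-state: {max_deviation_from_line(True):.2e}")
```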
Figure 7 shows examples of whole-spectrum UV and CD melting data, and the two-wavelength test of the two-state assumption for the thermal denaturation of the human telomere quadruplex in Na+ solution. For both UV and CD datasets, there are clear deviations from strict linearity, a clear indication that the denaturation reaction is not a simple two-state process, and that intermediate states are populated to a significant degree and must be included in any reaction mechanism. SVD analysis cannot be illustrated in a simple way, but the details of such an analysis are given in refs (59,146,149).
As alluded to above, another pitfall is the neglect of heat capacity changes (ΔCp) that may accompany quadruplex denaturation. Heat capacity changes are correlated with the exposure of hydrophobic surface area (150,151) as well as with increasing fluctuations among microstates associated with the less compact forms (117), so it would be surprising indeed if the unfolding of quadruplex structures, with the concomitant exposure of the bases, were not accompanied by a nonzero ΔCp value. Unfortunately, it is enormously difficult to fit transition curves reliably enough to obtain derivative values of the primary thermodynamic parameters (143). Small heat capacity changes could manifest themselves as contributors to sloping baselines, and might easily be corrected out at the expense of systematic errors in enthalpy values. Even if data such as those shown in Figure 6 are further transformed to construct a van't Hoff plot of ln K versus T–1, problems remain. Nonzero ΔCp values should lead to curvature in the van't Hoff plot. However, Monte Carlo simulations of van't Hoff plots showed that for small ΔCp values (|ΔCp| < 200 cal mol–1 K–1), which are of the order observed for NA unfolding (see below), no curvature could in fact be observed within the typical error of experimental data, but the slopes were systematically biased away from the true enthalpy values (152). Much larger ΔCp values, however, would be expected to become manifest, especially by calorimetric methods. There is little that can be done to overcome these pitfalls in the analysis of spectroscopic transition curves, but these difficulties must be acknowledged. Calorimetry offers another tool that may overcome at least some of the problems.
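A deterministic (noise-free) illustration of why a modest ΔCp is hard to see in a van't Hoff plot, assuming ΔHm = 50 kcal mol–1, Tm = 333 K and ΔCp = 0.2 kcal mol–1 K–1 (hypothetical values, at the upper end of the range quoted above), and using only the part of the curve with 0.1 < α < 0.9. The largest departure of ln K from the best straight line comes out at a few hundredths, smaller than the ≈0.1 uncertainty in ln K that an error of only 0.01 in α produces near the ends of the transition. This is not a reproduction of the Monte Carlo study of ref. (152).

```python
import math

R = 1.987204e-3  # gas constant, kcal/(mol K)

def ln_k(t, dh_m=50.0, tm=333.0, dcp=0.2):
    """ln K for unfolding with a constant dCp, from Equations (2)-(4) with Tm as reference."""
    dg = dh_m * (1.0 - t / tm) + dcp * ((t - tm) - t * math.log(t / tm))
    return -dg / (R * t)

# Keep only points with 0.1 < alpha < 0.9, i.e. |ln K| < ln 9
temps = [300.0 + 0.25 * i for i in range(281)]
temps = [t for t in temps if abs(ln_k(t)) < math.log(9.0)]
xs = [1.0 / t for t in temps]
ys = [ln_k(t) for t in temps]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
max_resid = max(abs(y - (my + slope * (x - mx))) for x, y in zip(xs, ys))
print(f"apparent van't Hoff dH = {-R * slope:.1f} kcal/mol (dH at Tm = 50.0)")
print(f"largest departure of ln K from a straight line = {max_resid:.3f}")
```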
Calorimetric melting (differential scanning calorimetry)
Differential scanning calorimetry (DSC) (19,20), in which differential heat capacity is measured as a function of temperature, offers a method for measuring the thermodynamics of G-quadruplex denaturation as directly as possible. The advantage of calorimetry is that total denaturation enthalpy values can be measured without recourse to any curve fitting or assumed models. Model-free calorimetric enthalpy values can thus be obtained directly from the primary data. In addition, calorimetric thermograms can also be fit to a particular thermodynamic model. Comparison of the model-free calorimetric enthalpies with such model-dependent enthalpies provides additional insight into the denaturation process, and in particular provides quantitative information about the cooperativity of the melting or the presence of intermediate states. Haq et al. (153) have provided a practical guide to the use of DSC for the study of the stability of multistranded DNA structures.
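The comparison of model-free and model-dependent enthalpies can be sketched for an ideal two-state transition (hypothetical ΔH = 50 kcal mol–1, Tm = 333 K, ΔCp = 0, buffer baseline already subtracted). The calorimetric enthalpy is the area under the excess heat-capacity curve; a van't Hoff enthalpy is obtained from the peak height using the standard two-state relation ΔHvH = 4RTmax²Cp,max/ΔHcal. A ratio ΔHvH/ΔHcal near 1 is expected for a two-state process, whereas a ratio below 1 points to intermediate states.

```python
import math

R = 1.987204e-3  # gas constant, kcal/(mol K)

def excess_cp(t, dh=50.0, tm=333.0):
    """Excess heat capacity (kcal/(mol K)) of an ideal two-state transition with dCp = 0."""
    k = math.exp(-dh / R * (1.0 / t - 1.0 / tm))
    alpha = k / (1.0 + k)
    return dh * dh * alpha * (1.0 - alpha) / (R * t * t)  # equals dH * d(alpha)/dT

step = 0.01
temps = [280.0 + step * i for i in range(10001)]  # 280-380 K
cps = [excess_cp(t) for t in temps]

# Model-free calorimetric enthalpy: area under the excess Cp curve (trapezoidal rule)
dh_cal = sum(0.5 * (a + b) * step for a, b in zip(cps, cps[1:]))

# Shape-derived van't Hoff enthalpy from peak height and peak temperature
i_max = max(range(len(cps)), key=cps.__getitem__)
t_max, cp_max = temps[i_max], cps[i_max]
dh_vh = 4.0 * R * t_max ** 2 * cp_max / dh_cal
print(f"dH_cal = {dh_cal:.2f}, dH_vH = {dh_vh:.2f}, ratio = {dh_vh / dh_cal:.3f}")
```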
DSC studies are also plagued by baseline uncertainties. Processing of DSC data involves two types of baseline correction. The first is subtraction of independently measured buffer baselines to correct for instrumental variation over the temperature range studied. This correction is straightforward and poses no difficulties. The second baseline correction involves choices similar to those discussed above, although in this case it is heat effects that contribute to baseline slopes and nonlinearities. Even though calorimetry represents the gold standard for denaturation studies, it is not entirely without its own uncertainties, and investigators should describe and fully justify the choices made in baseline corrections.
Some representative results
Table 2 shows some representative results for denaturation studies of the human telomere quadruplex structure obtained by van't Hoff analysis of spectroscopic data. Data were selected for similar cation concentrations. The results are not comforting. Enthalpy estimates for the denaturation of the Na+ form of the quadruplex range from 38.0 to 72.7 kcal mol–1, nearly a 2-fold difference. For the K+ form, enthalpy values range from 49.0 to 77.5 kcal mol–1. Even worse, the free-energy change at 310 K varies from marginally stable (0.9 kcal mol–1) to very stable (7.3 kcal mol–1), which is most likely attributable to errors in the enthalpy, as Tm values are expected to be rather accurate. These differences are unacceptably large, and their origins are by no means clear. The sequences used in these studies differed slightly, but it is difficult to believe that nucleotide end effects could exert such an enormous influence. These data point to the need for additional studies to reduce the uncertainty in thermodynamic parameters.
A detailed spectroscopic and calorimetric study of the stability of the K+ form of the human telomere quadruplex sequence (TTAGGG)4 was recently reported (157). SVD was used to analyze temperature-dependent circular dichroism spectra and showed that quadruplex denaturation was not a simple two-state process. At least three species, corresponding to the folded form, the unfolded form and one intermediate, were required in the reaction mechanism. DSC thermograms clearly showed two transitions. The total calorimetric enthalpy for the overall denaturation process depended on KCl concentration, varying from 32.1 kcal mol–1 at 100 mM to 36.3 kcal mol–1 at 400 mM. Heat capacity changes, ΔCp, were not evident in the DSC data, but may have been difficult to detect because of the complexity of the reaction and the difficulty of selecting reliable baselines. This study clearly indicates, at the least, that quadruplex denaturation is more complicated than was assumed in the studies shown in Table 2, and that intermediate states are significantly populated along the denaturation pathway.
Isothermal titration calorimetry (ITC) is most commonly used for binding studies, but a recent novel application used the method to study the enthalpy of G-quadruplex folding (158). In this application, unstructured oligonucleotides were mixed with excess monovalent cation solutions in the calorimeter to monitor the total enthalpy of folding (which includes any contribution from specific ion binding, see below). By repeating the experiment at several temperatures, heat capacity changes could be estimated. The remarkable result was that apparent ΔCp values approaching 1 kcal mol–1 K–1 (159), and larger, were observed. Such a large heat capacity difference, comparable to that observed for small proteins (151,160), would give rise to a large step in the DSC profile, which is not observed, and would also imply cold denaturation at modest temperatures. For example, for a quadruplex that melts with a Tm of 333 K and an enthalpy change at that temperature of 50 kcal mol–1 (cf. Table 2), over the normally accessible temperature range from 273 K to 373 K the enthalpy would change by 10 kcal mol–1 for ΔCp = 0.1 kcal mol–1 K–1 and by 100 kcal mol–1 for ΔCp = 1 kcal mol–1 K–1, with a change in sign at 283 K. For the latter case, the temperature of maximum stability would be 286.5 K, where ΔG would be 3.4 kcal mol–1 lower (i.e. the fold less stable) than if ΔCp were zero.
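The numbers in this example can be checked with a few lines of code, assuming, as in the text, ΔHm = 50 kcal mol–1 at Tm = 333 K and a temperature-independent ΔCp, with Equations (2)–(4) evaluated using Tm as the reference temperature.

```python
import math

def dh_at(t, dh_m, tm, dcp):
    return dh_m + dcp * (t - tm)  # Equation (3) with constant dCp

def ds_at(t, dh_m, tm, dcp):
    return dh_m / tm + dcp * math.log(t / tm)  # Equation (4), using dS(Tm) = dH(Tm)/Tm

def dg_at(t, dh_m, tm, dcp):
    return dh_at(t, dh_m, tm, dcp) - t * ds_at(t, dh_m, tm, dcp)  # Equation (2)

dh_m, tm = 50.0, 333.0  # kcal/mol and K, as in the example
for dcp in (0.1, 1.0):
    span = dh_at(373.0, dh_m, tm, dcp) - dh_at(273.0, dh_m, tm, dcp)
    print(f"dCp = {dcp} kcal/(mol K): dH changes by {span:.0f} kcal/mol over 273-373 K")

dcp = 1.0
t_h = tm - dh_m / dcp  # temperature at which dH = 0
t_s = tm * math.exp(-dh_m / (tm * dcp))  # temperature at which dS = 0 (maximum of dG)
penalty = dg_at(t_s, dh_m, tm, 0.0) - dg_at(t_s, dh_m, tm, dcp)
print(f"T_H = {t_h:.1f} K, T_S = {t_s:.1f} K, dG lower by {penalty:.1f} kcal/mol at T_S")
```

The computed T_S is 286.57 K, i.e. the 286.5 K quoted above to within rounding.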
In contrast, for short oligonucleotides, ΔCp is of the order of 80 cal mol–1 K–1 (131,161–163), and higher for multistrand structures (117). A large ΔCp may be associated with residual structure in the unfolded ensemble (and see below).
Multiple conformations
Numerous structures can form in vitro (63)






