Electronic Dissertations Library

Identification of ß-sheet motifs in three-dimensional protein structures, using a subgraph isomorphism algorithm: an update of a 1992 study, by Ruth V. Spriggs

Proteins

    This chapter will provide an introduction to proteins and will describe the various levels of complexity used to explain protein structure, how this structure can be investigated, and the reasons why protein structure exploration is an important area of research.

Protein Structure

    Proteins are composed of 20 basic units called amino acids which consist of a central carbon atom (the alpha-carbon) bound to an amino group (NH2), a carboxyl group (COOH), a hydrogen atom, and one of 20 different R groups. The alpha-carboxyl group of one amino acid is joined to the alpha-amino group of the next by an amide bond (also called a peptide bond) to form chains of amino acid residues (polypeptide chains). Proteins are functional polypeptide chains, and most naturally occurring chains contain between 50 and 2000 amino acid residues (Stryer (1988)). The unbranched chain of amino acid residues has direction, beginning at the amino end, and the chain of regularly repeating peptide bonds is called the backbone, while the R groups projecting from the backbone are known as side chains (Brandon and Tooze (1999)).

    The peptide bonds of the backbone are rigid and planar due to their partial double bond character (Stryer (1988)). There is, however, a degree of rotational freedom, and therefore flexibility, around the other bonds of the backbone (Figure 1) (Brandon and Tooze (1999)).

    There are an unlimited number of conformations that proteins could adopt, but most fold spontaneously into one particular stable shape. This particular shape occurs because backbone groups and side chains interact with each other, and with water, so that particular conformations have more stabilising interactions than others (Alberts et al. (1994)). Insight into this relationship between amino acid sequence and three-dimensional (3D) structure came from studies by Christian Anfinsen on a protein called ribonuclease (Stryer (1988)). Anfinsen determined that a randomly coiled protein is devoid of activity, and that isolated proteins in solution can revert to their original active conformation after denaturing conditions are removed. It was concluded, therefore, that the information needed to refold the protein into its native form must be inherent in the amino acid sequence (Alberts et al. (1994)). Other studies have drawn the same conclusion, leading to the general theory that a protein’s sequence specifies its conformation (Stryer (1988)).

    Globular proteins fold into a compact globular shape. In contrast, fibrous proteins do not fold into a compact shape as their function lies in their fibrous nature. The current study is interested in the conformations adopted by globular proteins. The hydrophobic nature of certain amino acids is the main driving force in the adoption of these compact structures (Brandon and Tooze (1999)). The side chains of amino acids can be polar or non-polar (Alberts et al. (1994)). Non-polar residues are hydrophobic and pack together in the interior of the globular protein structure to avoid contact with water, whereas polar residues, and the polar groups in the backbone, are hydrophilic and form hydrogen bonds with each other and water (Brandon and Tooze (1999)). These hydrogen bonds form a major part of protein structure stability, and are formed when a hydrogen atom is shared between a hydrogen donor (the group with the tighter link to the hydrogen atom) and a hydrogen acceptor (Alberts et al. (1994)). When hydrogen bonds form between backbone groups in proteins the alpha-amide group is the donor and the alpha-carbonyl group the acceptor.

    The folding of a protein cannot be a random search through all possible structures, as this would take far longer than the observed time, of approximately one second, for typical proteins to fold from a random state to their folded state (Robson (1999)). Among the millions of possible folding patterns, proteins take up one working, native, structure. Proteins are thought to initially fold rapidly into a structure in which most of the final secondary structure elements have formed and are aligned in roughly the correct way. This is an open and flexible conformation, called a molten globule, and is the starting point for a relatively slow process in which the side chains are repeatedly adjusted to form the correct tertiary structure (secondary and tertiary structure are explained below). This second stage is thought to have a variety of ‘correct’ pathways to the final conformation (Richards (1991)). This process can be summarised as: local folding, formation of long range interactions, then local rearrangements to give the final most stable folded state (Stryer (1988)). This model assumes that hydrophobic residues direct not only the initial folding but also the slower tertiary folding, and allow rearrangements to be made (Richards (1991)).

    The 3D structure of a protein is held in position by three types of non-covalent bond: hydrogen, Van der Waals, and electrostatic. Hydrogen bonds are introduced above, and electrostatic bonds are formed by the attraction between positively and negatively charged groups. Van der Waals bonds are weaker than hydrogen and electrostatic bonds, but are effective in large numbers, and are formed between the hydrocarbon side chains of the amino acid residues (Stryer (1988)). They are formed when a transient charge asymmetry around an atom (possible because the electronic charge distribution around an atom changes with time) induces an opposite asymmetry in an adjacent atom and they are attracted to each other. The attraction is lost when the atoms approach too closely due to the repulsion caused by the overlapping of electrons (Stryer (1988)). Steric hindrance also plays a part in protein structure as sections of the chain fold round each other, for example, a bulky R group can prevent a tight bend being made in the chain. Covalent bonds, in the form of disulphide bonds between two cysteine residues, have a less important role in protein structure stabilisation, and are most often found giving extra stability to proteins that leave the cell environment (Brandon and Tooze (1999)).

    In summary, in aqueous environments protein folding is driven by the tendency of hydrophobic residues to be excluded from water. This is only possible because polar residues and backbone groups can interact with the water at the protein exterior and with themselves in the non-polar environment of the protein interior (Brandon and Tooze (1999)). This conformation is further stabilised by electrostatic and Van der Waals bonds. The conformational folding of proteins destined for non-aqueous environments, such as spanning the cell membrane, differs because the non-polar residues no longer need to pack into a hydrophobic core (Stryer (1988)).

    The forces and interactions described above fold a protein into a particular 3D conformation, and, as Chothia (1984) points out, “[this] apparently complex structure of proteins is in fact governed by a set of relatively simple principles” (p.568). These principles are explained by splitting the conformation into various levels which build on each other to produce the entire protein shape (Koch et al. (1992)). These levels are termed primary, secondary, supersecondary, tertiary, and quaternary structure, and are described below. A further level of description is often used to describe protein structure, and that is domains. Domains are compact, often functional, globular units of a protein, separated by more flexible sections of chain (Brandon and Tooze (1999)).

Primary Structure

    As described above, all proteins are made up from 20 basic amino acids. Each type of protein has a unique sequence of these amino acid residues which is determined by genes encoded in the DNA of the cell (Alberts et al. (1994)). This amino acid sequence is the primary structure of a protein (Brandon and Tooze (1999)). Specifically, primary structure is a complete description of the covalent bonds of a protein (Stryer (1988)), and should, therefore, include the location of any disulphide bonds (Figure 2).

Secondary Structure

    The secondary structure of a protein is the 3D arrangement of amino acid residues that are relatively near one another in the linear sequence (Stryer (1988)). Secondary structure is created by hydrogen bonding between the alpha-amide groups and alpha-carbonyl groups of the backbone, to enable globular proteins to retain a minimum energy conformation (Chothia (1984)), and common patterns occur in the majority of proteins, despite vastly dissimilar overall structures. Two particularly common patterns are ß-sheets and alpha-helices (Alberts et al. (1994)).

    ß-sheets are formed when sections of the protein chain are stretched out, and stabilised by hydrogen bonding between the backbone groups of adjacent sections (Richards (1991)). Each section of chain, hydrogen bonded into a ß-sheet, is called a ß-strand and because the protein chain has direction, two strands adjacent to each other can be parallel or antiparallel. A ß-sheet therefore has strands which are entirely parallel, entirely antiparallel, or a mixture of both forms, and amino acid side chains which are positioned above and below the plane of the sheet (Stryer (1988)). ß-sheets are also known as ß-pleated sheets because the C alpha atoms of the strands form a zigzag pattern (Brandon and Tooze (1999)).

    The hydrogen bonding patterns in parallel and antiparallel sheets are distinctly different. The antiparallel ß-sheet has “narrowly spaced hydrogen bond pairs that alternate with widely spaced pairs”, whereas parallel sheets have “evenly spaced hydrogen bonds that bridge the ß-strands at an angle” (Brandon and Tooze (1999:19)) (Figure 3). The majority of ß-sheets, however, do not show perfect parallel / antiparallel forms due to a right-handed twist in the sheet (Brandon and Tooze (1999)). To cope with this phenomenon, an angle between strands of less than 90 degrees is taken to indicate that the strands are parallel.
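
    The 90 degree rule can be sketched in a few lines of code. The example below is a minimal illustration, not the method used in this study: it assumes that each strand is supplied as an ordered list of C alpha coordinates and that the strand direction can be approximated by the vector from the first to the last of these atoms. The function names are hypothetical.

```python
import math


def strand_direction(ca_coords):
    """Vector from the first to the last C-alpha atom of a strand (an assumed
    approximation of the strand's direction)."""
    (x0, y0, z0), (x1, y1, z1) = ca_coords[0], ca_coords[-1]
    return (x1 - x0, y1 - y0, z1 - z0)


def angle_between(u, v):
    """Angle between two 3D vectors, in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))


def strand_orientation(strand_a, strand_b):
    """Classify a pair of strands: an angle below 90 degrees means parallel."""
    angle = angle_between(strand_direction(strand_a), strand_direction(strand_b))
    return "parallel" if angle < 90.0 else "antiparallel"


# Made-up coordinates (in Angstroms) for illustration only.
strand_1 = [(0.0, 0.0, 0.0), (3.3, 0.0, 0.0), (6.6, 0.5, 0.0)]
strand_2 = [(0.0, 4.8, 0.0), (3.2, 5.0, 0.8), (6.4, 5.3, 1.6)]  # twisted, same direction
strand_3 = list(reversed(strand_2))                             # opposite direction

print(strand_orientation(strand_1, strand_2))
print(strand_orientation(strand_1, strand_3))
```

    In this sketch a pair of strands at exactly 90 degrees is classed as antiparallel; the text above does not say how that boundary case should be treated.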

    ß-sheets form structures called ß-barrels in many proteins. These are formed when the sheet is curved to such an extent that the first strand is hydrogen bonded to the last. The number of strands in the sheet, and the stagger of the strands, determine the likelihood of a ß-barrel forming (Murzin et al. (1994 a and b)).

    The other common secondary structure, the alpha-helix, forms when a section of chain turns regularly about itself, in a clockwise direction, to form a cylinder, and is stabilised by hydrogen bonding between the backbone groups of the main chain (Alberts et al. (1994)). The carbonyl group of each amino acid residue is hydrogen bonded to the amine group of the residue four residues ahead in the linear sequence, to give 3.6 amino acid residues per turn of the helix (Stryer (1988)). The tightly coiled backbone of the chain forms the inner part of the helix and the side chains extend out from its surface (Figure 4).
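
    The regular geometry described above lends itself to a short worked example. The sketch below uses only the two figures given in the text (3.6 residues per turn, and a hydrogen bond from residue i to residue i + 4); the function names and the residue numbering are hypothetical.

```python
RESIDUES_PER_TURN = 3.6   # from the text above
HBOND_OFFSET = 4          # carbonyl of residue i bonds to the amine of residue i + 4


def rotation_per_residue():
    """Degrees of rotation about the helix axis contributed by each residue."""
    return 360.0 / RESIDUES_PER_TURN


def helix_hbond_pairs(first, last):
    """(carbonyl residue, amine residue) pairs for a helix spanning residues
    first..last inclusive."""
    return [(i, i + HBOND_OFFSET) for i in range(first, last - HBOND_OFFSET + 1)]


print(rotation_per_residue())
print(helix_hbond_pairs(1, 8))
```

    Each residue therefore turns the chain by about 100 degrees, and a helix of eight residues can form four backbone hydrogen bonds of this kind.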

    Some writers on the subject of protein structure also classify other features of structure as belonging to the secondary structure class. These features include the ß-turn, and the gamma-turn (Yada et al. (1988)). The ß-turn is the most commonly occurring, allowing an abrupt turn in the chain by the formation of a hydrogen bond between a residue and the fourth residue along from it in the chain (Stryer (1988)). ß-turns often connect antiparallel strands in a ß-sheet, and are also called reverse turns or hairpin bends. Many of these turns, where the protein chain changes direction to return into the globular core, contain the active residues used by the protein to perform its function (Alberts et al. (1994)).

Supersecondary Motifs

    Certain combinations of alpha-helices and/or ß-sheets have been recognised as frequently occurring together in clusters. These combinations of secondary structure are known as the supersecondary structure of the protein, and as sheets and helices usually pack in one of a small number of orientations (Chothia (1984)), each particular pattern is called a supersecondary motif (Alberts et al. (1994)) (Figure 5). Supersecondary structure is seen as an intermediate level between secondary and tertiary structure (Stryer (1988)) and has a stabilising effect. For example, in aqueous environments a lone alpha-helix is not very stable, but two coiled together to form a coiled-coil (Alberts et al. (1994)), or an alpha-helix in interaction with ß-sheets, has increased stability. When ß-sheets pack together there are also specific patterns seen in their interaction. Chothia and Janin (1981) show that ß-sheets that pack on top of one another orient themselves at an angle of 30 degrees to each other. This is thought to enable side chains to be in close contact with each other, despite the natural right-handed twist of ß-sheets.

    Protein structure is often divided into four classes: ‘all-alpha’ have only alpha-helical structures, ‘all-ß’ have only ß-sheet structures, ‘alpha+ß’ have both alpha-helices and ß-sheets but the two types of structure are in different sections of the linear sequence, and ‘alpha/ß’ have both structures mixed along the protein chain (Levitt and Chothia (1976)). The arrangement of secondary structures within supersecondary motifs is not random, and features that are adjacent to each other in the sequence have been shown to often be in contact in three dimensions. These adjacent secondary structure elements that are in contact are often, according to Levitt and Chothia (1976), one of the following forms: alpha alpha (two alpha-helices antiparallel to each other), ßß (two stranded antiparallel ß-sheet), ßßß (three stranded antiparallel ß-sheet), and ßalphaß (two stranded parallel ß-sheet in contact with an alpha-helix antiparallel to it).

    The molten globule theory of protein folding, described above, suggests that supersecondary structures serve as intermediates in the folding process (Stryer (1988)). This is supported by the discovery of particular types of stable supersecondary structures in known structures, and the finding that these supersecondary motifs often form domains in proteins. These domains, sometimes termed folding units, often maintain their stable 3D structures even when separated from the rest of the protein chain (Stryer (1988)).

Tertiary Structure

    Tertiary structure refers to the relative arrangement, in three dimensions, of amino acid residues that are far apart in the linear sequence, although the dividing line between secondary structure and tertiary structure can be unclear (Stryer (1988)). Tertiary structure is often described in terms of the relative positions of supersecondary structures (Stryer (1988)). As supersecondary structure is often used to describe domain structure (Brandon and Tooze (1999)), the tertiary structure of larger proteins can be described as the position of these domains in relation to each other.

Quaternary Structure

    At the highest level of structure, more than one protein chain can associate together to form the active protein (Alberts et al. (1994)). Each chain is called a subunit, and the quaternary structure is the spatial arrangement of these subunits, and the nature of any hydrogen bonding, electrostatic or Van der Waals contacts (Stryer (1988)). A protein may contain several copies of the same subunit, for example haemoglobin has four subunits of only two types, or many different subunits, for example, many types of protein chain interact to form the ribosome structure which coordinates protein synthesis (Stryer (1988)).

How Protein Structure is Determined

    When a protein has been purified from a cell or produced artificially, in large enough quantities, it can be analysed in terms of its sequence, structure, and function. The sequence of a protein can be determined by, for example, a process called the Automated Edman Degradation (Stryer (1988)), while the 3D structure of the folded protein is usually determined using X-ray Crystallography or Nuclear Magnetic Resonance Spectroscopy. Alternative methods include Electron Microscopy, Infrared Spectroscopy, Laser Raman Spectroscopy, Optical Rotatory Dispersion and Circular Dichroism (Yada et al. (1988); Alberts et al. (1994)).

X-Ray Crystallography

    X-ray crystallography is the most accurate method for determining the 3D structure of a protein (Clark et al. (1991)), but the protein under investigation must be in crystal form, and crystallisation can prove to be a very difficult process (Alberts et al. (1994)). The diffraction pattern seen when a beam of x-rays is directed at a crystal of the protein is used to locate the positions of the atoms in the molecule (Alberts et al. (1994)).

    A beam of x-rays is directed at the crystal: the majority of the beam travels straight through the crystal, but the rest is scattered by the atoms in the sample. Due to the ordered nature of the crystal, the scattered x-rays reinforce each other at certain points and the diffracted x-rays can be detected using x-ray film or an x-ray detector, and their intensities measured. The structure of the protein is then determined by applying a mathematical relation called a Fourier transform to the x-ray diffraction pattern which allows the calculation of a 3D electron density map of the crystal (Alberts et al. (1994)). This map is interpreted, using the known amino acid sequence, to deduce the 3D structure of the protein. The first successful x-ray diffraction study was of myoglobin, by Kendrew et al., in 1958 (Garnier (1990)).
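
    The role of the Fourier transform can be illustrated with a one-dimensional toy example. The sketch below is not a crystallographic program: it assumes a made-up density sampled at eight points along a single axis, and it assumes that the complex structure factors, phases included, are already known (the measured intensities alone provide only the amplitudes). The function names are hypothetical.

```python
import cmath


def structure_factors(density):
    """Forward transform: density samples -> complex structure factors F(h)."""
    n = len(density)
    return [
        sum(rho * cmath.exp(2j * cmath.pi * h * x / n) for x, rho in enumerate(density))
        for h in range(n)
    ]


def electron_density(factors):
    """Fourier synthesis: complex structure factors -> density samples."""
    n = len(factors)
    return [
        sum(f * cmath.exp(-2j * cmath.pi * h * x / n) for h, f in enumerate(factors)).real / n
        for x in range(n)
    ]


# Made-up one-dimensional "density" with two peaks, for illustration only.
density = [0.0, 1.0, 4.0, 1.0, 0.0, 0.0, 2.0, 0.0]

factors = structure_factors(density)
intensities = [abs(f) ** 2 for f in factors]   # what a detector would record
recovered = electron_density(factors)          # density map rebuilt from F(h)

print([round(value, 6) for value in recovered])
```

    The rebuilt values match the starting density to within rounding error, which is the sense in which the transform turns diffraction data into an electron density map.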

Nuclear Magnetic Resonance Spectroscopy

    Nuclear Magnetic Resonance Spectroscopy (NMR) is increasingly used to study the structure of small proteins or stable domains. A small volume of concentrated protein solution is required, so small proteins that resist crystallisation may be studied using this alternative method. The protein solution is placed in a strong magnetic field, and the magnetic moments, or spins, of the hydrogen nuclei it contains are aligned with this applied field. The spin of these nuclei can be changed to a misaligned state in response to applied radiofrequency pulses of electromagnetic radiation. When the excited hydrogen nuclei subsequently return to their aligned state, they themselves emit radiofrequency radiation. This radiation is measured and, as the nature of the emitted radiation depends on the environment of each hydrogen nucleus, it is possible to distinguish the radiation from hydrogen nuclei in different amino acid residues and to measure the distance between interacting pairs of hydrogen atoms (Alberts et al. (1994)). This method reveals information about the distance between different parts of the protein molecule and, therefore, with knowledge of the amino acid sequence, it is possible, in theory, to deduce the 3D structure of the protein.




MSc in Information Management, 1998/1999
© University of Sheffield - Department of Information Studies (All Rights Reserved)