Electronic Dissertations Library

Identification of β-sheet motifs in three-dimensional protein structures, using a subgraph isomorphism algorithm: an update of a 1992 study, by Ruth V. Spriggs

Introduction

Background

The University of Sheffield's Departments of Information Studies and of Molecular Biology and Biotechnology are investigating graph-theoretical techniques for representing three-dimensional (3D) protein structures. They are also developing algorithms to search databases of these structures. The techniques used were originally developed to handle two-dimensional (2D) chemical structures. They have since been extended to 3D chemical structures and to 3D macromolecules such as proteins. Techniques not based on graph theory have also been developed for this problem; these are discussed in the Literature Review.

The 3D structure of proteins is an important area of study because a protein's conformation plays an essential role in the wide-ranging biological functions it performs. Techniques such as those used here are therefore useful in many areas of research, for example in predicting a protein's structure from its sequence and in tracing the evolution of protein structures.

The structural unit of interest in this study is the β-sheet. It is composed of extended stretches of the protein chain, called β-strands, held side by side by many weak interactions, chiefly hydrogen bonds between the backbones of neighbouring strands. β-sheets are a common supersecondary structural feature of globular proteins. Many different types of β-sheet, or β-motif, exist for two reasons: adjacent β-strands can be aligned in either a parallel or an antiparallel fashion, and the order of the strands across the sheet can differ from their order along the protein chain.
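The following Python sketch gives a rough sense of how quickly the number of possible β-motifs grows once parallel/antiparallel alignment and strand reordering are both allowed. It counts distinct sheet arrangements by brute force under a deliberately simplified model. This model is an assumption for illustration and not necessarily the counting scheme used by Ujah (1992):

- A sheet is an ordering of n strands (numbered by sequence position) read across the sheet, plus an up/down direction for each strand.
- Two arrangements count as the same motif if one becomes the other by reading the sheet from the opposite edge or by flipping every strand.

The names `canonical`, `count_motifs` and `closed_form` are hypothetical.

```python
from itertools import permutations, product
from math import factorial
from typing import Tuple

# Assumed model: a sheet is read across its width as (strand number, direction)
# pairs; strand numbers give sequence order, direction is +1 (up) or -1 (down).
Sheet = Tuple[Tuple[int, int], ...]


def canonical(sheet: Sheet) -> Sheet:
    """Pick one representative per motif. Assumed equivalences: reading the
    sheet from the opposite edge (reverse the tuple) and flipping every strand."""
    variants = []
    for seq in (sheet, sheet[::-1]):
        variants.append(seq)
        variants.append(tuple((strand, -d) for strand, d in seq))
    return min(variants)


def count_motifs(n: int) -> int:
    """Brute-force count of distinct n-stranded motifs under the model above."""
    seen = set()
    for order in permutations(range(1, n + 1)):
        for dirs in product((1, -1), repeat=n):
            seen.add(canonical(tuple(zip(order, dirs))))
    return len(seen)


def closed_form(n: int) -> int:
    """n! orderings x 2^n direction patterns, divided by the 4 equivalent views.
    For n >= 2 no arrangement is its own reverse, so every class has size 4."""
    if n < 2:
        raise ValueError("model assumes at least two strands")
    return factorial(n) * 2 ** (n - 2)
```

Under this model, `[closed_form(n) for n in range(3, 6)]` gives `[12, 96, 960]`. A 15-stranded sheet would allow roughly 1.07 × 10^16 arrangements. The brute-force count is only there to confirm the closed form for small n. Like the "no constraints" baseline described in this study, the model makes no attempt to exclude arrangements that are physically implausible.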
The Current Study

This study uses a subgraph isomorphism algorithm for substructure searching to identify β-sheets in the 3D protein structures deposited in the Protein Data Bank (PDB). The algorithm is a modified version of Ullmann's algorithm and is incorporated into the POSSUM program developed at Sheffield. This study examines the variety of β-motifs made possible by the different arrangements of parallel and antiparallel strands. The three- to 15-stranded β-sheet motifs actually found, using POSSUM and a representative subset of the PDB, are compared with the sheets that would be theoretically possible if there were no constraints on motif formation.

The primary purpose of this dissertation is to update a PhD study by Ujah (1992), which detected trends in the frequency of occurrence of the various possible types of β-motif. A conclusion of the original study, as reported in Artymiuk et al. (1994a, p. 62), was that "...the observed distribution of the motifs’ frequencies of occurrence suggest that this will continue to be the case even when very large numbers of three-dimensional protein structures become available". The number of protein structures in the PDB has increased greatly since 1992. This investigation therefore examines whether the same trends are still present now that the database contains a wider range of protein structures and is, presumably, more representative of the entire protein kingdom. Other areas covered in the PhD, such as the connectivities of the strands in the sheets, are not updated here.

Overview of Report

The five chapters that follow this introduction present the research in detail. The Proteins chapter introduces the vocabulary used to describe protein structure, outlines how this structure can be determined in the laboratory, and briefly discusses the functions that structure confers.
The Proteins chapter also covers protein structure prediction and evolution, introducing the current state of research into predicting protein structure from sequence and into tracing the evolution of protein structures. The Literature Review focuses on the development of techniques for substructure searching in 3D molecules and macromolecules. It covers techniques other than those used here as well as the development of graph-theoretical methods. The Methods chapter explains in detail the theory and practical use of all the techniques used in this research to extract data from the Protein Data Bank. This includes the use of BETPAT and POSSUM and the programs I wrote to improve the efficiency of the searches. The Analysis of Results chapter details the procedures used to analyse the data, such as assigning parallel/antiparallel character to the retrieved β-sheets and applying the chi-squared (χ²) statistical test. Finally, the Discussion and Conclusions chapter brings together the findings of the analysis and relates them to the trends identified in the 1992 study. It also suggests how the research methods used here could be improved and makes recommendations for future work in this area.
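The subgraph isomorphism search at the heart of this study (a modified Ullmann algorithm, as incorporated in POSSUM) can be illustrated with a short sketch. The following Python code implements only the core of Ullmann's method for small edge-labelled graphs:

- a candidate matrix;
- the refinement procedure;
- backtracking over injective mappings.

POSSUM's actual graph representation and its modifications to the algorithm are not reproduced. The representation below is an assumption made purely for illustration: vertices stand for β-strands, and edges join adjacent strands and are labelled "P" (parallel) or "A" (antiparallel). The names `make_graph`, `refine`, `ullmann`, `target` and `query` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: vertex -> {neighbouring vertex: edge label}.
Graph = Dict[int, Dict[int, str]]


def make_graph(n: int, edges: List[Tuple[int, int, str]]) -> Graph:
    """Build an undirected, edge-labelled graph on vertices 0..n-1."""
    g: Graph = {v: {} for v in range(n)}
    for u, v, label in edges:
        g[u][v] = label
        g[v][u] = label
    return g


def refine(m: List[List[bool]], q: Graph, t: Graph) -> bool:
    """Ullmann-style refinement of the candidate matrix m (query x target).

    Query vertex i keeps target vertex j only if every neighbour x of i (edge
    label L) has a candidate y among the neighbours of j, joined to j by an edge
    labelled L. Repeats until stable; returns False as soon as some query
    vertex has no candidates left."""
    changed = True
    while changed:
        changed = False
        for i in range(len(m)):
            for j in range(len(m[i])):
                if m[i][j] and not all(
                    any(m[x][y] and t[j][y] == label for y in t[j])
                    for x, label in q[i].items()
                ):
                    m[i][j] = False
                    changed = True
            if not any(m[i]):
                return False
    return True


def ullmann(q: Graph, t: Graph) -> List[Dict[int, int]]:
    """Return every injective query->target mapping that preserves edges and
    edge labels (non-induced subgraph isomorphism)."""
    nq, nt = len(q), len(t)
    # Initial candidates: a target vertex needs at least the query vertex's degree.
    m0 = [[len(t[j]) >= len(q[i]) for j in range(nt)] for i in range(nq)]
    matches: List[Dict[int, int]] = []

    def search(i: int, m: List[List[bool]], used: Set[int], mapping: Dict[int, int]) -> None:
        if i == nq:  # every query vertex fixed and the refined matrix is consistent
            matches.append(dict(mapping))
            return
        for j in range(nt):
            if m[i][j] and j not in used:
                trial = [row[:] for row in m]
                trial[i] = [k == j for k in range(nt)]  # tentatively fix i -> j
                if refine(trial, q, t):
                    mapping[i] = j
                    used.add(j)
                    search(i + 1, trial, used, mapping)
                    used.discard(j)
                    del mapping[i]

    if refine(m0, q, t):
        search(0, m0, set(), {})
    return matches


# Hypothetical four-strand sheet: 0-1 antiparallel, 1-2 parallel, 2-3 antiparallel.
target = make_graph(4, [(0, 1, "A"), (1, 2, "P"), (2, 3, "A")])
# Hypothetical query motif: an antiparallel pair followed by a parallel pair.
query = make_graph(3, [(0, 1, "A"), (1, 2, "P")])
matches = ullmann(query, target)
```

For these made-up graphs, `matches` holds two mappings, `{0: 0, 1: 1, 2: 2}` and `{0: 3, 1: 2, 2: 1}`: the same A-then-P pattern read from either end of the sheet. Copying the candidate matrix at each level of the search keeps the sketch short at the cost of memory. The refinement step is what prunes the search. Once every row of the matrix holds a single candidate and refinement still succeeds, each query edge is guaranteed to map onto a target edge with the same label.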
MSc in Information Management, 1998/1999. © University of Sheffield, Department of Information Studies (All Rights Reserved)