
Identification and functional analysis of novel domains in the human genome


Biological science is now firmly in the era of whole-genome biology, with numerous projects aimed at determining the complete genetic sequence of whole organisms. At the time of writing, these projects have resulted in the sequencing of over 900 genomes, including numerous viruses and prokaryotes and four eukaryotes: Saccharomyces cerevisiae (yeast), the nematode worm Caenorhabditis elegans, the fruit fly Drosophila melanogaster and the plant Arabidopsis thaliana (thale cress). Furthermore, the first draft of the human genome has recently been published.

As a result, genetic data are being produced on a scale new to biological science: for instance, the Drosophila genome contains approximately 13,500 genes and the human genome is estimated to contain approximately 30,000-40,000 genes. This exponential increase in biological data, in both volume and complexity, demands novel and creative methods of analysis. To this end, significant advances have been made in developing bioinformatic tools that facilitate gene prediction, genome assembly, annotation and, in particular, sequence comparison.

Selected references:

J.C. Whisstock and A.M. Lesk. SH3 domains in prokaryotes. Trends Biochem. Sci., 24, 132-3 (1999).

Project details:

In this project, we propose an approach that will extend the functional and structural annotation of large protein databases, such as the putative human proteome and the non-redundant protein databank. The primary aim is to identify, classify, examine and annotate novel protein domains of unknown function in the human and other proteomes. To this end, we have developed several powerful tools that utilise the VPAC supercomputer. The project is run in collaboration with Dr. Maria Garcia de la Banda.