Exons Embedded in Intron Sequences


Rony Lim

Oral Defence Date: 



TH 935


Professors Marguerite Murphy, Rahul Singh, & Robert McCaman (CSU Fullerton)


Public availability of the complete sequences of numerous eukaryotic genomes has provided the basis for many types of computational biological investigation. The earliest computational investigations were mostly concerned with annotation, gene identification, chromosomal location, and characterization of the portions of genes that encode messenger RNA (mRNA), which serves as a template for directing the synthesis of proteins. Most gene transcripts of eukaryotes, from yeast to human, are characterized by non-coding regions (introns) that intervene between the protein-coding regions (exons) of the gene-specific DNA sequence. Transcription, which takes place within the nucleus of every cell, generates an RNA copy (transcript) of the DNA that specifies each gene. During the later stages of transcription, the transcribed intronic sequences are removed (spliced out) and the exons are joined (spliced) end to end to form a linear array of RNA (mRNA). In most cases, the mRNA exits the nucleus and serves as a template for guiding the synthesis of a specific protein, but the exact fate of the ‘spliced out’ intronic sequence has received little attention. During the early days of genomics, when the emphasis was on the protein-encoding activities of the genome, introns were regarded by many as “junk” and were thought to be degraded and recycled shortly after being spliced out. There is now a growing body of evidence that the RNA of spliced-out introns may persist for long periods of time; even more interesting is its proposed role in regulating gene expression (see Mattick & Makunin for a recent review). Thus, introns offer many challenges for further investigation, including the content of coding (exonic) sequences that are embedded in, or overlap with, these non-coding (intronic) sequences.
A major goal of the present project is therefore to evaluate a computational approach for identifying and characterizing those intronic sequences that harbor a significant quantity of exonic sub-sequences, as a prelude to determining whether such ‘exon-harboring’ introns have a special role in regulating the expression of other genes. With this approach, we identify several patterns whose biological significance remains to be established and which merit further investigation.
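
As an illustration only, the kind of search described above can be sketched as follows. The abstract does not specify the algorithm used in the project, so everything here is an assumption: the function names, the fixed window length k, and the requirement of exact matches are all hypothetical choices. The sketch reports where windows of an intron match a sub-sequence of an exon, either directly or as its reverse complement (the inverted-repeat case).

```python
# Minimal sketch, NOT the method of the thesis: exact k-mer matching
# between one exon and one intron, on both strands.
# All names and the default k are hypothetical.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def kmers(seq, k):
    """Return the set of all length-k sub-sequences of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def embedded_hits(intron, exon, k=12):
    """List (position, strand) for every intron window of length k
    that matches an exon k-mer.

    "+" marks a direct match to the exon; "-" marks a match to the
    reverse complement of the exon. Positions are 0-based offsets
    into the intron.
    """
    intron = intron.upper()
    exon = exon.upper()
    forward = kmers(exon, k)
    reverse = kmers(reverse_complement(exon), k)

    hits = []
    for i in range(len(intron) - k + 1):
        window = intron[i:i + k]
        if window in forward:
            hits.append((i, "+"))
        if window in reverse:
            hits.append((i, "-"))
    return hits
```

Calling embedded_hits(intron, exon, k=6) on a pair of sequences returns the offsets of the embedded exonic windows; counting the hits per intron gives one crude measure of how much exonic sequence it harbors. A realistic analysis would also need to tolerate mismatches and merge adjacent windows, which this sketch does not attempt.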


Exon, Intron, Coding sequence, Non-coding sequence, Gene, Overlap, Inverted repeat, Reverse complement.

