Evidence-Derived Proteome-Wide Subcellular Location – A Basis Partitioning for Classification Systems


Kayven Riese

Oral Defence Date: 



HH 301


Professors Margaritte Murphy, Hui Yang & William Hsu


Evidence-derived data are the products of medical and biological tests, spanning various technologies and disciplines, that are recorded as annotations in a diversity of online repositories. The Universal Protein Resource (UniProt) is a major repository of this kind: a heavily cross-referenced resource of proteomic data that includes annotation data. Subcellular location (SCL) is one example of data found in the form of annotation, and it is closely related to biological function. SCL is possible in biology because of the basic chemical tenet that “oil and water do not mix.” The transmembrane proteins (TMPs) are a major class of protein that is underrepresented among 3D structures, bridges subcellular compartments, and forms the boundary between self and non-self. This class of protein exploits the chemical diversity of the available raw materials to span the oily center of membranes. These membranes form the essential barrier between the inside (self) and outside (non-self) of unicellular organisms, as well as the subcellular and transcellular boundaries in all multicellular organisms, including humans. Multiple groups in the medical literature support two fundamental SCL partitioning schemes that contain four or five compartments: nuclear, cytoplasmic, and extracellular proteins, TMPs, and, optionally, mitochondrial proteins. A two-component C library was constructed that mines fixed-format and Gene Ontology (GO) UniProt data to produce Proteomic deMographics (ProMog.c) and that reports these simple statistics as a simple diagram of the cell with letter height proportional to the values (Cellgram.c). Pertinent data were found for 87.5% of human proteins and 62.8% of all proteins in the UniProt manually annotated knowledge base. A survey of human proteins localized to brain tissue showed that 34.3% were TMPs.


UniProt, proteome, transmembrane, C programming, Gene Ontology (GO), subcellular location, compartmental model

