“School of Biological Sciences”
Paper IPM / Biological Sciences / 13164
Abstract:
A central problem in genomics is to determine the functions of newly discovered proteins
using the information contained in their amino acid sequences. In this research we introduce a
novel spatial association model on a regular lattice for assigning a protein sequence to a
protein family. In our model we assume that, for each residue at any position in a sequence,
not only the adjacent residues but also the residues of close homologs carry information. For
this purpose we model the observations with autocorrelated errors on a rectangular grid and use
the information from the left, right, top and bottom neighbors of each amino acid at any
position in a multiple sequence alignment (MSA) of the query sequence with the members of each
family. Spatial statistics are applied to analyze these observations, and the classification
problem is solved by computing the probability that the query sequence belongs to each protein
family. The query sequence is assigned to the family whose MSA yields the highest probability.
Using actual data, we propose the application of spatial prediction to the assignment of
protein sequences to protein profiles and assess the performance of the model. Based on the
spatial associations on a regular lattice, we analyze amino acid sequences from the top ten
profiles in the Pfam database that are very different from each other. Results show that in
all cases the protein sequences are assigned correctly to the corresponding protein profiles.
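
The lattice view of an MSA and the final assignment rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `lattice_neighbors` and `assign_family`, the toy alignment, the family labels, and the probability values are all hypothetical placeholders, and the spatial model that would actually produce the per-family probabilities is not reproduced here.

```python
from typing import Dict, List, Optional


def lattice_neighbors(msa: List[str], row: int, col: int) -> Dict[str, Optional[str]]:
    """Return the left, right, top and bottom residues of msa[row][col].

    The MSA is treated as a rectangular grid: rows are aligned sequences,
    columns are alignment positions. Neighbors outside the grid are None.
    """
    n_rows, n_cols = len(msa), len(msa[0])

    def at(r: int, c: int) -> Optional[str]:
        # Look up a cell, returning None beyond the lattice boundary.
        if 0 <= r < n_rows and 0 <= c < n_cols:
            return msa[r][c]
        return None

    return {
        "left": at(row, col - 1),
        "right": at(row, col + 1),
        "top": at(row - 1, col),
        "bottom": at(row + 1, col),
    }


def assign_family(family_probabilities: Dict[str, float]) -> str:
    """Pick the family whose MSA gives the query the highest probability."""
    return max(family_probabilities, key=family_probabilities.get)


# Toy alignment (hypothetical): three aligned sequences, '-' marks a gap.
toy_msa = ["ACDE", "AC-E", "GCDE"]
neighbors = lattice_neighbors(toy_msa, row=1, col=1)

# Placeholder probabilities standing in for the output of the spatial model.
toy_probabilities = {"family_A": 0.2, "family_B": 0.7, "family_C": 0.1}
best_family = assign_family(toy_probabilities)
```

In this sketch the left and right neighbors come from the same sequence, while the top and bottom neighbors come from the sequences adjacent to it in the alignment, which is one simple way to read the rectangular grid described above.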